"Best practices in digital language archiving of language and music data" Sep. 6 7, University of Cologne. Abstracts



The role of the archive in mediating endangered language documentation
Gary Holton, Alaska Native Language Archive, University of Alaska Fairbanks

In traditional approaches to archiving, the role of the archive has been limited to maintaining physical and intellectual control; that is, the archive strives to know what it has and where it is located. Making sense of the material, knowing what is relevant and why, is left to the user. Increasingly, though, archivists have become areal specialists, expected to know not just what items are held by the archive but also how those items relate to the larger intellectual effort in the field. This is certainly true of endangered language archives, which increasingly play an important role not just in preserving documentation but in contributing to further documentation and revitalization.

To a certain extent this is not an entirely new phenomenon. The line between language documenter and language archivist has always been a fine one, and language archivists have played important roles in moving the field of language documentation forward. By helping to identify relevant extant documentation for under-documented languages, Freeman and Smith's (1966) guide to the Native American collections at the American Philosophical Society inspired much new research in Native American languages. Similarly, Krauss and McGary's (1980) bibliographic catalog Alaskan Indian languages offers a critical analysis of scholarly contributions to Alaska Native language documentation, thus providing a basis for future research. This type of mediated access to language archives provides a point of entry which allows better utilization of the archive.

In theory, mediated access should be less necessary for digital archives, since electronic access and rich metadata should allow for ready discovery of relevant materials. In practice, the sheer volume of accessible materials, coupled with often substandard metadata, complicates the resource discovery process. Users are flooded with too much data and may find it difficult to discern the more useful resources. To address this issue, the Alaska Native Language Archive has initiated an effort to create featured collections highlighting the most valuable and useful resources for each Alaska Native language. In this presentation we describe the process of creating these mediated collections and report on initial reactions from user groups. It is hoped that this presentation will inspire further discussion about the role of mediation in endangered language archives.

Metadata for Endangered Languages and Global Biodiversity
Mary S. Linn, Sam Noble Oklahoma Museum of Natural History, University of Oklahoma

The University of Oklahoma is home to a growing collection of North American indigenous languages, concentrating on the languages of Oklahoma and surrounding areas. The Native American Languages (NAL) collection is housed and curated in the Sam Noble Oklahoma Museum of Natural History. The metadata captured for language has more in common with that of the zoology departments (such as modern invertebrates and mammalogy), and even the paleontology department (such as paleobotany and vertebrate paleontology), than with that of the other anthropology departments (archeology and ethnology). We have been in the process of updating the NAL database to capture metadata that fits the standards for language set by the Open Language Archives Community (OLAC) and, to a lesser extent, the ISLE Metadata Initiative (IMDI). In addition, the database is designed to adhere closely to standards for natural history collections set by the Global Biodiversity Information Facility (GBIF) and Biodiversity Information Standards (TDWG). Thus, the NAL database looks very much like a database found in any biodiversity collection.

This paper will examine the correspondences and differences in standards and controlled vocabulary between the two systems. It will show how we embedded OLAC standards into a GBIF framework. Finally, the paper will discuss the benefits of treating language metadata as biodiversity metadata, including funders' recognition of endangered language documentation and description as scientific investigation, and of language collections as on par with other natural history collections.

ELAR: experiences from a social networking archive
David Nathan, ELAR, SOAS, University of London

ELAR (the Endangered Languages Archive) launched its current catalogue platform (see http://elar-archive.org) in June 2012 as a "social networking archive". Its design was driven by three factors: research into features judged important for an archive dedicated to endangered languages documentation; lessons learned from pioneer archives in the field; and the broader technological trends and expectations that have evolved from 2005 onward. While ELAR's platform currently has limited social networking features, and has not been operational long enough to enable robust conclusions, we can already detect benefits of the approach for depositors, users and language community members, for example through ELAR's method of implementing access protocols. On the other hand, we have also faced difficulties, and some plans and predictions have not come to fruition. This talk will discuss these pros and cons and describe ELAR's future plans.

What's so special about music? Some methodological aspects of ethnomusicological field data in the Phonogrammarchiv of the Austrian Academy of Sciences in Vienna
Jürgen Schöpf, Phonogrammarchiv, Austrian Academy of Sciences, Wien

In 1899, the then Imperial Academy of Sciences in Vienna founded what was to become the world's first sound archive. Today, the Phonogrammarchiv of the Austrian Academy of Sciences mirrors the Austrian research scene in many disciplines, from bioacoustics and anthropology to ethnomusicology, linguistics, and religious studies, to name just a few. Since 2000 it has also held video recordings. Interdisciplinary from the outset, the Phonogrammarchiv has always treated music and language on an equal footing.

Most field-working linguists in recent decades have recorded music, be it as part of linguistic work that includes vocal music, as a way of documenting culture as a reference alongside language, or because of their personal interest or the interests of their informants. It therefore seems appropriate to discuss music recording and archiving in linguistic circles, all the more so at a time when researchers are encouraged to use archival resources across disciplines.

In my presentation I argue that the demands of music in both fieldwork and archiving exceed the requirements of language in important respects. The arguments discussed will comprise technological ones (e.g. size of a sound source, duration of a performance, dynamic range), archival ones (e.g. multi-track), and legal and ethical ones (commodification of music, ethical demands). Since musicological demands in these respects appear to be higher, I claim that ethnomusicological approaches may lead the methodological discussions in the choir of disciplines.

Networking digital ethnographic archives
Nick Thieberger, University of Melbourne / PARADISEC

What can a network of digital endangered language archives offer that each archive on its own cannot? There are no doubt a number of areas that could be dealt with at this higher level, including: agreement on what (technical and metadata) standards should be used; perhaps providing accreditation of archives (similar to the five-star system used by the Open Language Archives Community); providing mirrored backup of each other's collections; jointly developing software (e.g., for cataloging, ingestion, metadata creation); and so on.

In this presentation I want to focus on methods for locating endangered collections, digitising and accessioning them, and incorporating their metadata into federated search tools. An endangered collection may be a set of records in a deceased estate with no further information, or perhaps a set of described recordings made by a Native Patrol Officer and now held by that officer's children. It is only by a concerted effort that we can locate these collections and obtain the trust of their owners to accession them into an archive. Of course, our responsibility as linguistic fieldworkers should be to create proper collections and ensure they are archived, but my experience over ten years of working with the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) is that it is still not the case that linguists take the creation of a research collection seriously.

A federation of archives subscribing to a central body could apply collectively for philanthropic funds to support the location and accession of otherwise inaccessible primary materials. Much recorded material is outside of academia, and outside of state or national repositories. The federation could actively seek this material. For material that is in established collections, the federation could provide an online referral service, matching language codes to the URL of the collection, thus bringing it into the search mechanisms of the language archives community.

The Language Archive in the context of emerging research infrastructures
Paul Trilsbeek, The Language Archive, Max Planck Institute for Psycholinguistics, Nijmegen

Archiving and publication of research data is a hot topic in a growing number of research disciplines. It is also seen as very important by decision makers in the research funding landscape, which has resulted in significant sums of money being invested in projects that aim at developing the necessary technical infrastructure for archiving and utilizing research data. Interoperability between repositories and between research tools is a key aspect of these projects. The Language Archive at the Max Planck Institute for Psycholinguistics has been involved in a number of these research infrastructure projects during the past 5 years and has significantly contributed to the conceptualization and development of archiving and research infrastructures for language data and data in the humanities in general.
The European CLARIN project, the national CLARIN Germany and CLARIN Netherlands projects, and the more recent DASISH and EUDAT projects are examples of these. In this talk I will give an overview of these developments and their relevance for archives of endangered languages.

Speech Resources and Tools at BAS - the World seen by Speech Database Providers
Christoph Draxler, Bavarian Archive for Speech Signals, München

The Bavarian Archive for Speech Signals has been creating, distributing and maintaining speech and multimodal databases since 1995. These databases were mainly collected for speech and video processing technology (speech recognition, speech synthesis, speaker identification, etc.). A number of tools have been developed to facilitate or even automate some of the processing steps in creating, annotating and distributing these databases; some of these tools have become de facto standards in the speech processing community.

Increasingly, these databases are being used by phoneticians and linguists for a number of reasons: the databases are well documented, in most cases there are no or low license fees, the databases are large both in terms of speakers and speech phenomena, the recording quality is very high, and these databases have been validated for technical quality against their specifications.

As a CLARIN-D centre, the BAS is currently making a subset of its speech and multimodal databases available online to the research community. Additionally, web-based speech processing services are now being implemented that allow students and researchers to perform complex speech processing tasks without the need to install software on their own computer. In this talk I will present an overview of the BAS tools and technology and discuss how they can be used to create speech databases of endangered languages.

Audit and Certification of Digital Repositories
Natascha Schumann, GESIS - Leibniz-Institut für Sozialwissenschaften, Datenarchiv für Sozialwissenschaften, Köln

GESIS Data Archive for the Social Sciences: overview and long-term preservation activities
Audit and certification
EU framework of audit and certification of digital repositories:
  1. Support of efforts regarding trusted digital repositories
  2. Harmonisation of existing initiatives
  3. Memorandum of Understanding with three levels of certification
Brief overview of the three levels/criteria

Why and how we use the OAIS model for cost-effective long-term and medium-term preservation of digital resources
Bernard Bel, Laboratoire Parole et Langage, CNRS - Université d'Aix-Marseille

Audit and Certification of Digital Repositories
GESIS - Leibniz-Institut für Sozialwissenschaften, Datenarchiv für Sozialwissenschaften, Köln