Technology in language documentation



Jacquelijn Ringersma, Max Planck Institute for Psycholinguistics
Documenting oral traditions in the non-western world

Language (archiving) technology
Language documentation:
- Aim: maintain, consolidate or revitalize endangered languages
- Through: creation of a representative, multipurpose and long-lasting record of languages
- By: recording language events, speech and gesture in natural context, etc., and storing the resources in an organized, accessible and persistent archive

Language (archiving) technology
In this presentation:
- Why archiving?
- What is an organized, accessible and persistent archive?
- Which technology is required (and offered by the MPI)?
  - Metadata tool
  - Archive upload and access management tools
  - Browsing, searching and accessing the resources
  - Enrichments with ELAN, LEXUS and ViCoS
LAT (Language Archiving Technology): www.mpi-lat.eu

Why archiving?
Misconceptions about archiving:
1. Your stuff is buried here and gone forever.
2. Other linguists will take advantage of your hard work and take away your good ideas.
(from Andrew Garrett, Berkeley Archives)

Why archiving?
But the actual truth is that:
1. Other linguists do not really care about your work.
2. The people who do care are the members of the speech communities, and they care about it in a different way than you do.
(attributed to Rufus Pollack)

Why archiving?
Is there a danger that we lose digital data? YES.
- UNESCO: 80% of our recordings are endangered
- How much of the data and files on your notebook is organized and backed up?
- How long can media and formats be accessed?

Why archiving?
Is there a danger that we lose the data? YES. A few messages:
- Archive data in a trusted archive (long-term preservation and accessibility)
- Create high-quality metadata so that you can find your way back to the data
- Use open standards
MPI-PL archive:
- Trusted archive, ORGANIZED information
- Continuous extension
- Collaboration and interaction
- Commentary and relation drawing (enrichment)
- Supporting centre for cross-corpus and language work

Why archiving?
Correct conceptions about archiving:
1. It requires discipline.
2. It creates a bit of techno noise.

What's an organized, accessible and persistent archive?
Task of the archiving instance:
- Organize corpora or data following clear principles
- Create a coherent and consistent archive
- Store data in an accessible and persistent form (long-term)
- Give different users access to the data, but protect the data against unauthorized access
- Adhere to a code of conduct and to ethical and legal requirements
- Provide tools to researchers

Language archiving technology
Clear principles: data organization and access infrastructure
IMDI:
- Metadata editor
- Browser and search
Coherent, consistent and persistent: data management
LAMUS:
- Checks the content and the type of the files
- Assigns a persistent identifier to each uploaded file
- Allows the creation of corpus structures
- Web based, easy to use
Safe access: data access rights and protection
AMS:
- All metadata in the archive is open
- All resource access can be controlled by AMS (web based)
- Users remain the owners and stay in control of the access
- Setting of licenses and codes of conduct

Tools required
Workflow and tool requirements:
- Recording
- Capturing (DV)
- Transcoding (MPEG1 & 2, WAV)
- Enrichments
- Describe resources with metadata
- Browsing, searching, viewing, downloading
- Setting access rights to the resources
- Uploading metadata & resources

Tools required
Describe resources with metadata: IMDI metadata
- Metadata is data about data
- Structured and machine readable
- Elements describe the content of the resource files
IMDI set:
- General data: project, location
- Content data: genre, interactivity, modality, language
- Actor data: age, gender, languages
- Resource data: format, size
IMDI editor downloadable from the LAT page
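
To make "structured and machine readable" concrete, here is a minimal sketch of a metadata record grouped the way the slide groups the IMDI set (general, content, actor, resource data). The class names, XML element names and example values are hypothetical; this is not the actual IMDI schema or the output of the IMDI editor.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Actor:
    age: int
    gender: str
    languages: list[str]


@dataclass
class ResourceFile:
    file_format: str
    size_bytes: int


@dataclass
class SessionMetadata:
    # General data
    project: str
    location: str
    # Content data
    genre: str
    interactivity: str
    modality: str
    language: str
    # Actor and resource data
    actors: list[Actor] = field(default_factory=list)
    resources: list[ResourceFile] = field(default_factory=list)


def to_xml(meta: SessionMetadata) -> str:
    """Serialize the record to XML using illustrative (non-IMDI) element names."""
    root = ET.Element("Session")
    general = ET.SubElement(root, "General")
    ET.SubElement(general, "Project").text = meta.project
    ET.SubElement(general, "Location").text = meta.location
    content = ET.SubElement(root, "Content")
    for tag in ("genre", "interactivity", "modality", "language"):
        ET.SubElement(content, tag.capitalize()).text = getattr(meta, tag)
    for actor in meta.actors:
        node = ET.SubElement(root, "Actor", age=str(actor.age), gender=actor.gender)
        node.text = ", ".join(actor.languages)
    for res in meta.resources:
        ET.SubElement(root, "Resource", format=res.file_format, size=str(res.size_bytes))
    return ET.tostring(root, encoding="unicode")
```

Because every element has a fixed name and position, a program can read the record back with a standard XML parser, which is what makes searching and browsing across many sessions possible.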

Tools required
Upload resources and metadata files with LAMUS
LAMUS:
- Checks the content and the type of the files
- Assigns a persistent identifier to each uploaded file
- Allows the creation of corpus structures
- Web based, easy to use
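
The upload steps listed above can be sketched roughly as follows. This is not LAMUS code: the `Archive` class, the accepted suffixes and the identifier format are assumptions made for illustration, and the sketch only checks the file suffix, whereas a real content check would inspect the file itself.

```python
import uuid
from pathlib import PurePosixPath

# Assumption: suffixes chosen to match the formats named in the slides
# (WAV, MPEG) plus a metadata file; the real accepted list is not given.
ACCEPTED_SUFFIXES = {".wav", ".mpg", ".mpeg", ".imdi"}


class Archive:
    """Hypothetical in-memory archive with a nested corpus structure."""

    def __init__(self) -> None:
        self.corpus: dict = {}                  # corpus nodes -> sub-nodes or files
        self.identifiers: dict[str, str] = {}   # identifier -> archive path

    def upload(self, corpus_path: str, filename: str) -> str:
        # 1. File type check (by suffix only in this sketch).
        suffix = PurePosixPath(filename).suffix.lower()
        if suffix not in ACCEPTED_SUFFIXES:
            raise ValueError(f"unsupported file type: {filename}")
        # 2. Create the corpus structure on demand, one node per path part.
        node = self.corpus
        for part in PurePosixPath(corpus_path).parts:
            node = node.setdefault(part, {})
        # 3. Assign an identifier that stays fixed for this file.
        #    The "example-pid:" scheme is made up for this example.
        pid = f"example-pid:{uuid.uuid4()}"
        node[filename] = pid
        self.identifiers[pid] = f"{corpus_path}/{filename}"
        return pid
```

The point of the identifier table is that references to a resource go through the identifier rather than the file location, so the archive can reorganize its storage without breaking citations.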

Tools required
Setting access rights: AMS
- Access to resources: IPR, privacy, copyright, etc. (metadata is always open!)
- Default setting after upload
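
The access rule stated here (metadata always open, resource access controlled by the owner) can be sketched as below. The names are hypothetical and this is not the AMS data model. In particular, the slide does not say what the default setting after upload is; the closed default used here is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedResource:
    owner: str
    # Assumption: a resource starts out closed; the slides do not state
    # the actual default after upload.
    open_to_all: bool = False
    granted_users: set[str] = field(default_factory=set)


def can_read_metadata(user: str) -> bool:
    """Metadata is always open, whoever asks."""
    return True


def can_read_resource(resource: ArchivedResource, user: str) -> bool:
    """Resource access depends on the settings chosen by the owner."""
    return (
        resource.open_to_all
        or user == resource.owner
        or user in resource.granted_users
    )


def grant_access(resource: ArchivedResource, acting_user: str, new_user: str) -> None:
    """Only the owner may extend access, so the owner stays in control."""
    if acting_user != resource.owner:
        raise PermissionError("only the owner can grant access")
    resource.granted_users.add(new_user)
```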

Tools required Browsing the data: IMDI browser

Tools required Viewing the data: ANNEX viewer

Tools required Searching the metadata and content data: Search engines: IMDI and TROVA
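
As an illustration of why structured metadata makes resources findable, here is a minimal sketch of a metadata search: it filters records on exact field values. The function, field names and records are hypothetical, and the sketch covers metadata only; it does not reflect how the IMDI search or TROVA are implemented.

```python
def search_metadata(records: list[dict], **criteria: str) -> list[dict]:
    """Return the records whose fields match every criterion (case-insensitive)."""

    def matches(record: dict) -> bool:
        return all(
            str(record.get(key, "")).lower() == value.lower()
            for key, value in criteria.items()
        )

    return [record for record in records if matches(record)]


# Hypothetical records; the keys follow the IMDI groups named in the slides.
records = [
    {"project": "Example project", "genre": "narrative", "modality": "spoken"},
    {"project": "Example project", "genre": "song", "modality": "spoken"},
    {"project": "Other project", "genre": "narrative", "modality": "signed"},
]

narratives = search_metadata(records, genre="narrative")
spoken_narratives = search_metadata(records, genre="Narrative", modality="spoken")
```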

Tools required Different ways of accessing: community portals, Google Earth layers

Enrichments ELAN: annotating video and audio resources; ELAN for off-line work on annotations

Enrichments LEXUS & ViCoS: Web based lexicon tool, multimedia encyclopedia

Archiving instance
Archiving instance, Max Planck Institute for Psycholinguistics:
- Archive managers: 3
- Archive developers: 2
- System manager: 1
- Archiving software development: 4
- Enrichment software development: 4
Archive for language data:
- 40 terabytes of data
- 400,000 archived objects

Technology training
Training sessions:
- We regularly organize training sessions on audio and video handling, archiving technology, and enrichment of data
- We welcome participants from projects other than DoBeS or MPI
- Each session takes 4-5 days within one week
- Interested? Please check the events section of our MPI website (www.mpi.nl/events/)
Contact: jacquelijn.ringersma@mpi.nl, paul.trilsbeek@mpi.nl