Using ELAN for transcription and annotation




Anthony Jukes

What is ELAN?
ELAN (EUDICO Linguistic Annotator) is an annotation tool that allows you to create, edit, visualize and search annotations for video and audio data.
- It links text annotations with audio and/or video data.
- It handles one audio stream and up to four video streams.
- ELAN files can be exported in a variety of formats (including to Shoebox/Toolbox for interlinearisation, then reimported).

What can't ELAN do?
- It can't do your transcription.
- It can't do your analysis.
- It can't keep you organised.
- It can't (by itself) make a viewer for community members.
- It isn't (unfortunately) very easy to learn.

What can ELAN do?
- It can help with transcription and translation.
- It can help with your analysis by presenting your data.
- It can help keep you organised by linking the media and data files together.
- It can help you find things in your data.
- It can help if you are making a product for community members (text, subtitled video).

Some terminology
- Tiers
- Annotations
- Linguistic types
- Stereotypes

Tiers
Tiers are where you put your annotations. Tiers can contain many kinds of annotations; some of the most obvious are:
- IPA transcription
- practical orthographic transcription
- free translations into languages of wider communication
- morphemes and glosses
- gesture annotation
- grammar notes
- any other information which seems relevant

Tiers can be independent, or linked to (dependent on) other tiers. There is no (theoretical) limit to the number of tiers. Tiers can be hidden or rearranged for ease of use. Each speaker has their own set of tiers, so overlapping speech is not a problem.

Linguistic Types
Every annotation tier must be assigned a linguistic type, which tells ELAN what type of information the tier contains. (It would be better if these were called "tier types".)
- You can set up and name the Linguistic Types, for example reference number, utterance, word, PoS, translation, and so forth.
- They can have Controlled Vocabularies (e.g. select a PoS from a drop-down menu).
- The Linguistic Types are defined by Stereotypes, which specify the way that tiers relate to the time line and to each other.
- The Stereotypes (it would be better to call them "Tier Categories" or something) determine whether a tier is independent or hierarchically associated with other tiers.

Stereotypes
- None: the annotation on the tier is linked directly to the time axis (e.g. intonation units/sentences: a transcription or a reference number).
- Time Subdivision: the annotation on the parent tier can be subdivided into smaller units, which, in turn, can be linked to time intervals (e.g. words). There cannot be gaps between units.
- Symbolic Subdivision: similar to Time Subdivision, except that the smaller units cannot be linked to a time interval (e.g. morphemes within words).
- Included In: like Time Subdivision, but there can be gaps (e.g. words, with silence between them).
- Symbolic Association: one-to-one association with a parent tier, e.g. transcription with ref field, gloss with morpheme, free translation with sentence.
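The five stereotypes can be summarised as a small lookup table. The following is a minimal Python sketch, not part of ELAN itself: the names `Stereotype`, `StereotypeRules` and `RULES` are invented for illustration, and the flags only restate what the list above says (a value of `None` for `gaps_allowed` means the text above does not address gaps for that stereotype).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stereotype(Enum):
    NONE = "None"
    TIME_SUBDIVISION = "Time Subdivision"
    SYMBOLIC_SUBDIVISION = "Symbolic Subdivision"
    INCLUDED_IN = "Included In"
    SYMBOLIC_ASSOCIATION = "Symbolic Association"


@dataclass(frozen=True)
class StereotypeRules:
    # Hypothetical summary record; field names are illustrative only.
    needs_parent: bool             # does the tier depend on a parent tier?
    time_aligned: bool             # can its annotations be linked to time intervals?
    gaps_allowed: Optional[bool]   # None = not addressed in the description above


RULES = {
    Stereotype.NONE: StereotypeRules(False, True, None),
    Stereotype.TIME_SUBDIVISION: StereotypeRules(True, True, False),
    Stereotype.SYMBOLIC_SUBDIVISION: StereotypeRules(True, False, None),
    Stereotype.INCLUDED_IN: StereotypeRules(True, True, True),
    # Assumption: a one-to-one symbolic link carries no time interval of its own.
    Stereotype.SYMBOLIC_ASSOCIATION: StereotypeRules(True, False, None),
}


def describe(stereotype: Stereotype) -> str:
    """Return a one-line summary of a stereotype's constraints."""
    rules = RULES[stereotype]
    parent = "needs a parent tier" if rules.needs_parent else "independent"
    timing = "time-aligned" if rules.time_aligned else "symbolic"
    return f"{stereotype.value}: {parent}, {timing}"


for s in Stereotype:
    print(describe(s))
```

A table like this can be handy when planning a tier setup on paper before creating the types in ELAN.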

Tier dependencies: parents and children

Document X                          Stereotype
  Text/utterances (speaker A)       (None)
    Words                           (Time Subdivision)
      Morphemes                     (Symbolic Subdivision)
        Parts of speech             (Symbolic Association)
        Glosses                     (Symbolic Association)
    Free translations               (Symbolic Association)
  Text/utterances (speaker B)
    Words
      Morphemes
        Parts of speech
        Glosses
    Free translations
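The parent/child structure can also be written down as a small tree. This is a rough Python sketch under stated assumptions: the `Tier` class and the `speaker_tiers` and `render` helpers are hypothetical, and the exact nesting (parts of speech and glosses under morphemes, free translations under the utterance tier) is inferred from the stereotype descriptions rather than stated outright.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tier:
    # Hypothetical in-memory model of a tier; not an ELAN data structure.
    name: str
    stereotype: str
    children: List["Tier"] = field(default_factory=list)

    def add(self, child: "Tier") -> "Tier":
        """Attach a child tier and return it, so calls can be chained."""
        self.children.append(child)
        return child


def speaker_tiers(speaker: str) -> Tier:
    """Build one speaker's tier set (nesting is an assumption, see above)."""
    text = Tier(f"Text/utterances ({speaker})", "None")
    words = text.add(Tier("Words", "Time Subdivision"))
    morphemes = words.add(Tier("Morphemes", "Symbolic Subdivision"))
    morphemes.add(Tier("Parts of speech", "Symbolic Association"))
    morphemes.add(Tier("Glosses", "Symbolic Association"))
    text.add(Tier("Free translations", "Symbolic Association"))
    return text


def render(tier: Tier, depth: int = 0) -> List[str]:
    """Return the tier tree as indented lines, parents before children."""
    lines = [f"{'  ' * depth}{tier.name} ({tier.stereotype})"]
    for child in tier.children:
        lines.extend(render(child, depth + 1))
    return lines


for speaker in ("speaker A", "speaker B"):
    print("\n".join(render(speaker_tiers(speaker))))
```

Each speaker gets an identical, separate tree, which is why overlapping speech is not a problem.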

Too much bother?
Maybe. But you only need to do it once; after that you can import types and tiers from other ELAN projects, or set up templates. Change or add to them as you need. Once you have annotated with ELAN, you have well-structured XML which can be used for other purposes, e.g. exported to Shoebox, DVD subtitles, etc.

Working with Shoebox
This is not entirely straightforward, but it is not too difficult if you are already quite familiar with the workings of Shoebox and the structure of its files.
- If you know you want to export to Shoebox, it's better to start from the beginning with a ref type and tier (stereotype: None). This tier will contain only time information at first (i.e. it will be empty), but later it will contain a Shoebox ref number.
- The transcription tier will be a Symbolic Association depending on the ref tier.
- The Shoebox export process puts the time and speaker information in separate fields.
- After working in Shoebox, ELAN can import the file, and the time and speaker information will be preserved.
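Because an .eaf file is XML, it can be read with ordinary XML tools. The sketch below uses Python's standard `xml.etree.ElementTree` on a tiny hand-written sample, not a real ELAN export. The element and attribute names (`TIME_SLOT`, `TIER`, `ALIGNABLE_ANNOTATION`, `REF_ANNOTATION` and so on) follow the EAF format as I understand it and should be checked against a file saved by your own ELAN version. The function `read_annotations` is a hypothetical helper, and it assumes every time slot has a `TIME_VALUE`.

```python
import xml.etree.ElementTree as ET

# Hand-written miniature sample for illustration; real files contain more elements.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="t@a" LINGUISTIC_TYPE_REF="t">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello there</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="f@a" LINGUISTIC_TYPE_REF="f" PARENT_REF="t@a">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>a greeting</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_annotations(eaf_text):
    """Return (tier_id, start_ms, end_ms, text) tuples for every annotation."""
    root = ET.fromstring(eaf_text)

    # Map time slot ids to millisecond values (assumes TIME_VALUE is present).
    slots = {s.get("TIME_SLOT_ID"): int(s.get("TIME_VALUE"))
             for s in root.iter("TIME_SLOT")}

    # Time spans of time-aligned annotations, keyed by annotation id.
    spans = {a.get("ANNOTATION_ID"): (slots[a.get("TIME_SLOT_REF1")],
                                      slots[a.get("TIME_SLOT_REF2")])
             for a in root.iter("ALIGNABLE_ANNOTATION")}

    # Symbolic annotations point at a parent annotation instead of time slots.
    parents = {a.get("ANNOTATION_ID"): a.get("ANNOTATION_REF")
               for a in root.iter("REF_ANNOTATION")}

    def span_of(ann_id):
        # Follow parent links until a time-aligned annotation is reached.
        while ann_id in parents:
            ann_id = parents[ann_id]
        return spans[ann_id]

    rows = []
    for tier in root.iter("TIER"):
        for kind in ("ALIGNABLE_ANNOTATION", "REF_ANNOTATION"):
            for ann in tier.iter(kind):
                start, end = span_of(ann.get("ANNOTATION_ID"))
                text = ann.findtext("ANNOTATION_VALUE", default="")
                rows.append((tier.get("TIER_ID"), start, end, text))
    return rows


for row in read_annotations(SAMPLE_EAF):
    print(row)
```

From a list like this it is a short step to other outputs, such as subtitle lines or a tab-separated table.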

Exercise 1: Exploring ELAN
Open the elan_training folder, then the elan_example folder. Open the file named elan_example3.eaf.
Play and then stop:
- Play/Pause button
- CTRL+SPACE
Make a selection, and play the selection:
- Click
- SHIFT+SPACE
- with and without the LOOP box ticked
- deselect
View a specific window: Grid, Text, Subtitles, and choose different tiers to view.
Use the Find window to search for text.

Exercise 2: Creating/Saving a new ELAN document
Launch ELAN.
File > New > (navigate to elan_training/elan_exercise and select the media file elan_exercise.wav) > OK
The waveform should open.
File > Save/Save As > (type the name elan_exercise). This will create an .eaf file. Make sure you are saving it in the right place!
File > Automatic backup > (click the time interval)

Exercise 3: Creating linguistic types
Create linguistic types:
Type > Add New linguistic type
Type Name: t, Stereotype: None > Add (this will be for sentences)
Type Name: w, Stereotype: Time Subdivision > Add (this will be for words)
Close.

Exercise 4: Creating tiers
Create a tier for transcription:
Tier > Add New Tier > type the name t@a (this will be for Speaker A; make sure Linguistic Type is t) > Add
Create a tier for words (while the Add Tier box is still open):
type the name w@a (make sure Parent Tier is t@a and Linguistic Type is w) > Add
Close.
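The tier names used in these exercises follow a type@speaker pattern (t@a, w@a). If you later process tier names in a script, a small helper can split them apart. This is a minimal Python sketch; `split_tier_name` is a hypothetical function, and the pattern is only the naming convention used in this workshop, not something ELAN enforces.

```python
def split_tier_name(name):
    """Split a 'type@speaker' tier name into (type_name, speaker)."""
    type_name, separator, speaker = name.partition("@")
    if not separator or not type_name or not speaker:
        raise ValueError(f"not a type@speaker tier name: {name!r}")
    return type_name, speaker


for tier_name in ("t@a", "w@a"):
    print(tier_name, "->", split_tier_name(tier_name))
```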

Exercise 5: Entering/editing annotations
Listen to the sound file (and look at the waveform) and think about breaking the speech up into units (sentences, intonation units). There are two ways to do this:
1. Straight into the tier. Select a time span containing a unit: click, hold and drag with the mouse > play the selection (possibly toggle the Loop Mode). Right click > New Annotation here (for the active tier) > transcribe the unit > Ctrl+Enter (to save) or Esc (not to save).
CTRL+ENTER IS VERY IMPORTANT. DON'T FORGET: EVERY TIME YOU ENTER ANY ANNOTATION, CTRL+ENTER!
2. Segmentation mode. Edit > Segmentation. A new window appears. Play the sound while pressing Enter to segment (similar to Transcriber). Click Apply to save the segmentation.
Changing boundaries:
- click an annotation to select it > draw the new boundary > Ctrl+Enter (to save), or
- select an annotation > press and hold Alt and drag the boundary of the active annotation to a new place.

Exercise 6: Using ELAN to break sentences up into words
When there are a few annotations on tier t@a, ELAN can fill the word tier w@a from it.
Edit > Tokenise Tier
Source Tier > t@a
Destination Tier > w@a
Start
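To get a feel for what tokenising a tier produces, here is a rough Python sketch of the idea: one utterance annotation is split on whitespace, and each word receives a slice of the parent's time span with no gaps, as Time Subdivision requires. The function `tokenise` is hypothetical, and dividing the interval equally is an assumption made for illustration; ELAN's own placement of word boundaries may differ, and you would normally adjust them by hand afterwards.

```python
def tokenise(start_ms, end_ms, text):
    """Split one annotation into (start_ms, end_ms, word) child annotations.

    Assumption: the parent interval is divided equally between the words.
    """
    words = text.split()
    if not words:
        return []
    total = end_ms - start_ms
    count = len(words)
    tokens = []
    for index, word in enumerate(words):
        # Integer arithmetic keeps boundaries shared: each end equals the next start.
        word_start = start_ms + total * index // count
        word_end = start_ms + total * (index + 1) // count
        tokens.append((word_start, word_end, word))
    return tokens


for token in tokenise(0, 1500, "hello there my friend"):
    print(token)
```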

Exercise 7: Adding a free translation tier
First add a new type f:
Type > Add New linguistic type
Type Name: f, Stereotype: Symbolic Association > Add
Now add tier f:
Tier > Add New Tier > type the name f@a (this will be for Speaker A; make sure Parent Tier is t@a and Linguistic Type is f) > Add
The tier should be there. Double click in it to enter free translations corresponding to the segments in t. Don't forget Ctrl+Enter!
Save it, and have some coffee, or lunch.