Annotation of the video data in the Corpus NGT


Onno Crasborn & Inge Zwitserlood, November 2008
Department of Linguistics & Centre for Language Studies
Radboud University Nijmegen
corpusngt@let.ru.nl
http://www.corpusngt.nl

1. Introduction

All of the video files in the Corpus NGT were provided with ELAN annotation files in the first release of the corpus in December 2008. Only a small number of these files (160 out of 2375) actually contain annotations. The other ELAN files are empty, but they contain the same tiers and Linguistic Type definitions as the annotated documents. They will thus facilitate use of the corpus in a later phase, when annotations are added.

This document describes the conventions used in creating the files as well as the glossing conventions. In addition, it describes some Perl scripts that were created to complement the functionality of ELAN version 3.6.

All of the annotation files (as well as the video and audio files) are subject to the Creative Commons License BY-NC-SA. For more information, see the corpus website.

2. Specifications for Linguistic Types and Tiers

The EAF files in the Corpus NGT contain tiers that are relevant for a large user group. Only the gloss tiers contain annotations, and only for a restricted set of movies. The specifications of the files are available as an ELAN template (CorpusNGT.etf), which can be used for creating comparable EAF files for new media files. Several scripts can also be used for generating large numbers of annotation files; they are described in section 3 (Perl scripts) below. An advantage of the scripts is that the annotator and participant labels for the various tiers can be set in a large number of files at once, by specifying them in a source file. The scripts were tailor-made for the Corpus NGT and may need to be adapted for your own purpose.

The following Linguistic Types are the basis for the tiers in the files and the template:

   gloss
   translation
   remarks
   research

These Types have the same specifications; the difference in names makes it possible to search in a specific group of tiers (e.g. only the gloss tiers). The specifications for all of these Linguistic Types are as follows:

   Stereotype: none
   Use controlled vocabulary: none
   ISO data category: not used
   Reference to graphics allowed: no

Tiers with these Linguistic Types have been created for every video file in the Corpus NGT. The labels S1 and S2 are used systematically for the signer on the left and the signer on the right, respectively. Although the user can change the order of the signers in the ELAN file, or choose to show only one signer, in the Corpus NGT special care was taken to order the upper body views of the two signers in such a way that they appear to be turned towards each other. In reality the two signers always sat opposite each other, but each camera was oriented at a small angle from the signer. This order is achieved by ordering the linked media files in the EAF file.

In addition, the tier characteristics contain a reference to the code of the signer in the Corpus NGT (e.g. S031). The tier characteristics contain the following categories:

   Participant: the corpus code for a particular signer (S001, S002, etc.);
   Annotator: the initials of the annotator; this category is still empty for tiers other than the gloss tiers;
   Tier label colour: two values are used, in order to visually distinguish the tiers for the signer to the left and the signer to the right in the timeline viewer:
      Left: RGB 0,0,51 (dark blue)
      Right: RGB 0,153,0 (bright green)

Most properties of each annotation file are stored in the EAF file; some visual properties, such as the tier order and the colour of the tier labels in the timeline viewer, are stored in the preferences file (extension .pfsx).
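The tier characteristics above can be pictured as a small data structure. The following Python sketch is only an illustration: the class `TierSpec` and the helper `gloss_tiers` are hypothetical names that do not occur in the corpus tools, and the tier labels are the gloss tier names listed in the next paragraph.

```python
from dataclasses import dataclass

# Linguistic Types named in section 2; all four share the same settings.
LINGUISTIC_TYPES = ("gloss", "translation", "remarks", "research")

# Tier label colours from section 2, as (R, G, B) per signer position.
LABEL_COLOURS = {"S1": (0, 0, 51), "S2": (0, 153, 0)}


@dataclass(frozen=True)
class TierSpec:
    """Hypothetical record of the characteristics of one tier."""
    tier_id: str          # e.g. "GlosL S1"
    linguistic_type: str  # one of LINGUISTIC_TYPES
    participant: str      # corpus code of the signer, e.g. "S031"
    annotator: str        # initials; empty where not yet assigned
    colour: tuple         # RGB label colour for the timeline viewer


def gloss_tiers(left_signer, right_signer, annotator=""):
    """Return the four gloss tier specs for one recording (illustrative helper)."""
    specs = []
    for side, participant in (("S1", left_signer), ("S2", right_signer)):
        for hand in ("GlosL", "GlosR"):
            specs.append(
                TierSpec(f"{hand} {side}", "gloss", participant,
                         annotator, LABEL_COLOURS[side])
            )
    return specs
```

Calling `gloss_tiers("S001", "S002")` would yield one spec per hand and per signer, with the left signer's tiers carrying the dark blue label colour and the right signer's tiers the green one.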
Every EAF file contains the following tiers:

   GlosL S1
   GlosR S1
   GlosL S2
   GlosR S2

These four tiers contain the glosses for the activities of the left hand (GlosL) and the right hand (GlosR) of the signer to the left (S1) and the signer to the right (S2). The conventions used in these glosses are given below in section 4.2 (Gloss conventions).

   Tolk S1
   Tolk S2
   Interpreter S1
   Interpreter S2

These four tiers are intended for a transcript of the interpreter's (Dutch) voice-over, in Dutch (Tolk) and in English (Interpreter). Currently, only the Dutch voice-over is available, for a limited number of movie files. Note: if a translation of the video materials is made, new tiers Vertaling and Translation should preferably be used, to highlight the fact that the voice-over by the interpreter(s) was produced simultaneously with the signing in the movie and is not a sentence-by-sentence translation.

   Opmerkingen S1
   Opmerkingen S2

These two tiers can be used for remarks on the signers to the left and right (or on their signs), respectively.

3. Perl scripts

A number of Perl scripts have been created for changing information in all ELAN and/or PFSX files in one folder. They are available at the corpus website.

Be very careful when using these scripts. They have been tested on the whole set of files in the Corpus NGT that existed in 2008, and on no other material. It is impossible to guarantee that they will work for all future ELAN files, or with future versions of the ELAN/PFSX specifications. They presuppose the file naming conventions of the Corpus NGT project; see the manuals for details. Please read the manual carefully, make a backup of all the ELAN/PFSX files in the folder before applying a script, and test the resulting files afterwards. Knowledge of XML is a prerequisite, so that the input and output of the scripts can be inspected in relation to the script.

The following scripts are available:

   AddLinguisticType.pl
      Adds a Linguistic Type in all files in a folder.
   AddTier.pl
      Adds a tier in all files in a folder.
   EafCopy.pl
      Creates EAF files for all media files in a folder.
   EafCopy2.pl
      Creates EAF files based on a dummy file, each linked to two media files (body view of each signer).
   EafCopy3.pl
      Creates EAF files based on a dummy file, each linked to four media files (body and face views of each signer).
   PfsxCopy.pl
      Creates PFSX files based on a dummy file for all media files in a folder.
   CorrectAnnotation.pl
      Changes annotation values in all EAF files in a folder on the basis of a text file, in which the first column contains the existing value of the annotation and the second column the new value. Attention: the script applies to all tiers in a file.

These and comparable functions for managing larger numbers of EAF files may be built into ELAN in the future.

4. Annotation conventions

4.1 Introduction

The glosses in the annotation files in the Corpus NGT are intended to indicate the exact start and end time of the signs, as well as to refer to a lexicon. The glosses are therefore not actual translations; in the ideal case they are pointers to lemmas in a lexicon. Because there is no common orthography for sign languages, nor a practical and widely used phonetic notation system, Dutch words have been used as a reference. They approximate (one of) the meaning(s) of the signs; however, the real meanings of the sign forms are described in the lexicon, not by the gloss. Exceptions to this rule are non-lexicalised forms, which are preceded in the gloss by an @ character (see convention 4 in section 4.2).

Although it was our intention to use glosses referring to a lexicon, for reasons of efficiency it was not always possible to consult the lexicons of the Dutch Sign Centre (NGc) on DVD or on the internet. Because of this, the glosses in the first release will contain many inconsistencies; users of the annotations should be aware of this. Typos and spelling mistakes have been corrected as much as possible in all of the files. However, only part of the material has been checked by a second annotator. It is therefore expected that many files contain a number of inconsistencies as well as interpretation differences and mistakes.
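Batch corrections of such inconsistencies are what CorrectAnnotation.pl (section 3) is meant for. The Python sketch below is not that script; it only illustrates the two-column replacement idea under stated assumptions: the correction file is taken to be tab-separated, annotation values are assumed to sit in `ANNOTATION_VALUE` elements of the EAF XML, and the function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def load_mapping(text):
    """Parse lines of the form 'old<TAB>new' into a dict (tab separator is an assumption)."""
    mapping = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) >= 2:
            mapping[row[0]] = row[1]
    return mapping


def correct_annotations(eaf_xml, mapping):
    """Replace whole annotation values on ALL tiers; return (new_xml, number_changed)."""
    root = ET.fromstring(eaf_xml)
    changed = 0
    # Assumed element name for annotation text in EAF files.
    for value in root.iter("ANNOTATION_VALUE"):
        if value.text in mapping:
            value.text = mapping[value.text]
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

Like the Perl script, this sketch touches every tier, so a value that also occurs in a remarks tier would be changed there as well. Working on a backup copy and inspecting the output remains advisable.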
When the files are used, such inconsistencies can be repaired as far as possible, although no standard procedure for this is available yet; one will need to be developed in the near future. For further information, users can contact the corpus managers (corpusngt@let.ru.nl).

The glosses relate only to manual activity, not to body or facial activity, even though body and face often express meaning. For example, when the signer makes a manual sign accompanied by a head shake, only the manual sign is referred to in the gloss, not the negation.

4.2 Gloss conventions

1. All signs are provided with glosses. A gloss is usually one Dutch word. Glosses are in capitals.
   o E.g.: HOND (DOG)

2. There is a separate tier for each hand. If a sign is made with the left hand, it is annotated in the GlosL tier; if a sign is made with the right hand, it is annotated in the GlosR tier. If a sign is made with both hands, it is annotated in both the GlosL tier and the GlosR tier.
   o E.g.: [figure not reproduced: left hand, right hand]
   The latter holds irrespective of whether both hands move or only one hand moves (so it also holds for the NGT sign for KOFFIE (COFFEE)).

3. Some signs have a fixed form and meaning but cannot be labelled with one Dutch word. In those cases, the signs are annotated with a (fixed) combination of Dutch words (where possible, the descriptions of the Dutch Sign Centre were used). These words are linked by underscores.
   o E.g.: FLUITJE_VAN_EEN_CENT (PIECE_OF_CAKE)
           NOG_NIET (NOT_YET)
           HET_EENS_ZIJN_MET (AGREE_WITH)

4. For some signs it proved very difficult to find a good equivalent Dutch word or fixed word combination, mainly because these signs combine many meanings simultaneously. In these cases a description of the meaning of the sign is given, in lower case and preceded by the @ character. In general, this concerns less lexicalised or morphologically complex productive forms.
   o E.g.: @schapen lopen de heuvel op. (@sheep go up the hill.)

5. Glosses have been assigned as consistently as possible (same sign, same gloss). However, some signs differ only in their mouthing. In those cases different glosses are used, especially when the signs are separate items in the lexicons.
   o E.g.: BROER vs. ZUS (BROTHER vs. SISTER): different glosses because of the different mouthings.

6. The start and end of each sign have been indicated carefully. The following criteria were used to determine sign boundaries.
   A sign starts:
   o at the first frame in which the hand starts to move away from the initial location of the sign to the final location of the sign; or
   o (if the hand does not move through space) at the first frame in which the handshape starts to change, e.g. closing the hand in the sign for MAN; or
   o (if the hand does not move through space and the handshape does not change) at the first frame in which the orientation of the hand starts to change, e.g. turning the hand in the sign for OVERLEDEN (PASS_AWAY).
   A sign ends:
   o at the first frame in which the handshape starts to change after the sign was finished; or
   o at the first frame in which the hand starts to move away from the final location of the sign.

7. In two-handed signs the hands do not always move in exactly the same way. Often one hand stays in a particular position after the sign has ended, while the other hand goes on to the next sign. Or one hand starts to move or change slightly before the other hand does. The exact duration of the sign is indicated for each hand on the GlosL and GlosR tiers, independently of the duration for the other hand.

8. Compound signs have been glossed literally, with a ^ character between the parts.
   o E.g.: ONDERWIJS^PERSOON instead of ONDERWIJZER (TEACH^PERSON instead of TEACHER)

9. If a sign is glossed as a verb, the infinitive form is always used.
   o E.g.: LOPEN instead of LOOPT or LIEP (WALK instead of WALKS or WALKED)

10. There is a sign in NGT that is used to draw the addressee's attention. That sign is glossed as HEE (HEY).

11. There is a sign that is difficult to translate or describe because it has many meanings and functions. It has the following form: the palm(s) of the hand(s) are oriented upwards; sometimes there is also a (small) movement upwards or downwards. This sign is glossed as PO (Palm van de hand Omhoog; Palm of hand Up).

12. Pointing signs carry the gloss INDEX. If the signer points to him/herself, the gloss is INDEX-1. If a signer consecutively points to different locations, separate INDEX glosses are used. However, if a signer uses an arc movement to point to several signs, there is one INDEX gloss, which spans the duration of the arc movement. If the signer consecutively points several times to the same location, this is annotated with one INDEX gloss that has the duration of the whole sequence of pointing signs to that location.

13. There are signs made with index and middle finger, with a meaning of togetherness. These have been glossed as follows:
   o WIJ_TWEEEN (THE_TWO_OF_US)
   o JULLIE_TWEEEN (THE_TWO_OF_YOU)
   o ZIJ_TWEEEN (THE_TWO_OF_THEM)

14. Signs for numbers have been glossed in digits. E.g.:
   o 312, not: DRIEHONDERD_TWAALF (THREE_HUNDRED_TWELVE)
   o 2e, not: TWEEDE (SECOND)

15. Counting while using a number on the non-dominant hand has been glossed as TEN_EERSTE, TEN_TWEEDE, etc. (FIRST, SECOND), not TEN_2e (SECOND) or =ten tweede (=second) or INDEX. These glosses have been used on both the left hand and right hand tiers.

16. If a signer uses fingerspelling (e.g. for a name), all spelled letters are glossed, preceded by a # character.
   o E.g.: #INGE

17. If a signer fingerspells only one letter but simultaneously mouths the word (e.g. a name), only the spelled letter has been glossed, preceded by the # character.
   o E.g.: the name Inge

18. In some cases a signer expresses two things simultaneously in one manual action (e.g. a combination of two signs). In those cases the glosses of both signs have been annotated, separated by a + character.

19. If a gloss is preceded by a question mark, the annotator was not sure of his/her interpretation of the manual activity and a second opinion is necessary.
   o E.g.: ?BOEK (?BOOK)

20. An annotation containing only two question marks indicates that the annotator has recognised the manual activity as a sign, but does not know the sign and has not been able to find it in any lexicon.
   o E.g.: ??

21. If a gloss is preceded by a ~ character, the annotator has clearly recognised the sign, but the sign is not well-formed (e.g. signed sloppily). This character may even have been used when the sign is actually wrong (or even a different sign).
   o E.g.: ~AMSTERDAM (while the signer actually articulates the sign as MAAL or KEER (TIMES), i.e. without repetition of the contacting movement while moving from high to low in space).

22. For a few frequent signs for well-known terms with long glosses, abbreviations are used:
   o CI (Cochleair Implantaat; cochlear implant)
   o NmG (Nederlands ondersteund met gebaren; Sign Supported Dutch)

Contact

For more information, contact the corpus managers at corpusngt@let.ru.nl. See also the corpus website for new information: http://www.corpusngt.nl.
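As a compact summary of the prefix and separator characters in section 4.2, the following Python sketch sorts a gloss into a rough category. It is an illustration only: the function name and category labels are hypothetical, and the order in which a leading ? and ~ are stripped is an assumption, since the conventions do not say how the two combine.

```python
import re


def classify_gloss(gloss):
    """Return a rough category for one annotation value (illustrative labels)."""
    if gloss == "??":
        return "unknown sign"                  # convention 20
    flags = []
    if gloss.startswith("?"):
        flags.append("uncertain")              # convention 19
        gloss = gloss[1:]
    if gloss.startswith("~"):
        flags.append("not well-formed")        # convention 21
        gloss = gloss[1:]
    if gloss.startswith("@"):
        kind = "description"                   # convention 4
    elif gloss.startswith("#"):
        kind = "fingerspelling"                # conventions 16 and 17
    elif gloss == "INDEX" or gloss.startswith("INDEX-"):
        kind = "pointing"                      # convention 12
    elif re.fullmatch(r"\d+e?", gloss):
        kind = "number"                        # convention 14
    elif "+" in gloss:
        kind = "simultaneous signs"            # convention 18
    elif "^" in gloss:
        kind = "compound"                      # convention 8
    else:
        kind = "lexical"                       # conventions 1 and 3
    return ", ".join(flags + [kind])
```

A check of this kind could be run over the gloss tiers to list values that fit none of the expected patterns, which may help when looking for the inconsistencies mentioned in section 4.1.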