Annotation in Language Documentation
Sebastian Drude, Univ. Hamburg Workshop Annotation
Topics
1. Language Documentation
2. Data and Annotation (theory)
3. Types and interdependencies of Annotations
4. Annotation Tools (overview and comparison)
5. Annotation data formats (TRS, EAF, SF)
6. Standing challenges
1) Language Documentation
Documentary linguistics is a new subfield of linguistics (Himmelmann 1998) whose results are language documentations.
- It was triggered by language endangerment and enabled by the technical/digital revolution.
- It differs from language description: in addition to the Boasian triad (grammar, dictionary, text collection), it produces corpora of annotated multimedia data.
A modern language documentation (LD):
- consists of a corpus of primary data (audio and video recordings) of utterances and texts from a broad spectrum of genres and domains;
- includes annotation accompanying the utterances;
- is digital and sustainable (metadata, open standards, archiving, maintenance);
- is generally accessible, e.g. via the internet.
2) Data and Annotation
A session bundles:
- Metadata (describing the event and the respective data)
- Primary data: video recording, audio recording
- Secondary data: annotation, i.e. transcription (orthographic / phonological ...), translation (word-by-word / idiomatic ...), and optionally linguistic or ethnographic comments and morpheme glosses
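The session diagram can be read as a small data model: one bundle of metadata, primary data and secondary data per recorded event. A minimal Python sketch of that reading follows; the class and field names (`Session`, `MediaFile`, `Annotation`, `layer`) and the file names are hypothetical and do not reproduce the schema of any particular archive or metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class MediaFile:
    """Primary data: a direct recording of the communicative event."""
    path: str
    kind: str  # "audio" or "video"


@dataclass
class Annotation:
    """Secondary data: a symbolic representation of properties of the event."""
    layer: str  # e.g. "transcription", "translation", "morpheme-gloss"
    value: str


@dataclass
class Session:
    """One recorded event: metadata, primary data and secondary data."""
    metadata: dict[str, str]  # describes the event and the respective data
    primary: list[MediaFile] = field(default_factory=list)
    secondary: list[Annotation] = field(default_factory=list)

    def layers(self) -> set[str]:
        """Return the annotation layers present in this session."""
        return {a.layer for a in self.secondary}


# Illustrative session; file names and metadata keys are hypothetical.
session = Session(
    metadata={"title": "Example session", "genre": "narrative"},
    primary=[MediaFile("story.wav", "audio"), MediaFile("story.mp4", "video")],
    secondary=[
        Annotation("transcription", "timeo ne veniat"),
        Annotation("translation", "I am afraid he might come"),
    ],
)
print(sorted(session.layers()))  # ['transcription', 'translation']
```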
2) Data and Annotation: Data
Data is always data FOR something, or at least OF something; usually it is a systematic representation of physical states and events (facts used FOR a scientific argument).
In LD, primary data is a direct rendering or result of communicative (speech) events, for instance a written text or, in particular, an audio/video recording of a speech event.
2) Data and Annotation: Annotation
Annotation of data is a symbolic representation of properties of the state or event represented in the data.
In LD, the most common and basic types of (primary) annotation are a transcription and a translation of the expressions represented in the primary data (e.g., an audio/video recording).
2) Data and Annotation: Annotation = secondary data
Annotation (secondary data) symbolically represents properties of what is represented in the primary data; primary data, in turn, is a direct measurement, rendering or result of reality (communicative events).
2) Data and Annotation: Global vs. unit-oriented annotation
Global or holistic annotation represents properties of the event as a whole; in LD it is part of the metadata.
Unit-oriented annotation refers to specific parts of the data, in particular utterances of individual sentences, words, sounds, etc. Here we speak of individual annotations (plural).
2) Data and Annotation: Secondary and derived data
If unit-oriented annotation is directly based on primary data (such as a written text or an audio or video recording), it is secondary data.
Annotation commenting on previous annotation is tertiary data, and so forth recursively.
In sum, all unit-oriented annotation is derived data. There are other types of derived data as well (e.g. a lexicon).
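The recursive definition (primary data, secondary data, tertiary data, and so on) amounts to counting how many steps a piece of data is removed from the recording. A minimal Python sketch, assuming a hypothetical `Item` class in which every piece of data points to the item it is based on (`None` for primary data):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """A piece of data; `based_on` is None for primary data."""
    label: str
    based_on: Optional["Item"] = None

    def order(self) -> int:
        """1 = primary, 2 = secondary, 3 = tertiary, and so on."""
        return 1 if self.based_on is None else self.based_on.order() + 1

    def is_derived(self) -> bool:
        """All unit-oriented annotation (order 2 or higher) is derived data."""
        return self.order() >= 2


recording = Item("audio recording")
transcription = Item("transcription of utterance 1", based_on=recording)
comment = Item("comment on the transcription", based_on=transcription)

for item in (recording, transcription, comment):
    print(item.label, item.order(), item.is_derived())
```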
2) Data and Annotation: Time-aligned annotation
Annotation of a media file is time-aligned if each piece of annotation is explicitly associated with the corresponding chunk (time span, segment) of the media file.
Time-linking is the activity, and the result, of specifying the time alignment of each annotation with a certain chunk of the media file.
2) Data and Annotation: Time-aligned annotation (cont.)
Time alignment is usually done by means of time marks, the start and end times of each chunk.
Segmenting (a media file) means identifying the relevant chunks and their time marks.
Workflow: first segmenting, then adding annotation.
Older unit-oriented annotation can be time-aligned later, but this is very labour-intensive (though see now WebMAUS from CLARIN/BAS).
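The segment-then-annotate workflow can be sketched as follows. Assumptions, none of which come from a specific tool: time marks are integer milliseconds, segments do not overlap, a segment's end time is exclusive, and the names `Segment`, `annotate` and `segment_at` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """A chunk of the media file, identified by its time marks (ms)."""
    start_ms: int
    end_ms: int
    annotations: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 0 <= self.start_ms < self.end_ms:
            raise ValueError("a segment needs 0 <= start < end")


def annotate(segment: Segment, layer: str, value: str) -> None:
    """Attach one piece of annotation to an already time-aligned segment."""
    segment.annotations[layer] = value


def segment_at(segments: list[Segment], t_ms: int) -> Optional[Segment]:
    """Return the segment whose time span contains t_ms (end exclusive)."""
    for seg in segments:
        if seg.start_ms <= t_ms < seg.end_ms:
            return seg
    return None


# Step 1: segmenting (identify the chunks and their time marks).
segments = [Segment(0, 2300), Segment(2300, 5100)]
# Step 2: adding annotation to a chunk.
annotate(segments[0], "transcription", "timeo ne veniat")
annotate(segments[0], "translation", "I am afraid he might come")

hit = segment_at(segments, 1200)
print(hit.annotations["translation"] if hit else None)
```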
3) Types and Interdependencies of Annotations: Linguistic types of annotations
Annotations differ according to the types of properties of the speech event that they represent.
Annotations can be phonetic, phonological, morphological, syntactic, semantic, pragmatic (and possibly others); on each level they can focus on the units, on structures of units, on relations that hold among units, etc.
3) Types and Interdependencies of Annotations: Coverage of annotation
- Basic annotation: only transcriptions, translations and, optionally, notes, on a sentence / clause / intonation-unit level.
- Basic glossing: additionally, information on individual morphs: a gloss (an indication of meaning or function) and perhaps a part-of-speech tag.
- Advanced glossing: one or several additional levels, from phonetic to pragmatic (for instance a prosodic transcription, or annotation of the syntactic structure, of grammatical relations, etc.).
3) Types and Interdependencies of Annotations: Interlinear glossing
The format most often used in language description is interlinear morpheme translation / glossing ("standard glossing"):
- C. Lehmann: Interlinear Morphemic Glossing. In: Morphology (2004; first version 1982).
- Leipzig Glossing Rules: MPI-EVA (B. Comrie, M. Haspelmath) & Univ. Leipzig (B. Bickel), 2008.
3) Types and Interdependencies of Annotations: Example
time-o     ne       veni-a-t
fear-1.SG  NEG.VOL  come-SBJV.PRES-3.SG
'I am afraid he might come.'
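Standard glossing requires that the example and the gloss line correspond word by word and, inside each word, morpheme by morpheme, with the same number of hyphens on both lines (Leipzig Glossing Rules, Rules 1 and 2). A minimal Python sketch that checks this correspondence and lays the lines out in aligned columns; the function names are hypothetical, and only hyphen boundaries are checked (clitic `=` boundaries, infixes and other conventions are ignored).

```python
def check_alignment(text: str, gloss: str) -> list[tuple[str, str]]:
    """Pair each word with its gloss, verifying that morpheme counts match."""
    words, glosses = text.split(), gloss.split()
    if len(words) != len(glosses):
        raise ValueError(f"{len(words)} words but {len(glosses)} glosses")
    for word, word_gloss in zip(words, glosses):
        # Same number of hyphens = same number of segmented morphemes.
        if word.count("-") != word_gloss.count("-"):
            raise ValueError(f"morpheme mismatch: {word!r} vs {word_gloss!r}")
    return list(zip(words, glosses))


def render(text: str, gloss: str, translation: str) -> str:
    """Lay out an interlinear example with word-aligned columns."""
    pairs = check_alignment(text, gloss)
    widths = [max(len(w), len(g)) for w, g in pairs]
    line1 = "  ".join(w.ljust(n) for (w, _), n in zip(pairs, widths))
    line2 = "  ".join(g.ljust(n) for (_, g), n in zip(pairs, widths))
    return "\n".join([line1.rstrip(), line2.rstrip(), f"'{translation}'"])


print(render("time-o ne veni-a-t",
             "fear-1.SG NEG.VOL come-SBJV.PRES-3.SG",
             "I am afraid he might come"))
```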
3) Types and Interdependencies of Annotations: Problems of standard glossing
- Theory-specific (item-and-arrangement, neither item-and-process nor word-and-paradigm)
- Mixes morphology and syntax
- Problems with synthetic word forms: timeo is 1P, SG, IND, ACT, PRES, but where is PRES? (Ø)
- Problems with analytical word forms (especially discontinuous ones)
- What do the labels designate: meanings, categories, functions? This is often undefined.
3) Types and Interdependencies of Annotations: Advanced Glossing
Hans-Heinrich Lieb & Sebastian Drude: Advanced Glossing: A Language Documentation Format (DOBES Working Paper, November 2000). File: Glossing1.pdf
Advanced Glossing (AG): a syntactic glossing table
Advanced Glossing (AG): a morphological glossing table
AG: A glossing table (diagram): a glossing table is a list of lines, which are either lines made up of cells or holistic lines.
AG: A glossing (diagram): a glossing consists of a glossing table and a general comment; the comment is linked to ...
AG: Syntactic and morphological glossings of a sentence (diagram): the glossings of a sentence comprise its syntactic glossing and its morphological glossing, which consists of the morphological glossings of word 1, word 2, word 3, ... (each is a glossing of ...).
AG: Glossing of a text (diagram): a text glossing consists of a general comment on the text; a list of the syntactic and morphological glossings of sentence 1, sentence 2, sentence 3, ...; the raw data; and other components.
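One way to read the AG diagrams as nested data structures is sketched below in Python: a glossing table is a list of lines (lines of cells, or holistic lines), a glossing pairs a table with a general comment, a sentence has one syntactic glossing plus one morphological glossing per word, and a text glossing collects these together with the raw data. Class and field names are hypothetical and not taken from Lieb & Drude (2000).

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    """One line of a glossing table: a label plus a list of cells."""
    label: str
    cells: list[str] = field(default_factory=list)


@dataclass
class GlossingTable:
    lines: list[Line] = field(default_factory=list)
    holistic_lines: list[str] = field(default_factory=list)


@dataclass
class Glossing:
    table: GlossingTable
    general_comment: str = ""


@dataclass
class SentenceGlossing:
    syntactic: Glossing
    morphological: list[Glossing] = field(default_factory=list)  # one per word


@dataclass
class TextGlossing:
    general_comment: str
    sentences: list[SentenceGlossing]
    raw_data: str = ""

    def word_count(self) -> int:
        """Count words via the morphological glossings of all sentences."""
        return sum(len(s.morphological) for s in self.sentences)


def simple(label: str, *cells: str) -> Glossing:
    """Build a one-line glossing (helper for the example)."""
    return Glossing(GlossingTable([Line(label, list(cells))]))


sentence = SentenceGlossing(
    syntactic=simple("words", "timeo", "ne", "veniat"),
    morphological=[simple("morphs", "time", "o"),
                   simple("morphs", "ne"),
                   simple("morphs", "veni", "a", "t")],
)
text_glossing = TextGlossing("Latin example", [sentence], raw_data="timeo ne veniat")
print(text_glossing.word_count())  # 3
```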
AG: The number line
AG: Phonetic Form and Intonation
AG: Phonological Forms and Intonation
AG: Orthographical Base Forms
AG: Lexical Categories and Form Categories
AG: Meanings and Semantic Effects
AG: Constituent Structure and Relations
AG: Orthographical Unit and Meaning
3) Types and Interdependencies of Annotations
- Time-linked annotations for sentence utterances
  - Other dependent sentence annotations
  - Subdivision into annotations for syntactic units (can be internally time-aligned or not)
    - Dependent syntactic-unit annotations
    - Further subdivision into annotations for morphs (hardly possible to time-align internally)
      - Dependent morph annotations
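Because dependent annotations such as morph annotations are hardly ever time-aligned internally, a tool has to locate them in the media through their parents. A minimal Python sketch, assuming a hypothetical `Node` class in which only some annotations carry their own time marks (illustrative values in milliseconds) and every other annotation inherits the span of its nearest time-aligned ancestor:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """An annotation in a hierarchy; time marks are optional."""
    value: str
    parent: Optional["Node"] = None
    start_ms: Optional[int] = None
    end_ms: Optional[int] = None

    def time_span(self) -> tuple[int, int]:
        """Own span if time-aligned, else the nearest aligned ancestor's."""
        node: Optional[Node] = self
        while node is not None:
            if node.start_ms is not None and node.end_ms is not None:
                return node.start_ms, node.end_ms
            node = node.parent
        raise ValueError(f"no time-aligned ancestor for {self.value!r}")


sentence = Node("timeo ne veniat", start_ms=0, end_ms=2300)
word = Node("veniat", parent=sentence, start_ms=1400, end_ms=2300)  # aligned
morph = Node("veni", parent=word)   # morph: not aligned internally
gloss = Node("come", parent=morph)  # dependent morph annotation

print(gloss.time_span())  # (1400, 2300)
```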
4) Annotation Tools: Transcriber
Tool for the segmentation and transcription of audio files.
Pros: compatible with Mac, Windows & Linux; very easy to use; produces simple XML files.
Cons: no Unicode input possible; only one line of annotation; no video; no lexicon; outdated (new version not tested).
(Screenshot: Transcriber)
4) Annotation Tools: ELAN
Tool for the complex annotation of audio and video files.
Pros: compatible with Mac, Windows & Linux; audio and multiple video files; unlimited tiers for different speakers; state of the art; wide user community; XML output (but complex).
Cons: complex tool for beginners (but now with an easier transcription mode); no lexicon (yet).
(Screenshots: ELAN)
4) Annotation Tools: Toolbox
Text-oriented general database tool for linguistic fieldwork, with lexicon and texts.
Pros: flexible and powerful; export to different formats (incl. XML), therefore easy to integrate with other tools; many users.
Cons: too flexible; poor data format ("Standard Format"); complex to set up; tricky on Mac/Linux; no video and no time alignment; at the end of its life cycle; produced by SIL.
(Screenshots: Toolbox)
4) Annotation Tools: FLEX
Extensive linguistic database tool for linguistic fieldwork, with lexicon and texts.
Pros: powerful and well designed; built-in ontology and analysis tools; growing user community.
Cons: not flexible (8 tiers); one huge XML database with no good import or export function for texts; Windows only; difficult to configure; no audio, no video, no time alignment; produced by SIL.
(Screenshots: FLEX)
4) Annotation Tools: Other tools
- Praat: for segmenting; best for phonetic annotation.
- CLAN: audio and video annotation in the CHAT or CA (Conversation Analysis) formats, for child language data (CHILDES project).
- ANVIL: seems to be similar to ELAN (not tested).
- EXMARaLDA Partitur Editor (U. Hamburg): widely used for discourse analysis.
- Audiamus and Eopas (N. Thieberger): organize (but do not create) annotation.
- Poio: developed in the context of CLARIN (with an API).
There are several others.
4) Annotation Tools: Comparison

| | Transcriber | ELAN | Toolbox | FLEX |
|---|---|---|---|---|
| Complexity | Easy | Complex, with easier modes | Complex to configure | Complex |
| Audio | Yes | Yes | No (can play) | No |
| Video | No | Yes | No | No |
| Tiers | 1 per speaker | Unlimited | Unlimited | Fixed: 8 |
| Lexicon interoperability, automatic glossing | No | No (is planned) | Yes | Yes |
| Unicode | No input | Yes | Yes | Yes |
| Data format | Simple XML | Complex XML | Faulty TXT | XML database |
| Interoperability | Good | Fair | Good | Bad |
| User community / support | Small?, no support? | Large, good support | Large, fair support | Small, good support |
| Life cycle | Old (but new version 2011) | Constantly developed | Not officially supported, old | New, being developed |
5) Annotation data formats: Transcriber (*.TRS)
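Transcriber's *.TRS files are XML: within an Episode and its Sections, each Turn carries a speaker and start/end times (in seconds), and empty Sync elements mark segment boundaries, with the transcribed text following each Sync. A minimal reading sketch with Python's standard `xml.etree.ElementTree`; the embedded document is a hand-made fragment, and elements such as Event, Comment or Who (used for overlapping speech) are ignored, so real files may need more handling.

```python
import xml.etree.ElementTree as ET

# Hand-made fragment in the shape of a Transcriber file (times in seconds).
TRS = """<Trans audio_filename="story">
<Speakers><Speaker id="spk1" name="Speaker A"/></Speakers>
<Episode>
<Section type="report" startTime="0" endTime="6.5">
<Turn speaker="spk1" startTime="0" endTime="6.5">
<Sync time="0"/>
timeo ne veniat
<Sync time="2.3"/>
second segment
</Turn>
</Section>
</Episode>
</Trans>"""


def trs_segments(xml_text: str) -> list[tuple[str, float, float, str]]:
    """Return (speaker, start, end, text) for each Sync-delimited chunk."""
    root = ET.fromstring(xml_text)
    segments = []
    for turn in root.iter("Turn"):
        syncs = turn.findall("Sync")
        turn_end = float(turn.get("endTime"))
        for i, sync in enumerate(syncs):
            start = float(sync.get("time"))
            # A chunk ends at the next Sync, or at the end of the turn.
            end = float(syncs[i + 1].get("time")) if i + 1 < len(syncs) else turn_end
            text = (sync.tail or "").strip()  # the text follows the Sync element
            segments.append((turn.get("speaker", ""), start, end, text))
    return segments


for seg in trs_segments(TRS):
    print(seg)
```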
5) Annotation data formats: ELAN (*.EAF)
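ELAN's EAF format is XML in which time slots (TIME_SLOT, in milliseconds) are declared once and referenced by time-aligned annotations (ALIGNABLE_ANNOTATION), while symbolic dependent annotations (REF_ANNOTATION) point to another annotation instead of to time slots. A minimal reading sketch using Python's standard `xml.etree.ElementTree`; the embedded document is a hand-made fragment (a complete EAF file also declares linguistic types, among other things), and the tier names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hand-made EAF fragment; a complete file also declares LINGUISTIC_TYPE etc.
EAF = """<ANNOTATION_DOCUMENT FORMAT="3.0" VERSION="3.0">
  <HEADER TIME_UNITS="milliseconds"/>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2300"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcription-A" LINGUISTIC_TYPE_REF="transcription">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>timeo ne veniat</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="translation-A" LINGUISTIC_TYPE_REF="translation" PARENT_REF="transcription-A">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>I am afraid he might come</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def eaf_annotations(xml_text: str) -> list[tuple[str, Optional[int], Optional[int], str]]:
    """Return (tier id, start ms, end ms, value) for every annotation.

    REF_ANNOTATIONs inherit the span of the ALIGNABLE_ANNOTATION at the end
    of their ANNOTATION_REF chain; unaligned time slots yield None.
    """
    root = ET.fromstring(xml_text)
    slots: dict[str, Optional[int]] = {}
    for ts in root.iter("TIME_SLOT"):
        value = ts.get("TIME_VALUE")
        slots[ts.get("TIME_SLOT_ID")] = int(value) if value is not None else None

    spans = {}  # alignable annotation id -> (start, end)
    refs = {}   # ref annotation id -> id of the annotation it points to
    found = []  # (tier id, annotation id, value) in document order
    for tier in root.iter("TIER"):
        for ann in tier.iter():
            ann_id = ann.get("ANNOTATION_ID")
            if ann.tag == "ALIGNABLE_ANNOTATION":
                spans[ann_id] = (slots[ann.get("TIME_SLOT_REF1")],
                                 slots[ann.get("TIME_SLOT_REF2")])
            elif ann.tag == "REF_ANNOTATION":
                refs[ann_id] = ann.get("ANNOTATION_REF")
            else:
                continue
            found.append((tier.get("TIER_ID"), ann_id,
                          ann.findtext("ANNOTATION_VALUE", default="")))

    rows = []
    for tier_id, ann_id, value in found:
        target = ann_id
        while target in refs:  # follow symbolic references upwards
            target = refs[target]
        start, end = spans[target]
        rows.append((tier_id, start, end, value))
    return rows


for row in eaf_annotations(EAF):
    print(row)
```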
5) Annotation data formats: Toolbox Standard Format (*.SDB, *.TBT, *.SF)
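Toolbox's Standard Format is plain text: each field is a line beginning with a backslash marker, and a designated record marker starts a new record. A minimal parser sketch; the markers \ref, \tx, \mb, \ge and \ft follow common Toolbox interlinear-text conventions, but each project can define its own, the header handling is simplified, and the column alignment of interlinear lines is not interpreted here.

```python
from typing import Optional

# Illustrative Standard Format text; the last line wraps onto a new line.
SAMPLE = "\n".join([
    r"\_sh v3.0  400  Text",
    "",
    r"\ref Latin.001",
    r"\tx timeo ne veniat",
    r"\mb time -o ne veni -a -t",
    r"\ge fear -1.SG NEG.VOL come -SBJV.PRES -3.SG",
    r"\ft I am afraid he might",
    "come",
])


def parse_sf(text: str, record_marker: str = "ref") -> list[dict[str, str]]:
    """Split Standard Format text into records of {marker: value}."""
    records: list[dict[str, str]] = []
    current: Optional[dict[str, str]] = None
    last_marker: Optional[str] = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("\\_"):
            continue  # blank lines and file header lines such as \_sh
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            value = value.strip()
            if marker == record_marker:
                current = {}
                records.append(current)
            if current is None:
                continue  # fields before the first record marker are ignored
            # Repeated markers within one record are joined (a simplification).
            current[marker] = (current[marker] + "\n" + value) if marker in current else value
            last_marker = marker
        elif current is not None and last_marker is not None:
            current[last_marker] += " " + line.strip()  # wrapped continuation line
    return records


for record in parse_sf(SAMPLE):
    print(record)
```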
6) Standing challenges
- No standardized conventions for layers of linguistic annotation
- Problems with interlinear morpheme glosses:
  - unclear status / interpretation of labels
  - different labels for the same categories
  - different definitions for the same categories, based on different theories
- Partial solution from CLARIN: ISOcat, CLAVAS
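One way to cope with different labels for the same category is to map every project-specific label to a shared concept identifier, which is the idea behind registries such as ISOcat. A minimal Python sketch; the concept identifiers and the label table are invented for illustration and are not actual ISOcat or CLAVAS entries.

```python
import re

# Hypothetical mapping from project-specific gloss labels to shared concepts.
CONCEPTS = {
    "SG": "concept:singular",
    "sg": "concept:singular",
    "SING": "concept:singular",
    "PRES": "concept:present",
    "PRS": "concept:present",
    "SBJV": "concept:subjunctive",
    "SUBJ": "concept:subjunctive",  # ambiguous in practice: subject or subjunctive
}


def normalise(gloss: str) -> list[str]:
    """Replace each label in a gloss with its concept id.

    The gloss is split on "-" and "."; unknown items (e.g. lexical glosses
    such as "come" or person numbers) are kept unchanged.
    """
    return [CONCEPTS.get(part, part) for part in re.split(r"[-.]", gloss) if part]


print(normalise("come-SBJV.PRES-3.SG") == normalise("come-SUBJ.PRS-3.SING"))
```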
6) Standing challenges: EUROTYP
EUROTYP uses ca. 550 abbreviations of terms:

| Type of term | Abbreviations |
|---|---|
| Morphological categories | 246 |
| Lexical word classes | 114 |
| Syntactic relations | 56 |
| Syntactic constituent categories | 27 |
| Semantic roles | 16 |
| Word order | 16 |
| Sentence types | 2 |
| Varieties and other | 6+2 |
| Unspecific or unclear | 78 |
6) Standing challenges: Inflection, analytical word forms
Where is the PLUSQUAMPERFEKT (pluperfect) to be annotated?
moni-t-us                  er-a-m
ask-PART.PF.PASS-NOM.SG.M  PASS-IND.PAST-1.SG.ACT
monitus eram (analytical form): 1P, Sg, Ind, Pass, Plpf, Nom V, Masc V
More informationA prototype infrastructure for D Spin Services based on a flexible multilayer architecture
A prototype infrastructure for D Spin Services based on a flexible multilayer architecture Volker Boehlke 1,, 1 NLP Group, Department of Computer Science, University of Leipzig, Johanisgasse 26, 04103
More informationSign language transcription conventions for the ECHO Project
Sign language transcription conventions for the ECHO Project Annika Nonhebel, Onno Crasborn & Els van der Kooij University of Nijmegen Version 9, 20 Jan. 2004 URL: http://www.let.kun.nl/sign-lang/echo/docs/transcr_conv.pdf
More informationA Database Tool for Research. on Visual-Gestural Language
A Database Tool for Research on Visual-Gestural Language Carol Neidle Boston University Report No.10 American Sign Language Linguistic Research Project http://www.bu.edu/asllrp/ August 2000 SignStream
More informationA high speed transcription interface for annotating primary linguistic data
A high speed transcription interface for annotating primary linguistic data Mark Dingemanse, Jeremy Hammond, Herman Stehouwer, Aarthy Somasundaram, Sebastian Drude Max Planck Institute for Psycholinguistics
More informationToolbox 1! Susan Gehr!! susan@gehr.info! Cell/text (707) 599-2719!
Toolbox 1! Susan Gehr!! susan@gehr.info! Cell/text (707) 599-2719! With gratitude! l Albert Bickford, Toolbox instructor for InField 2008, 2010 & CoLang 2012! l Neil Brinneman, Shoebox instructor 2003!
More informationComparison of multimodal annotation tools workshop report
Gesprächsforschung - Online-Zeitschrift zur verbalen Interaktion (ISSN 1617-1837) Ausgabe 7 (2006), Seite 99-123 (www.gespraechsforschung-ozs.de) Comparison of multimodal annotation tools workshop report
More informationDEFINING EFFECTIVENESS FOR BUSINESS AND COMPUTER ENGLISH ELECTRONIC RESOURCES
Teaching English with Technology, vol. 3, no. 1, pp. 3-12, http://www.iatefl.org.pl/call/callnl.htm 3 DEFINING EFFECTIVENESS FOR BUSINESS AND COMPUTER ENGLISH ELECTRONIC RESOURCES by Alejandro Curado University
More informationStephen Reder Kathryn Harris Kristen Setzler Portland State University Portland, Oregon, United States
The Multimedia Adult Learner Corpus (published in TESOL Quarterly (2003), v. 37, # 3 pp. 546-557) Posted to this website with the permission of TESOL Quarterly and the authors. Stephen Reder Kathryn Harris
More informationText-To-Speech Technologies for Mobile Telephony Services
Text-To-Speech Technologies for Mobile Telephony Services Paulseph-John Farrugia Department of Computer Science and AI, University of Malta Abstract. Text-To-Speech (TTS) systems aim to transform arbitrary
More informationMASTER OF PHILOSOPHY IN ENGLISH AND APPLIED LINGUISTICS
University of Cambridge: Programme Specifications Every effort has been made to ensure the accuracy of the information in this programme specification. Programme specifications are produced and then reviewed
More informationHow To Teach English To Other People
TESOL / NCATE Program Standards STANDARDS FOR THE ACCREDIATION OF INITIAL PROGRAMS IN P 12 ESL TEACHER EDUCATION Prepared and Developed by the TESOL Task Force on ESL Standards for P 12 Teacher Education
More informationOverview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
More informationEuropean Masters Program in Language and Communication Technologies (LCT) Module Handbook for Prospective Students
European Masters Program in Language and Communication Technologies (LCT) Module Handbook for Prospective Students October, 2012 European Masters Program in LCT Module Handbook Page 1 Contents 1 What is
More informationAutomation of metadata processing
Automation of metadata processing CLARIN-Conference in Wroclaw, Poland, 15-17, Octobre Except where otherwise noted, content on this poster is licensed under a Creative Commons Attribution 4.0 International
More informationEfficient diphone database creation for MBROLA, a multilingual speech synthesiser
Efficient diphone database creation for, a multilingual speech synthesiser Institute of Linguistics Adam Mickiewicz University Poznań OWD 2010 Wisła-Kopydło, Poland Why? useful for testing speech models
More informationWeSay, A Tool for Collaborating on Dictionaries with Non-Linguists
Vol. 6 (2012), pp. 181-186 http://nflrc.hawaii.edu/ldc/ http://hdl.handle.net/10125/4507 WeSay, A Tool for Collaborating on Dictionaries with Non-Linguists From Payap Language Software Reviewed by Ross
More informationCorpus Design for a Unit Selection Database
Corpus Design for a Unit Selection Database Norbert Braunschweiler Institute for Natural Language Processing (IMS) Stuttgart 8 th 9 th October 2002 BITS Workshop, München Norbert Braunschweiler Corpus
More informationENGLISH LANGUAGE. A Guide to co-teaching The OCR A and AS level English Language Specifications. A LEVEL Teacher Guide. www.ocr.org.
Qualification Accredited Oxford Cambridge and RSA A LEVEL Teacher Guide ENGLISH LANGUAGE H470 For first teaching in 2015 A Guide to co-teaching The OCR A and AS level English Language Specifications Version
More informationManaging large sound databases using Mpeg7
Max Jacob 1 1 Institut de Recherche et Coordination Acoustique/Musique (IRCAM), place Igor Stravinsky 1, 75003, Paris, France Correspondence should be addressed to Max Jacob (max.jacob@ircam.fr) ABSTRACT
More informationDie Vielfalt vereinen: Die CLARIN-Eingangsformate CMDI und TCF
Die Vielfalt vereinen: Die CLARIN-Eingangsformate CMDI und TCF Susanne Haaf & Bryan Jurish Deutsches Textarchiv 1. The Metadata Format CMDI Metadata? Metadata Format? and more Metadata? Metadata Format?
More informationComputer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia
Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia Outline I What is CALL? (scott) II Popular language learning sites (stella) Livemocha.com (stacia) III IV Specific sites
More informationThe World Atlas of Language Structures & Follow-up notes
November 2007 Workshop on the Feasibility of a Web-based Database of the Syntactic Structures of the World s Languages The World Atlas of Language Structures & Follow-up notes Hans-Jörg Bibiko Max Planck
More informationInField 2010 Institute on Field Linguistics and Language Documentation University of Oregon
InField 2010 Institute on Field Linguistics and Language Documentation University of Oregon Workshop Coursepack: ELAN 1: Aligning Text to Audio and Video Using ELAN Instructors: Andrea Berez & Christopher
More informationMarkus Dickinson. Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013
Markus Dickinson Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013 1 / 34 Basic text analysis Before any sophisticated analysis, we want ways to get a sense of text data
More informationThe Knowledge Sharing Infrastructure KSI. Steven Krauwer
The Knowledge Sharing Infrastructure KSI Steven Krauwer 1 Why a KSI? Building or using a complex installation requires specialized skills and expertise. CLARIN is no exception. CLARIN is populated with
More informationLinguistic Resources for OpenHaRT-13
Linguistic Resources for OpenHaRT-13 Zhiyi Song 1, Stephanie Strassel 1, David Doermann 2, Amanda Morris 1 1 Linguistic Data Consortium 2 Applied Media Analysis Introduction to LDC Model LDC is an open,
More informationProcessing: current projects and research at the IXA Group
Natural Language Processing: current projects and research at the IXA Group IXA Research Group on NLP University of the Basque Country Xabier Artola Zubillaga Motivation A language that seeks to survive
More informationMorphology. Morphology is the study of word formation, of the structure of words. 1. some words can be divided into parts which still have meaning
Morphology Morphology is the study of word formation, of the structure of words. Some observations about words and their structure: 1. some words can be divided into parts which still have meaning 2. many
More informationTESOL Standards for P-12 ESOL Teacher Education 2010. 1 = Unacceptable 2 = Acceptable 3 = Target
TESOL Standards for P-12 ESOL Teacher Education 2010 1 = Unacceptable 2 = Acceptable 3 = Target Standard 1. Language: Candidates know, understand, and use the major theories and research related to the
More informationThings to remember when transcribing speech
Notes and discussion Things to remember when transcribing speech David Crystal University of Reading Until the day comes when this journal is available in an audio or video format, we shall have to rely
More informationHow To Write The English Language Learner Can Do Booklet
WORLD-CLASS INSTRUCTIONAL DESIGN AND ASSESSMENT The English Language Learner CAN DO Booklet Grades 9-12 Includes: Performance Definitions CAN DO Descriptors For use in conjunction with the WIDA English
More information