LEXUS: a web based lexicon tool




LEXUS: a web based lexicon tool
Jacquelijn Ringersma
Max Planck Institute for Psycholinguistics
Nijmegen, The Netherlands

Content
- Max Planck Institute
- Archive of linguistic resources
- Tool support (archiving software and enrichment software)
- LEXUS and ViCoS
- Interdisciplinary software development: challenges and problems

Max Planck Institute for Psycholinguistics
- Max Planck Gesellschaft: 78 research institutes (Germany); 3 outside Germany: 2 in Italy (art), 1 in the Netherlands (psycholinguistics)
- The study of mental processes involved in language production, language comprehension and language acquisition, as well as the relation between language, thought, and culture

Max Planck Institute for Psycholinguistics
Archive for linguistic resources
- Different types of linguistic material: endangered languages archive, the European second language learner corpus, the National Corpus of Spoken Dutch, gesture corpora, acquisition corpora and language documentation corpora
- More than 230,000 objects, 25 TB of data: digitized audio and video, images, annotations
- Included formats (among others): XML, HTML, CHAT, Toolbox, PDF, WAV, MPEG-1, MPEG-2, MPEG-4
- Organization: metadata descriptions, database
- Access via the Internet: metadata search and content search
- Access to these resources is limited and can be made available upon request

Documentation of endangered languages
DoBeS = Dokumentation Bedrohter Sprachen
DoBeS has two major pillars:
- language documentation by experienced teams, to preserve part of the cultural heritage and to help in revitalization where possible
- creating an organized, accessible and persistent archive

Archive Content: Yélî Dnye (Rossel Island)
- Multimedia Lexicon
- Described Corpus
- Typed Relations within the Lexicon
- Photos
- Annotated Media

Tool Support
- Archiving: IMDI, LAMUS, AMS
- Data enrichment: ELAN, Synpathy, ADDIT, ANNEX, LEXUS
- More language archiving tools: www.lat-mpi.eu

From documentation to exploitation
So: now what?
- Languages have been documented
- Video and audio are stored in the archive
- (Part of) the material is annotated
- Regional archives have been installed at some 10 locations to return the material to the speech communities
So, now: exploitation
- Language is more than video, audio, annotations and lexica
- Language represents worlds of concepts

LEXUS - Lexicon tool
LEXUS
- Web based lexicon tool
- Based on the ISO recommendations for linguistic resources
- LMF: Lexical Markup Framework (lexicon structure)
- DCR: Data Category Registries (concept naming)
- LMF/DCR: a modular structure for content interoperability between (all aspects of) lexical resources
ViCoS in LEXUS
- Accessible conceptual spaces
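The LMF/DCR idea on this slide, a fixed lexicon structure whose fields are named by shared data categories, can be illustrated with a small sketch. This is not the LEXUS data model or the full ISO model: the class names follow the general LMF pattern (lexicon, lexical entry, sense), and the data category names `partOfSpeech` and `gloss` as well as the sample values are placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    # Data categories (name -> value); in an LMF/DCR setup the names would
    # come from a shared registry. The names used below are placeholders.
    data_categories: dict[str, str] = field(default_factory=dict)


@dataclass
class LexicalEntry:
    lemma: str
    data_categories: dict[str, str] = field(default_factory=dict)
    senses: list[Sense] = field(default_factory=list)


@dataclass
class Lexicon:
    language: str
    entries: list[LexicalEntry] = field(default_factory=list)

    def find(self, lemma: str) -> list[LexicalEntry]:
        """Return all entries whose lemma matches exactly."""
        return [entry for entry in self.entries if entry.lemma == lemma]


# Hypothetical sample data, for illustration only.
lexicon = Lexicon(language="example language")
lexicon.entries.append(
    LexicalEntry(
        lemma="example",
        data_categories={"partOfSpeech": "noun"},
        senses=[Sense({"gloss": "placeholder gloss"})],
    )
)

for entry in lexicon.find("example"):
    print(entry.lemma, entry.data_categories, [s.data_categories for s in entry.senses])
```

Because every field is a named data category rather than a hard-coded attribute, two lexica that agree on category names can be compared or merged field by field, which is the interoperability point the slide makes.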


LEXUS - Lexicon tool
Link to: kauo e mei terminal bud (female)

LEXUS - Lexicon tool
- Creation of lexica from scratch, import of lexica from other formats
- User defined view of the information in the lexical entries
- Linking multi-media fragments to lexical entries
- Creation of links in images
- Links to resources within the digital archive (or other external web-based resources)
- Interaction with other archiving tools
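As an illustration of linking multi-media fragments to lexical entries, the sketch below stores a time range of an archived recording against an entry identifier. How LEXUS represents such links internally is not described in the slides; the `MediaFragment` class, the `link_fragment` helper, the entry identifier and the example URL are all hypothetical. The `#t=start,end` suffix follows the temporal syntax of the W3C Media Fragments URI recommendation, which is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaFragment:
    """A time range (in seconds) within an audio or video resource."""
    resource_url: str
    start_s: float
    end_s: float

    def __post_init__(self):
        if not 0 <= self.start_s < self.end_s:
            raise ValueError("expected 0 <= start_s < end_s")

    def as_uri(self) -> str:
        # Temporal fragment in the W3C Media Fragments style: #t=start,end
        return f"{self.resource_url}#t={self.start_s:g},{self.end_s:g}"


# entry identifier -> fragments illustrating that entry
links: dict[str, list[MediaFragment]] = {}


def link_fragment(entry_id: str, fragment: MediaFragment) -> None:
    """Attach a media fragment to a lexical entry (hypothetical helper)."""
    links.setdefault(entry_id, []).append(fragment)


# Hypothetical entry id and archive URL.
link_fragment(
    "entry-1",
    MediaFragment("https://archive.example.org/session1.mpg", 12.5, 15.0),
)

for fragment in links["entry-1"]:
    print(fragment.as_uri())
```

Storing only a reference and a time range keeps the recording itself in the archive, so the lexicon entry points at the fragment rather than duplicating media.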

LEXUS - further developments
Towards a multi-media dictionary of the Marquesan and Tuamotuan languages of French Polynesia
- Building a digital multi-media encyclopedic dictionary with LEXUS
- Improving basic LEXUS functionalities
- Conceptual spaces
- Improved user interface
Project team:
- Linguist team (Cablitz, Mosel)
- Developers (Kemps, Zinn, Alcock)
- Speech community (Kape, Guillome, Tetahiotupa, Tahia, Mataiki, Bruneau Pati)

LEXUS - further developments
Aim:
- Speech community input and extensions
- Community based instance of the lexicon

LEXUS - further developments
Project workflow
- Joint action linguist and speech community: Field work; Lexicon creation; Data archiving and annotation; Definition of SW requirements; Creation of MM lexicon (all)
- Developers: Lexus basic functionalities; Lexicon import; Further developments of LEXUS

LEXUS - further developments
Issues that came up:
- User interface
- Conceptual spaces in a multi-media encyclopedia
- Collaborative workspaces

LEXUS - further developments
User Interface
- Users want to enter the lexicon through the lexical entries, either from the listed lexicon or by search

LEXUS - further developments
Conceptual spaces in a multi-media encyclopedia
- Conventional paper dictionaries: the network of meanings is less visible
- Paper dictionaries have limited usefulness in language maintenance and language revival (Manning et al., 2000)
- Members of the speech community prefer following semantic links of different semantic types (synonyms, antonyms, lexical, taxonomies)


ViCoS
Visualizing Conceptual Spaces
- Complement lexical spaces with ontological spaces
- Allow users to construct a space of culturally relevant concepts
- Concepts as centres for all sorts of information:
  - relations to other concepts
  - anchored in the language to express them
  - linked to the multimedia archive to describe them
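The notion of concepts as centres of information, with typed relations to other concepts, anchors in the lexicon and links into the archive, can be sketched as a small labelled graph. This is a minimal illustration and not the ViCoS implementation: the `ConceptSpace` class, the relation names and the sample concepts are assumptions made for the example.

```python
from collections import defaultdict


class ConceptSpace:
    """Concepts connected by typed relations, with lexicon and media anchors."""

    def __init__(self):
        # concept -> relation type -> set of related concepts
        self._relations = defaultdict(lambda: defaultdict(set))
        # concept -> identifiers of lexical entries that express it
        self.lexical_entries = defaultdict(set)
        # concept -> references to archive resources that describe it
        self.media = defaultdict(set)

    def relate(self, source, relation_type, target, symmetric=False):
        """Add a typed relation; symmetric relations are stored both ways."""
        self._relations[source][relation_type].add(target)
        if symmetric:
            self._relations[target][relation_type].add(source)

    def related(self, concept, relation_type):
        """Return the concepts reachable from `concept` via one relation type."""
        return sorted(self._relations.get(concept, {}).get(relation_type, ()))


# Hypothetical concepts, relation types and identifiers.
space = ConceptSpace()
space.relate("tree", "has_part", "bud")
space.relate("big", "antonym", "small", symmetric=True)
space.lexical_entries["bud"].add("entry-1")
space.media["bud"].add("https://archive.example.org/photo1.jpg")

print(space.related("tree", "has_part"))
print(space.related("small", "antonym"))
```

Keeping the relation type explicit lets a user follow only one kind of link at a time, for instance part-whole relations or antonyms, instead of browsing an undifferentiated list of cross-references.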


ViCoS Show ViCoS demo

Interdisciplinary software development: challenges and problems
Our challenge:
- Design a product that fits the needs of the speech community (SC) and thus contributes to maintaining and possibly revitalizing a documented language, and consequently to presenting and preserving the cultural heritage
More practical:
- A simple user interface for a complex tool: is it possible?
- Collaborative workspaces to work in a Wiki-like manner

Interdisciplinary software development: challenges and problems
So, what do we encounter? An interesting project and collaboration, but NOT easy:
- Need to bridge the concept gap
- Communication over distances
- Different expectations, different (sub)goals
- Software limitations of an online tool
- IPR between developer team and linguist team
- IPR between speech community and linguist team

Interdisciplinary software development: challenges and problems
Is there a positive conclusion?
- Interaction opens worlds
- First reactions on concept UI and ViCoS from the speech community (SC) are positive
- First experience of SC and LS is useful for the development of ViCoS
- More DoBeS projects are interested in using LEXUS as an exploitation tool
- We invite documentation teams to discuss their options in using LEXUS and ViCoS
Acknowledgements: Thanks to Gaby Cablitz, Jean Kape, Guillome Taimana for their contributions