LINGUISTIC PROCESSING IN THE ATLAS PROJECT

Similar documents
A Virtual Interpreter for the Italian Sign Language

Annotation in Language Documentation

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test

COMPUTATIONAL DATA ANALYSIS FOR SYNTAX

Natural Language to Relational Query by Using Parsing Compiler

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Special Topics in Computer Science

A Mixed Trigrams Approach for Context Sensitive Spell Checking

INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning jtl@ifi.uio.no

Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang

LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM*

What s in a Lexicon. The Lexicon. Lexicon vs. Dictionary. What kind of Information should a Lexicon contain?

Timeline (1) Text Mining Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining?

AN OPEN KNOWLEDGE BASE FOR ITALIAN LANGUAGE IN A COLLABORATIVE PERSPECTIVE

Evalita 09 Parsing Task: constituency parsers and the Penn format for Italian

TRANSLATING POLISH TEXTS INTO SIGN LANGUAGE IN THE TGT SYSTEM

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Query optimization. DBMS Architecture. Query optimizer. Query optimizer.

Hybrid Strategies. for better products and shorter time-to-market

Application of Natural Language Interface to a Machine Translation Problem

Social Media Text and Predictive Analytics

A Survey of ASL Tenses

GATE Mímir and cloud services. Multi-paradigm indexing and search tool Pay-as-you-go large-scale annotation

Learning Translation Rules from Bilingual English Filipino Corpus

Customizing an English-Korean Machine Translation System for Patent Translation *

How to make Ontologies self-building from Wiki-Texts

Semantic Analysis of Natural Language Queries Using Domain Ontology for Information Access from Database

Using the BNC to create and develop educational materials and a website for learners of English

Processing: current projects and research at the IXA Group

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.

M3039 MPEG 97/ January 1998

Paraphrasing controlled English texts

Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR

CHARACTERISTICS FOR STUDENTS WITH: LIMITED ENGLISH PROFICIENCY (LEP)

LANGUAGE! 4 th Edition, Levels A C, correlated to the South Carolina College and Career Readiness Standards, Grades 3 5

Presented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software

Using NLP and Ontologies for Notary Document Management Systems

The PALAVRAS parser and its Linguateca applications - a mutually productive relationship

Schema documentation for types1.2.xsd

Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia

Kybots, knowledge yielding robots German Rigau IXA group, UPV/EHU

Computer Standards & Interfaces

Domain Specific Sign Language Animation for Virtual Characters

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

THE BACHELOR S DEGREE IN SPANISH

D2.4: Two trained semantic decoders for the Appointment Scheduling task

Package wordnet. January 6, 2016

An Overview of Applied Linguistics

Outline of today s lecture

A chart generator for the Dutch Alpino grammar

Compiler I: Syntax Analysis Human Thought

Survey Results: Requirements and Use Cases for Linguistic Linked Data

Latin WordNet project

A System for Labeling Self-Repairs in Speech 1

Automated Extraction of Security Policies from Natural-Language Software Documents

Interactive Dynamic Information Extraction

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Distributed Database for Environmental Data Integration

INFORMING A INFORMATION DISCOVERY TOOL FOR USING GESTURE

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Morphology. Morphology is the study of word formation, of the structure of words. 1. some words can be divided into parts which still have meaning

Language and Computation

LESSON THIRTEEN STRUCTURAL AMBIGUITY. Structural ambiguity is also referred to as syntactic ambiguity or grammatical ambiguity.

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

CSCI 3136 Principles of Programming Languages

Classification of Natural Language Interfaces to Databases based on the Architectures

Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology

Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata

Comprendium Translator System Overview

Symbiosis of Evolutionary Techniques and Statistical Natural Language Processing

Parsing Software Requirements with an Ontology-based Semantic Role Labeler

Teaching Methodology for 3D Animation

Modern foreign languages

a Chinese-to-Spanish rule-based machine translation

COURSE SYLLABUS ESU 561 ASPECTS OF THE ENGLISH LANGUAGE. Fall 2014

Empirical Machine Translation and its Evaluation

Master Degree Project Ideas (Fall 2014) Proposed By Faculty Department of Information Systems College of Computer Sciences and Information Technology

Culture and Language. What We Say Influences What We Think, What We Feel and What We Believe

Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

Why major in linguistics (and what does a linguist do)?

An Introduction to Language Processing with Perl and Prolog

Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery

Transcription:

LINGUISTIC PROCESSING IN THE ATLAS PROJECT Leonardo LESMO, Alessandro Mazzei, Daniele Radicioni Interaction Models Group Dipartimento di Informatica e Centro di Scienze Cognitive Università di Torino {lesmo,mazzei,radicion}@di.unito.it

Written representation of LIS (Linguaggio Italiano dei Segni: Italian Sign Language) Automatic translation from Italian to LIS Creation of linguistic corpora Definition of a virtual character with proper expressions Transmission/visualization on several media,,, ATLAS : Scientific challenges

ATLAS Translation Pipeline Italian Text NL to AWLIS Translator AWLIS Text AWLIS to AL Translator AL Text Input texts wri-en in Italian will be translated into an intermediate representa4on, called ATLAS Wri-en Italian Sign Language (AWLIS)

ATLAS Translation Pipeline Italian Text NL to AWLIS Translator AWLIS Text AWLIS to AL Translator AL Text Sentences expressed in AWLIS are then translated into a character's gestures Anima4on Language (AL) which describes the way the basic movements are produced and linked.

ATLAS Translation Pipeline AL Text 3D Rendering Engine Screen Motion capture Keyframing Procedural Animation Techniques AL sequences will be rendered through a 3D rendering engine. An automa4c anima4on blending system will ensure smooth transi4ons between signs of the generated LIS anima4ons

SL Translation Issues Translating from a spoken language to a SL is a complex undertaking: The translation must transform a symbolic representation into another symbolic representation But SLs incorporate gestures, facial expressions, head movements, body language and even the space around the speaker So, we need a way to express in symbolic form the LIS sentence, i.e. a written LIS

Linguistic Analysis ITALIAN SENTENCE Dictionary Chunking Rules MORPHOLOGICAL ANALYSIS AND POS TAGGING Sequence of lexical items CHUNKING Verbs + Par4al Chunks COORDINATION ANALYSIS Verbs + Final Chunks Morphological Analysis PARSER Subcategorization frames Word meaning Ontology ANALYSIS OF VERBAL DEPENDENTS Parse tree SEMANTIC ANNOTATION Annotated parse tree ONTOLOGICAL INTERPRETATION SEMANTIC INTERPRETER Ontological representa4on Ontology-to-frames mapping TRANSLATION IN ATLAS FRAMES LIS Dictionary CCG Generator CCG LIS Grammar AWLIS

Example Locali addensamenti potranno interessare il settore nord-orientale (local clouding could affect the North-eastern sector) Result of the Morphologic-lexical analysis+ POS Tagging): 1 Locali (LOCALE ADJ QUALIF ALLVAL PL) 2 addensamenti (ADDENSAMENTO NOUN COMMON M PL ADDENSARE TRANS) 3 potranno (POTERE VERB MOD IND FUT INTRANS 3 PL) 4 interessare (INTERESSARE VERB MAIN INFINITE PRES TRANS) 5 il (IL ART DEF M SING) 6 settore (SETTORE NOUN COMMON M SING) 7 nord-orientale (NORD-ORIENTALE ADJ QUALIF ALLVAL SING) 8. (#\. PUNCT)

Example (2) Result of syntactic analysis 1 Locali (LOCALE ADJ QUALIF ALLVAL PL) [2;ADJC+QUALIF-RMOD] 2 addensamenti (ADDENSAMENTO NOUN COMMON M PL ADDENSARE TRANS) [3;VERB-SUBJ] 3 potranno (POTERE VERB MOD IND FUT INTRANS 3 PL) [0;TOP-VERB] 4 interessare (INTERESSARE VERB MAIN INFINITE PRES TRANS) [3;VERB+MODAL-INDCOMPL] 4.10 t [2f] (ADDENSAMENTO NOUN COMMON M PL ADDENSARE TRANS) [4;VERB-SUBJ] 5 il (IL ART DEF M SING) [4;VERB-OBJ] 6 settore (SETTORE NOUN COMMON M SING) [5;DET+DEF-ARG] 7 nord-orientale (NORD-ORIENTALE ADJ QUALIF ALLVAL SING) [6;ADJC+QUALIF-RMOD] 8. (#\. PUNCT) [3;END] adjc+qualif- rmod locale verb- subj addensamento potere verb+modal- indcompl interessare verb- subj verb- obj t il det+def- arg se-ore adjc+qualif- rmod nord- orientale

Semantics: a fragment of the ontology en4ty descrip4on geogr- part- selec4on- criterium geographic- area meteo- status- situa4on 4me- interval evaluable- en4ty it- region- group sea it- geogr- area day posi4ve- eval- en4ty evening it- region posi4ve- day posi4ve- evening it- cardinal- region it- island- region it- western- region it- eastern- region it- southern- region it- northern- region it- central- region it- adria4c- central- region it- adria4c- region

Example (3) Semantics: an ontological expression meteo-status-situation has-subclass (and ( weather-event has-subclass clouds subclass-of weather-event domain-of &has-event-width range weather-event-width (eq local-phenomenon) ( weather-status-situation domain-of &has-weather-location range geographic-area has-subclass it-geogr-area has-instance it-northeastern-area arg-of &has-it-area7 relinstance &has-it-area-spec range it-area-spec (eq northeastern)))

Example (4) Semantics: First Order Predicate Logic e (possible (e) clouds (e) &has-event-width (e, local-phenomenon) &has-weather-location (e, it-northeastern-area)) Notes 1) Modal logics:?; currently reification 2) The sequence it-northeastern-area arg-of &has-it-area7 relinstance &has-it-area-spec range it-area-spec (eq northeastern) specifies that, in this context, northeastern is a linguistic description (specification) of the involved area

Generation 1. Starting from an ATLAS frame we compute a semantic tree constituting an intermediate semantic representation, by using IF-THEN rules 2. Starting from the semantic tree we produce the sequence of LIS glosses, by using a CCG for LIS

Example (4) Per domani quindi nubi in aumento al nord. (ev 1 t 1 l 1 x 1 ) (cloud-increase(ev1) time(ev 1, t 1 ) deictic-descr(t 1, t 2 ) id (t 2,tomorrow) location(ev 1, l 1 ) id(l 1, nord) agent(ev 1, x 1 ) cloud(x 1 )) <?xml version="1.0" encoding="utf-8"?> <xml> <lf> <satop nom="c1:meteo-statussituation"> <prop name="cloud-increase"/> <diamond mode="agent"> <nom name="c2:clouds"/> <prop name="cloud"/> </diamond> <diamond mode="location"> <nom name="n1:it-northern-region"/> <prop name="north"/> </diamond> <diamond mode="time"> <nom name="t1:deictic-daydescription"/> <prop name="tomorrow"/> </diamond> </satop> </lf> </xml>

<?xml version="1.0" encoding="utf-8"?> <xml> <lf> <satop nom="c1:meteo-statussituation"> <prop name="cloud-increase"/> <diamond mode="agent"> <nom name="c2:clouds"/> <prop name="cloud"/> </diamond> <diamond mode="location"> <nom name="n1:it-northern-region"/ > <prop name="north"/> </diamond> <diamond mode="time"> <nom name="t1:deictic-daydescription"/> <prop name="tomorrow"/> </diamond> </satop> </lf> </xml> OpenCCG LIS-CCG domani_n nord_u nuvola_u nuvolaaumentare_u...

Example (5) Generation: Semantic Tree <ACTOR> be T <LOCATION> group north <ATTRIBUTE> <ATTRIBUTE> local east

Example (6) Generation: AWLIS INVECE NORDORIENTALE L ZONA L NUVOLE invece nordorientale zona nuvole both sx dx dx std std std std........................

A note on the dictionary: The ATLAS Sign set The set of signs used within the ATLAS project includes: Signs associated with Lemmas included in the Radutzky dictionary Signs widely used within the various Italian deaf communities, but not included in the Radutzky dictionary Signs newly defined within the ATLAS project, previously not existing, and needed to represent peculiar concepts, mainly linked with meteorological events and situations

From AWLIS to AEWLIS: Information Tracks A sentence is represented, in AEWLIS, resorting to a set on concurrent synchronized Tracks. The following 4 Tracks (TRs) are used: Lemma: Used to store information items related to the Lemmas, including, among the others, the lemma s ID, Identifier, its Part of Speech, and the set of Modifiers for the Lemma.

Information Tracks (2) Sentence Structure: Used to store information items related to the Structure of the Sentence, including, among the others, proper pointer to the related syntax tree, the sentence s syntactic and semantic roles, and its speech acts.

Channel Modifiers: Information Tracks (3) Its usage is twofold: To record the actual values of the following sign parameters: Sign Spatial Location, for Relocatable signs Action performed by the not-signing hand, when the sign requires just one hand. To record the differences on the various Communication Channels between the actual values of the performed sign and the default ones stored in the ATLAS MultiMedia Archive (AMMA)

Information Tracks (4) Time Stamps: Used to store info related to the Time Stamps associated with the sentence Time Slices

Example (7) <ACTOR> group <ATTRIBUTE> local be T <LOCATION> north <ATTRIBUTE> east (lemma de00 nuvola) (lemma de01 zona) (lemma de02 nordorientale) (lemma p00 in) (istopic de01) (into de00 de01) (rel de00 de01) (right ndom) (left dom) (free dom) (free ndom) (is_relocatable de00) (is_relocatable de01) (not_relocatable de02) (default_position de00 0 0 0) (default_position de01 0 0 0) (default_position de02 0 0 0) (position de01 1 1 0)

Example (6) invece nordorientale zona nuvole (lemma de00 nuvola) (lemma de01 zona) (lemma de02 nordorientale) (lemma p00 in) (istopic de01) (into de00 de01) (rel de00 de01) (right ndom) (left dom) (free dom) (free ndom) (is_relocatable de00) (is_relocatable de01) (not_relocatable de02) (default_position de00 0 0 0) (default_position de01 0 0 0) (default_position de02 0 0 0) (position de01 1 1 0) (!!handlocation both 0.0 0.0 0.0) (!!handanimation both invece) (!!handlocation both 0.0 0.0 0.0) (!!handanimation both nordorientale) (!!handlocation dom 1.0 1.0 0.0) (!!handanimation dom zona) (!!handlocation both 1.0 1.0 0.0) (!!handanimation both nuvola) (!removeentity (entity de00))

CONCLUSIONS Linguistic Analysis is a hard task It involves both grammatical knowledge and knowledge of the world (ontology) Though the problem is not solvable today in open-ended domains, good results can be obtained for restricted domains (as Weather Forecast) The adopted solutions can be extended to wider domains provided that the needed semantic knowledge is made available