The polysemy of lexeme formation rules




Delphine Tribout & Olivier Bonami
Workshop "Semantics of derivational morphology: Empirical evidence and theoretical modeling"
July 01, 2014

Introduction
We address the polysemy of lexeme formation processes in French.
  The polysemy of particular morphological processes in French has been studied extensively in the last decade.
  We propose a formal analysis of these empirical studies of polysemy.
  We use the HPSG framework.
We work in the theoretical framework of lexemic morphology (Aronoff, 1994; Fradin, 2003).

Polysemous Lexeme Formation Rules
Deverbal LFRs often produce different types of meaning.
Example: the -age suffixation rule
  type          example                       gloss
  event N       LAVAGE 'cleaning'             act of cleaning
  instrument N  AIGUILLAGE 'railroad switch'  what one uses to switch railroad
  location N    GARAGE 'parking lot'          place where one parks
Output lexemes are often ambiguous between two types:
  type          example                       gloss
  event N       CIRAGE 'polishing'            act of polishing
  instrument N  CIRAGE 'shoe polish'          what one uses to polish
  event N       PASSAGE 'passing'             act of going through
  location N    PASSAGE 'path'                location through which one goes

Polysemous Lexeme Formation Rules
This situation can lead one to posit polysemous LFRs: a single rule with multiple semantics.
  age-lfr: [PHON φ, CAT verb, SEM V] → [PHON φ ⊕ aʒ, CAT noun, SEM {action of Ving, place where one Vs, object used to V}]
A polysemous rule gives rise to a single polysemous lexeme:
  [PHON siʁ(e), CAT verb, SEM polish] → [PHON siʁaʒ, CAT noun, SEM {polishing, shoe polish}]

Semantics of derived lexemes
Derived lexemes are not polysemous
However, when polysemous lexemes are used, a single interpretation is selected:
(1) J'ai acheté du cirage noir.
    I have bought PART shoe_polish black
    'I bought black polish.'
(2) Le cirage de mes bottes m'a pris dix minutes.
    The polishing of my boots me has taken ten minutes
    'It took me ten minutes to shine my boots.'
(3) *Grâce à ce cirage noir, celui de mes bottes sera vite fait.
    Thanks to this shoe_polish black that of my boots will_be quickly made
    (int.) 'Thanks to this black polish, polishing my boots will be quick.'
Problem: if there is a single polysemous lexeme CIRAGE with multiple meanings, why is anaphora impossible in (3)?

Semantics of derived lexemes
Derived lexemes are not polysemous
Polysemous meaning can be represented as a disjunction of meanings:
  J'ai acheté du cirage noir.              [shoe polish ∨ polishing]
    'I bought black shoe polish.'
  Le cirage de mes bottes prend 10 min.    [shoe polish ∨ polishing]
    'Polishing my boots takes 10 min.'
Anaphora does not behave as predicted by polysemous meaning:
  *Grâce à ce cirage noir, celui de mes bottes sera vite fait.    [shoe polish ∨ polishing]
Instead, specific meaning makes the right prediction:
  *Grâce à ce cirage noir, celui de mes bottes sera vite fait.    [shoe polish]
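The contrast between the two analyses can be made concrete with a small Python sketch. It is only an illustration, not part of the talk: the `Lexeme` class and the licensing condition in `anaphora_ok` are assumptions introduced here (the anaphor 'celui' is taken to be licensed only if the antecedent lexeme itself offers the reading the anaphor needs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    """Toy lexeme: a name plus the set of meanings it can express."""
    name: str
    senses: frozenset


def anaphora_ok(antecedent: Lexeme, needed_sense: str) -> bool:
    # Assumed licensing condition (hypothetical): the anaphor can pick up
    # the antecedent noun only if that lexeme offers the needed reading.
    return needed_sense in antecedent.senses


# Analysis A: a single polysemous lexeme CIRAGE.
cirage_poly = Lexeme("CIRAGE", frozenset({"shoe polish", "polishing"}))

# Analysis B: two specific lexemes.
cirage_instr = Lexeme("CIRAGE_instr", frozenset({"shoe polish"}))
cirage_evt = Lexeme("CIRAGE_evt", frozenset({"polishing"}))

# In (3), 'ce cirage noir' denotes the shoe polish, while
# 'celui de mes bottes' needs the event reading.
predicted_by_polysemy = anaphora_ok(cirage_poly, "polishing")
predicted_by_specific = anaphora_ok(cirage_instr, "polishing")
```

Under these assumptions the polysemous analysis wrongly licenses (3), while the analysis with specific lexemes rules it out, which matches the judgement reported above.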

Semantics of derived lexemes
Derived lexemes are not polysemous
This is not an effect of lexicalization: the same observation holds in the case of nonce formations.
Example with POMPONNAGE < POMPONNER 'make up':
(4) Marie a acheté du pomponnage bleu.
    Mary has bought PART pomponnage blue
    'Mary bought blue makeup.'
(5) Le pomponnage de Marie prend 15 minutes tous les matins.
    DET pomponnage of Mary takes 15 minutes every DET mornings
    'It takes Mary 15 minutes to make herself up every morning.'
(6) *Grâce à ce pomponnage bleu, celui de Marie est vite fait.
    Thanks to this pomponnage blue that of Mary is quickly made
    (int.) 'Thanks to this blue makeup, Mary makes herself up quickly.'

Semantics of derived lexemes
Conclusion
Derived lexemes have specific meanings.
Thus LFRs need to output multiple specific lexemes, rather than one single lexeme with multiple meanings.
We could postulate multiple specific -age rules:
  age-evt-lfr:   [PHON φ, CAT verb, SEM V] → [PHON φ ⊕ aʒ, CAT noun, SEM action of Ving]
  age-instr-lfr: [PHON φ, CAT verb, SEM V] → [PHON φ ⊕ aʒ, CAT noun, SEM object used to V]
  age-loc-lfr:   [PHON φ, CAT verb, SEM V] → [PHON φ ⊕ aʒ, CAT noun, SEM loc where one Vs]

Semantics of derived lexemes
Conclusion
But then new problems arise:
1. How do we account for the similarities among the multiple -age rules?
2. How do we avoid redundancy between rules?
3. How do we account for the shared properties of homophonous derived lexemes, like CIRAGE_evt and CIRAGE_instr?
We use the HPSG framework and its multiple inheritance hierarchy to model LFRs and to capture both multiple specific rules and shared properties among these rules.

Lexeme formation and multiple inheritance
Since Flickinger (1987), there has been a tradition of using inheritance hierarchies to capture some aspects of the structure of the lexicon.
This idea has been extended to account for productive lexeme formation (Riehemann, 1998; Koenig, 1999):
  Lexicalized derived lexemes are leaf nodes in the hierarchy.
  LFRs are treated as schematic lexical entries for derived lexemes, where the base is not specified.
This approach has been fruitfully applied to French LFRs (Bonami and Boyé, 2006; Desmets and Villoing, 2009; Tribout, 2010).
We use a variant of this setup.

Lexeme formation and multiple inheritance
A polysemous LFR can be treated as a collection of specific semantic rules sharing (inheriting) a form schema.
  age-lfr (type lfr): [PHON φ, CAT verb, SEM σ] → [PHON φ ⊕ aʒ, CAT noun, SEM τ]
    age-evt-lfr:   [SEM V] → [SEM action of Ving]     e.g. LAVAGE 'washing'
    age-instr-lfr: [SEM V] → [SEM obj. used to V]     e.g. CIRAGE 'shoe polish'
    age-loc-lfr:   [SEM V] → [SEM loc where to V]     e.g. GARAGE 'garage'
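As a rough illustration of the idea that the semantic subtypes inherit a single form schema, here is a minimal Python sketch. The class names, the dictionary representation of lexemes and the English meaning templates are assumptions made for this example; they are not the HPSG formalization used in the talk.

```python
class AgeLFR:
    """Form schema shared by all -age rules: verb stem + aʒ gives a noun.

    The semantics is left open here (τ in the schema above).
    """
    suffix = "aʒ"
    output_cat = "noun"
    sem_template = None  # supplied by the semantic subtypes

    @classmethod
    def apply(cls, stem, verb_gloss):
        if cls.sem_template is None:
            raise TypeError("age-lfr is abstract: use a semantic subtype")
        return {
            "PHON": stem + cls.suffix,
            "CAT": cls.output_cat,
            "SEM": cls.sem_template.format(V=verb_gloss),
        }


class AgeEvtLFR(AgeLFR):
    sem_template = "action of {V}ing"


class AgeInstrLFR(AgeLFR):
    sem_template = "object used to {V}"


class AgeLocLFR(AgeLFR):
    sem_template = "place where one {V}s"


# Two distinct, monosemous outputs built from the same base.
cirage_evt = AgeEvtLFR.apply("siʁ", "polish")
cirage_instr = AgeInstrLFR.apply("siʁ", "polish")
```

Each subtype states only its own semantics; the suffix and the output category are written once, in the parent class.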

Lexeme formation and multiple inheritance
Likewise, a semantic operation shared by different morphological processes can be abstracted away as a rule schema.
  instr-lfr (type lfr): [PHON φ, CAT verb, SEM V] → [PHON ψ, CAT noun, SEM object used to V]
    age-instr-lfr:  [PHON φ] → [PHON φ ⊕ aʒ]     e.g. CIRAGE 'shoe polish'
    oir-instr-lfr:  [PHON φ] → [PHON φ ⊕ waʁ]    e.g. HACHOIR 'mincer'
    conv-instr-lfr: [PHON φ] → [PHON φ]          e.g. RÉVEIL 'alarm clock'

Lexeme formation and multiple inheritance
Multiple inheritance hierarchies
Both types of rule schemas can be combined.
  verb-to-noun-lfr
    SYNSEM dimension:     instr-fr, loc-fr, evt-fr
    MORPHOPHON dimension: oir-fr, conv-fr, age-fr
  Each leaf rule inherits from one type in each dimension:
    oir-instr    HACHOIR 'mincer'
    oir-loc      PARLOIR 'parlour'
    conv-instr   RÉVEIL 'alarm clock'
    conv-loc     DÉCHARGE 'dump'
    conv-evt     MARCHE 'walk'
    age-instr    CIRAGE 'shoe polish'
    age-loc      GARAGE 'garage'
    age-evt      LAVAGE 'washing'
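The cross-classification of rules along a semantic and a formal dimension can be approximated with Python's own multiple inheritance. This is a sketch under stated assumptions: the class names, the meaning strings and the stem transcriptions "aʃ" and "maʁʃ" are introduced here for illustration, and Python's method resolution order only loosely mirrors unification in a typed feature structure hierarchy.

```python
class VerbToNounLFR:
    """Root schema: takes a verb stem and gloss, returns a noun description."""

    def phon(self, stem):
        raise NotImplementedError  # fixed by the MORPHOPHON dimension

    def sem(self, verb_gloss):
        raise NotImplementedError  # fixed by the SYNSEM dimension

    def apply(self, stem, verb_gloss):
        return {"PHON": self.phon(stem), "CAT": "noun", "SEM": self.sem(verb_gloss)}


# SYNSEM dimension: what the derived noun means.
class InstrLFR(VerbToNounLFR):
    def sem(self, verb_gloss):
        return f"object used to {verb_gloss}"


class EvtLFR(VerbToNounLFR):
    def sem(self, verb_gloss):
        return f"action of {verb_gloss}ing"


# MORPHOPHON dimension: how the derived noun is formed.
class AgeLFR(VerbToNounLFR):
    def phon(self, stem):
        return stem + "aʒ"


class OirLFR(VerbToNounLFR):
    def phon(self, stem):
        return stem + "waʁ"


class ConvLFR(VerbToNounLFR):
    def phon(self, stem):
        return stem  # conversion: no formal change


# Leaf rules inherit from one type in each dimension.
class AgeInstr(AgeLFR, InstrLFR):
    pass


class OirInstr(OirLFR, InstrLFR):
    pass


class ConvEvt(ConvLFR, EvtLFR):
    pass


cirage = AgeInstr().apply("siʁ", "polish")
hachoir = OirInstr().apply("aʃ", "mince")
marche = ConvEvt().apply("maʁʃ", "walk")
```

No leaf class contains any code of its own: form comes from one parent and meaning from the other, which is the redundancy-avoiding property the hierarchy is meant to provide.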

Lexeme formation and multiple inheritance
Conclusion
Using multiple inheritance hierarchies allows us to account for the polysemy of LFRs:
  it accounts for similarities among the different cases of -age derivation;
  it avoids redundancy among rules.
However, we also need to account for the shared properties of homophonous derived lexemes, like CIRAGE_evt and CIRAGE_instr.
We use the paradigm identifiers worked out by Bonami (2011).

Paradigm Identifiers
According to Bonami (2011), each lexeme has a Paradigm IDentifier (PID) that is shared between multiple lexemes with the same paradigm.
CIRAGE_evt and CIRAGE_instr inflect in the same way, so they have the same Paradigm IDentifier:
  [PID cirage, PHON siʁaʒ, CAT noun, SEM σ]
PIDs are complex data structures driving inflection.
They capture Fradin and Kerleroux's (2003) notion of a flexeme: a family of lexemes with the same inflectional paradigm.
But they avoid postulating semantically underspecified superlexemes.

Paradigm Identifiers
All LFRs modify the PID of their input. This is stipulated at the MORPHOPHON level.
  age-lfr: [PID p, PHON φ, CAT verb, SEM σ] → [PID age(p), PHON φ ⊕ aʒ, CAT noun, SEM τ]
    age-evt-lfr:   [SEM V] → [SEM action of Ving]
    age-instr-lfr: [SEM V] → [SEM obj. used to V]
    age-loc-lfr:   [SEM V] → [SEM loc where to V]
All sub-types of age-lfr inherit the PID, and the PID is kept constant by operations that do not affect the inflectional paradigm (e.g. lexicalization, semantic shift).
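The role of the PID can be sketched in a few lines of Python. Everything here is a simplification introduced for illustration: PIDs are described above as complex data structures, whereas this sketch encodes them as plain strings, and the function names `age_lfr` and `same_flexeme` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    pid: str   # paradigm identifier (a plain string here, by assumption)
    phon: str
    cat: str
    sem: str


def age_lfr(base: Lexeme, sem_template: str) -> Lexeme:
    """Apply -age suffixation: the output PID is age(p), whatever the semantics."""
    return Lexeme(
        pid=f"age({base.pid})",
        phon=base.phon + "aʒ",
        cat="noun",
        sem=sem_template.format(V=base.sem),
    )


def same_flexeme(a: Lexeme, b: Lexeme) -> bool:
    # Two lexemes belong to the same flexeme if they share a PID.
    return a.pid == b.pid


cirer = Lexeme(pid="cirer", phon="siʁ", cat="verb", sem="polish")
cirage_evt = age_lfr(cirer, "action of {V}ing")
cirage_instr = age_lfr(cirer, "object used to {V}")
```

The two outputs are distinct lexemes with distinct meanings, yet `same_flexeme` groups them because the PID is computed at the level of the form schema, before the semantic subtypes diverge.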

Productivity
The multiple inheritance hierarchy allows us to represent every semantic type of output for a given morphological process.
However, not all semantic types are equally productive for a given morphological process.
We also want to account for the productivity of each semantic type of output.

Productivity
LFRs can be thought of as abstractions of what exists in the lexicon.
We can get an idea of the productivity of a semantic type by looking at how many lexemes it describes.
Koehl (2012) has shown with French -itude suffixation that an unproductive process can always be used and become productive again.
  Unlike Baayen (2001), we do not take productivity to be the capacity to form hapaxes.
  Rather, we only look at the frequency in a corpus of the lexemes described by a semantic type.
  We take token frequency, rather than type frequency, into account.
We provide an illustration with deverbal -eur suffixation in French.
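A measure of this kind could be computed along the following lines. This is a hedged sketch: the function name, the tuple layout and above all the frequency values are invented for the example, and summing token frequencies per semantic type is an assumption, since the slides do not say how the figures are aggregated.

```python
from collections import defaultdict


def frequency_by_type(annotated_lexemes):
    """Sum the token frequency of the lexemes described by each semantic type.

    `annotated_lexemes` is an iterable of (lexeme, semantic_type, token_frequency).
    """
    totals = defaultdict(float)
    for _lexeme, sem_type, token_freq in annotated_lexemes:
        totals[sem_type] += token_freq
    return dict(totals)


# Hypothetical toy data: the frequencies below are made up for illustration.
toy_lexicon = [
    ("DANSEUR", "agent", 12.0),
    ("CHANTEUR", "agent", 8.0),
    ("MINUTEUR", "instrument", 1.5),
    ("SUEUR", "result", 3.0),
]

totals = frequency_by_type(toy_lexicon)
```

With type frequency one would count lexemes per semantic type instead (two agent nouns, one instrument noun, one result noun in the toy data); the token-based measure weights each lexeme by how often it is actually used.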

Productivity
Example: deverbal -eur suffixation in French
There have been numerous studies on deverbal -eur suffixation in French (among them Benveniste, 1975; Fradin and Kerleroux, 2003; Rosenberg, 2008; Roy and Soare, 2012).
-eur suffixation forms agent and instrument nouns, but there are also a few result nouns.
We used the French corpus Lexique 3 (http://www.lexique.org/):
  it contains 55 000 lexemes and 135 000 inflected forms;
  for each inflected form it provides category, phonology, etc., and its frequency in two corpora (a sub-part of Frantext and movie subtitles).
1 591 deverbal nouns were extracted from the lexicon and manually annotated as agent, instrument or result.

Productivity
Example: deverbal -eur suffixation in French
  eur-lfr: [PID p, PHON φ, CAT verb, SEM σ] → [PID eur(p), PHON φ ⊕ œʁ, CAT noun, SEM τ]
  subtype         semantics                          frequency   examples
  eur-agt-lfr     [SEM V] → [SEM the one who Vs]     4 423.73    DANSEUR 'dancer', CHANTEUR 'singer'
  eur-instr-lfr   [SEM V] → [SEM obj. used to V]     297.9       MINUTEUR 'timer', EFFACEUR 'eraser'
  eur-res-lfr     [SEM V] → [SEM result of Ving]     192.78      SUEUR 'sweat', SENTEUR 'scent'

Conclusion
We have argued that LFRs have monosemic outputs.
We use a multiple inheritance hierarchy to model LFRs:
  it allows us to capture multiple specific rules and to avoid redundancy among them.
We use Paradigm IDentifiers to account for properties shared by homophonous derived lexemes.
We include token frequency in the description to account for the productivity of each semantic type of output within the same morphological process.

Thank you for your attention

References
Aronoff, M. (1994). Morphology by Itself. The MIT Press, Cambridge.
Baayen, H. (2001). Word frequency distribution, volume 18 of Text, Speech and Language Technology. Kluwer Academic Publishers, Dordrecht.
Benveniste, E. (1975). Noms d'agent et noms d'action en indo-européen. Maisonneuve, Paris.
Bonami, O. (2011). Reconstructing HPSG morphology. In 18th International Conference on HPSG, Seattle.
Bonami, O. and Boyé, G. (2006). Deriving inflectional irregularity. In Müller, S., editor, Proceedings of the HPSG'06 Conference, pages 361-380, Stanford. CSLI Publications.
Desmets, M. and Villoing, F. (2009). French VN lexemes: morphological compounding in HPSG. In Proceedings of the HPSG'09 Conference, pages 89-109.
Flickinger, D. (1987). Lexical Rules in the Hierarchical Lexicon. PhD thesis, Stanford University.
Fradin, B. (2003). Nouvelles approches en morphologie. Puf, Paris.
Fradin, B. and Kerleroux, F. (2003). Troubles with lexemes. In Booij, G., de Cesaris, J., Scalise, S., and Ralli, A., editors, Topics in Morphology. Selected papers from the Third Mediterranean Morphology Meeting, pages 177-196, Barcelona. ULA-Universitat Pompeu Fabra.
Koehl, A. (2012). La construction morphologique des noms désadjectivaux suffixés en français. PhD thesis, Nancy 2.
Koenig, J.-P. (1999). Lexical Relations. CSLI Publications, Stanford.
Riehemann, S. (1998). Type-based derivational morphology. Journal of Comparative Germanic Linguistics, 2:49-77.
Rosenberg, M. (2008). La formation agentive en français. Les composés [VN/A/Adv/P] N/A et les dérivés V-ant, V-eur et V-oir(e). PhD thesis, Stockholm University.
Roy, I. and Soare, E. (2012). L'enquêteur, le surveillant et le détenu: les noms déverbaux de participants aux événements, lecture événementielle et structure argumentale. Lexique, 20:207-231.
Tribout, D. (2010). How many conversions from verb to noun are there in French? In Müller, S., editor, Proceedings of the 17th International Conference on HPSG, pages 341-357, Stanford. CSLI Publications.