VP idioms in Norwegian: A subconstructional approach

Similar documents
Chapter 13, Sections Auxiliary Verbs CSLI Publications

English prepositional passive constructions

Outline of today s lecture

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test

Non-nominal Which-Relatives

University of Lisbon Department of Informatics. A Computational Grammar for Deep Linguistic Processing of Portuguese: LXGram, version A.4.

A Beautiful Four Days in Berlin Takafumi Maekawa (Ryukoku University)

A cloud-based editor for multilingual grammars

Syntactic Theory. Background and Transformational Grammar. Dr. Dan Flickinger & PD Dr. Valia Kordoni

An HPSG analysis of a beautiful two weeks * 1

I have eaten. The plums that were in the ice box

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.

High Precision Analysis of NPs with a Deep Processing Grammar

The Book of Grammar Lesson Six. Mr. McBride AP Language and Composition

Scrambling in German - Extraction into the Mittelfeld

Syntactic Theory on Swedish

L130: Chapter 5d. Dr. Shannon Bischoff. Dr. Shannon Bischoff () L130: Chapter 5d 1 / 25

Application of Natural Language Interface to a Machine Translation Problem

Implementing Norwegian Reflexives in an HPSG Grammar

Paraphrasing controlled English texts

Overview of MT techniques. Malek Boualem (FT)

Organizational Issues Arising from the Integration of the Lexicon and Concept Network in a Text Understanding System

Constraints in Phrase Structure Grammar

A Chart Parsing implementation in Answer Set Programming

Context Grammar and POS Tagging

Syntax: Phrases. 1. The phrase

Noam Chomsky: Aspects of the Theory of Syntax notes

Teaching Formal Methods for Computational Linguistics at Uppsala University

Translating while Parsing

Mixed Sentence Structure Problem: Double Verb Error

An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System

The Lexicon. The Lexicon. The Lexicon. The Significance of the Lexicon. Australian? The Significance of the Lexicon 澳 大 利 亚 奥 地 利

Francis Y. LIN Alex X. PENG (School of Foreign Languages and Literatures / Beijing Normal University)

-84- Svein Lie Institutt for nordisk språk og Universitetet i Oslo

How To Understand A Sentence In A Syntactic Analysis

The polysemy of lexeme formation rules

INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning jtl@ifi.uio.no

What s in a Lexicon. The Lexicon. Lexicon vs. Dictionary. What kind of Information should a Lexicon contain?

Comprendium Translator System Overview

VP-TOPICALIZATION AND THE VERB GJØRE IN NORWEGIAN 1

Torben Thrane* The Grammars of Seem. Abstract. 1. Introduction

Understanding Clauses and How to Connect Them to Avoid Fragments, Comma Splices, and Fused Sentences A Grammar Help Handout by Abbie Potter Henry

Compositionality, of, word-formation

The treatment of noun phrase queries in a natural language database access system

6-1. Process Modeling

Phrase Structure Rules, Tree Rewriting, and other sources of Recursion Structure within the NP

The Specific Text Analysis Tasks at the Beginning of MDA Life Cycle

Learning Translation Rules from Bilingual English Filipino Corpus

DIAGRAMMING SENTENCES

Understanding English Grammar: A Linguistic Introduction

Parts of Speech. Skills Team, University of Hull

A Writer s Reference, Seventh Edition Diana Hacker Nancy Sommers

AUTOMATIC F-STRUCTURE ANNOTATION FROM THE AP TREEBANK

A chart generator for the Dutch Alpino grammar

Swedish-English Verb Frame Divergences in a Bilingual Head-driven Phrase Structure Grammar for Machine Translation

Structure of Clauses. March 9, 2004

SAMPLE. Grammar, punctuation and spelling. Paper 2: short answer questions. English tests KEY STAGE LEVEL. Downloaded from satspapers.org.

Suppletion and stem dependency in inflectional morphology

Statistical Machine Translation

How To Understand And Understand A Negative In Bbg

Slot Grammars. Michael C. McCord. Computer Science Department University of Kentucky Lexington, Kentucky 40506

Natural Language Processing

Impossible Words? Jerry Fodor and Ernest Lepore

UNIVERSITÀ DEGLI STUDI DELL AQUILA CENTRO LINGUISTICO DI ATENEO

MARY. V NP NP Subject Formation WANT BILL S

Artificial Intelligence Exam DT2001 / DT2006 Ordinarie tentamen

EAP Grammar Competencies Levels 1 6

Structurally ambiguous sentences

Minimal Recursion Semantics: An Introduction

The Adjunct Network Marketing Model

A Preliminary Study of Comparative and Evaluative Questions for Business Intelligence

Ling 201 Syntax 1. Jirka Hana April 10, 2006

Jean Véronis is Maître de Conférences at the Université de Provence and Researcher at the

An Interactive Hypertextual Environment for MT Training

SYNTAX: THE ANALYSIS OF SENTENCE STRUCTURE

English Descriptive Grammar

Detecting Parser Errors Using Web-based Semantic Filters

Natural Language Database Interface for the Community Based Monitoring System *

Database semantics for natural language

Characteristics of the METAL Machine Translation System at Production Stage. John S. White Siemens Communication Systems

Syntax-driven bindings of Spanish clitic pronouns

Annual Report H I G H E R E D U C AT I O N C O M M I S S I O N - PA K I S TA N

the primary emphasis on explanation in terms of factors outside the formal structure of language.

LESSON THIRTEEN STRUCTURAL AMBIGUITY. Structural ambiguity is also referred to as syntactic ambiguity or grammatical ambiguity.

Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University

Rethinking the relationship between transitive and intransitive verbs

Doctoral School of Historical Sciences Dr. Székely Gábor professor Program of Assyiriology Dr. Dezső Tamás habilitate docent

Special Topics in Computer Science

Interactive Dynamic Information Extraction

Index. 344 Grammar and Language Workbook, Grade 8

FUNDAMENTALS OF ARTIFICIAL INTELLIGENCE KNOWLEDGE REPRESENTATION AND NETWORKED SCHEMES

Lecture 9. Phrases: Subject/Predicate. English 3318: Studies in English Grammar. Dr. Svetlana Nuernberg

Phrasal Verbs and collocations

Proceedings of the LFG02 Conference. National Technical University of Athens, Athens. Miriam Butt and Tracy Holloway King (Editors) CSLI Publications

Off-line (and On-line) Text Analysis for Computational Lexicography

A Knowledge-based System for Translating FOL Formulas into NL Sentences

IP PATTERNS OF MOVEMENTS IN VSO TYPOLOGY: THE CASE OF ARABIC

Towards portable natural language interfaces to knowledge bases The case of the ORAKEL system

GRAMATICA INGLESA I GUÍA DOCENTE

Semantics and Generative Grammar. Quantificational DPs, Part 3: Covert Movement vs. Type Shifting 1

Transcription:

VP idioms in Norwegian: A subconstructional approach Petter Haugereid Department of Linguistic, Literary and Aesthetic Studies University of Bergen petter.haugereid@lle.uib.no Abstract This paper presents a brief overview of idiomatic expressions in the Norwegian LFG grammar NorGram and shows how the rich lexical information of the LFG grammar can be reused in an HPSG-like grammar with a radically different approach to alternating argument frames. Rather than accounting for idioms by means of special idiom lexical entries, which is the standard approach in LFG and HPSG, a constructional approach is taken where the verbs of the idioms are left underspecified with regard to whether they are idioms or not. A hierarchy of linking types is assumed, which for each piece of evidence provided by the words and rules of the sentence, narrows down the possible frames of the verb to just one. 1 Introduction The Norwegian LFG grammar NorGram (Dyvik, 2000) has 56 VP idioms in the lexicon, distributed over 20 templates. Abstracting away from whether the selected object of the idiom is definite or indefinite, and what kind of argument the selected preposition has (NP, subordinate clause or infinitival clause), we are left with four main kinds of idioms. 1 The first two kinds of idioms are semantically intransitive, hence they only take one argument, namely the subject. In the first kind of intransitive idioms the main verb selects an object, as 1 Three idioms (ta på kreftene ( tax one s strength ), sende ord ( send a message ), and komme på kant med ( fall out with )), do not fall into any of the four categories, and they are left out of the present discussion. shown in (1), and the second kind the main verb selects a PP, as shown in (2). (1) Han gikk konkurs. he went bankrupt He went bankrupt. (2) De løftet i flokk. They lifted in flock They worked together. The last two kinds of idioms are semantically transitive, hence they take two arguments. They differ in that in one kind the main verb selects an object and the preposition of a PP, see (3), while in the other the main verb selects a PP and takes an object as an argument, see (4). (3) Han la ikke skjul på sin glede. he laid not hiding on his joy He didn t hide his joy. (4) Han brakte temaet på bane. he brought topic.the on track He brought up the topic. A verb that is part of a VP idiom is assigned an idiom frame in the lexicon in addition to the other frames that it appears with. For example the verb bringe ( bring ) is listed with the following frame: @(VPIDIOM-PSELOBJ-OBJ bringe på bane) A lexical entry is allowed to have more than one argument frame by using disjunctions of frames. Disjunctions are expanded into full lexical entries during parsing. This means that a lexical entry with 6 disjunctive argument frames

is computationally equivalent to six lexical entries. In this paper I will present a new way of representing information about argument frames, including the different kinds of VP idioms presented in this section. The account shifts the burden from the lexicon to a carefully designed hierarchy of linking types. The transfer is achieved by means of phrasal subconstructions (see Haugereid and Morey (2012); Haugereid (2012)), which are construction parts that, when put together in a way that conforms with a constraint on the verb, form full constructions. The analysis is implemented in an HPSG-like grammar of Norwegian within the LKB system (Copestake, 2001). 2 Treatment of idioms in Sag et al. (2003) In (Sag et al., 2003, 347 355), idioms are assumed to have special lexical entries for the words that constitute them. The idiom keep tabs on is analyzed by means of a lexical entry for keep (see (5)) with three items on the subcat list; (i) the NP subject, (ii) an idiomatic noun tabs, and (iii) a constituent marked by the preposition on. (5) ptv-lxm STEM keep FORM on ARG-ST NP i, FORM tabs, INDEX j INDEX s RELN observe SEM SIT s RESTR OBSERVER i OBSERVED j As (5) shows, the relation of the idiom keep tabs on (observe) has two arguments, observer and observed, and they are linked to the subject of keep and the constituent marked by the preposition on. Both the idiomatic noun tabs and the selected preposition on are semantically empty. Given the degree of detail required in the lexicon, one is forced to assume separate lexical entries for idiomatic verbs. From a semantic point of view, this is motivated, considering how the meaning of idioms deviates from the compositional meaning. However, there is no morphological evidence indicating that idiomatic verbs should have separate lexical entries. They share the stem with their compositional versions and have the same inflections. 3 Analysis 3.1 Lexical representation In addition to the idiom frame given in (4), the verb bringe also has a transitive and a ditransitive frame, as shown in (6). (6) a. Han brakte maten. he brought food.the He brought the food. b. Han brakte henne maten. he brought her food.the He brought her the food. Even though we now have three argument frames for the verb bringe, we assume only one lexical entry, shown in (7). It has information about the stem of the lexeme, the head value, the head value of its (potential) arguments, c(onstruction)-arg1, c-arg2, c-arg3, and c-arg4, and its keyrel. The four argument features; c-arg1, c-arg2, c-arg3, and c-arg4 correspond to external subject, (deep) direct object, (deep) indirect object, and oblique object, respectively. Note that there is no linking of the c-args to the semantics. Rather, the linking is done in what we refer to as phrasal subconstructions. (7) bringe-v STEM < "bringe" > HEAD verb C-ARG1 HEAD noun C-ARG2 HEAD noun VAL C-ARG3 HEAD noun C-ARG4 HEAD compl-noun KEYREL 1 PRED bringe RELS 1

3.2 Phrasal subconstructions One example of a phrasal subconstruction is the rule that links (external) subjects, arg1-struct, illustrated in (8). In this rule, the value of c- arg1 link is switched from arg1 in the mother to arg1+ in the daughter. The c-arg features that are not mentioned in the representation, as well as the head and keyrel features are unified with those of the first daughter. 2 At the same time, the argument (the second daughter of the rule) is linked to the arg1 of the keyrel. (8) arg1-struct VAL KEYREL 1 ARG1 2 ARGS VAL C-ARG1 3 KEYREL 1 LINK arg1+ INDEX 2, 3 The grammar also has subconstructions that in the same fashion link (deep) direct objects arg2-struct, (deep) indirect objects arg3-struct, and oblique objects arg4-struct. The grammar has a rule vbl-struct which adds the verb. The verb is selected via the vbl feature, and the vbl value of the verb is transferred to the mother. The rule also unifies its keyrel value with that of the verb. (9) vbl-struct VBL 1 verb-word VBL 3 ARGS, 3VBL 1 3.3 Analysis of VP idioms The analysis of idioms includes phrasal subconstructions for selected prepositions prepsel-struct and two subconstructions for idiomatic nouns; arg2-idiom-struct and arg4-idiom-struct. The prepsel-struct subconstruction unifies the form value of the selected preposition with the c- arg4 link value, as shown in (10). The features head, val, and keyrel are unified with those of the first daughter (suppressed). 2 This information is suppressed in the representation in order to save space. (10) prepsel-struct ARGS V C-ARG4 LINK 8, prep-word FORM 8 The subconstructions arg2-idiom-struct and arg4-idiom-struct unifies the form value of the selected idiomatic noun with the respective link value of the first daughter. This is illustrated for arg2-idiom-struct in (11). The c-arg features not mentioned in the representations, as well as the head and keyrel features are unified with those of the first daughter. Note that arg2-idiom-struct and arg4-idiom-struct do not link the idiomatic noun to the keyrel in the way the arg1-struct in (8) does. In this way, the idiomatic nouns do not become semantic arguments. (11) arg2-idiom-struct V C-ARG2 LINK arg2 ARGS V C-ARG2 LINK 8, noun-word FORM 8 The link values have two functions. One is to keep track of arguments that are realized. In this way they function like empty/non-empty valence lists in HPSG. The other function is to narrow down what kind of construction the clause has. Each subconstruction that applies in a clause leaves a trace; either a link value is switched from negative to positive, or a selected item leaves a mark by unifying its form value with the respective link value. At the bottom of the tree, all this information is present. This is illustrated in the tree in Figure 1. 3 In the top node arg4-idiom-struct, all link values are negative, and at the bottom of the tree, in the START node, marks from all the subconstructions that have applied can be found. The start sign inherits from the type unilink, which unifies the link values of the arguments with the keyrel pred value (see (12)). 4 3 The motivation behind the left-branching design is given in Haugereid and Morey (2012). 4 The unification performed in uni-link is left out in the start sign in Figure 1, of expository reasons.

arg4-idiom-str C-ARG1 LINK C-ARG2 LINK C-ARG3 LINK C-ARG4 LINK arg1 arg2 arg3 arg4 prepsel-str noun-word FORM 4 bane C-ARG2 LINK arg2 C-ARG3 LINK arg3 bane C-ARG4 LINK 4 bane arg2-str prep-word FORM 4 på C-ARG2 LINK arg2 C-ARG3 LINK arg3 på C-ARG4 LINK 4 på*bane 2 arg1-str NP 9 C-ARG2 2 LINK arg2+ temaet C-ARG4 LINK 4 på*bane KEYREL 7 ARG2 9 vbl-str C-ARG1 1 LINK arg1+ C-ARG2 LINK arg2+ han C-ARG4 LINK 4 på*bane KEYREL 7 ARG1 8 1 NP 8 START verb-word C-ARG1 LINK arg1+ KEYREL 7 PRED C-ARG2 LINK arg2+ RELS 7 C-ARG4 LINK 4 på*bane PRED 0 bringe brakte KEYREL 7ARG1 8 ARG2 9 0 bringe Figure 1: Linking information in the idiom Brakte han temaet på bane? (Did he bring up the topic?) (12) uni-link C-ARG1 LINK 1 C-ARG2 LINK 1 VAL C-ARG3 LINK 1 C-ARG4 LINK 1 KEYREL PRED 1 the keyrel pred value, we get the type bringe*på*bane_rel. This follows from the hierarchy of linking types showed in Figure 2. The unification of the types arg1+, arg2+, arg3, på*bane, and bringe gives the type bringe*på*bane_rel. When the link values of the start sign in the tree in Figure 1 are unified with The hierarchy in Figure 2 also accounts for the ability of bringe to alternate between the

link arg3+ arg4 arg1+ arg2+ arg3 arg4+ på bane bringe på*bane bringe123 rel bringe12 rel bringe*på*bane rel Figure 2: hierarchy The position of bringe in the link type three argument frames shown in (4) and (6). Given the three subtypes of bringe in the linking type hierarchy, bringe12_rel, bringe123_rel, and bringe*på*bane_rel, the verb is allowed to enter the constellations of subconstructions that make up the regular transitive and ditransitive frames as well as the idiom frame. The MRS (Copestake et al., 2005) produced by the analysis in Figure 1 is given in Figure 13. 5 (13) LTOP INDEX RELS HCONS h1 { } e2 pres yes-no-question h3:pron_rel(x4) h5:pronoun_q_rel(x4,h6,h7) h8:bringe*på*bane_rel(e2,x4,x9) h10:tema_n_rel(x9) h11:def_q_rel(x9,h12,h13) { } h6 =q h3 h12 =q h10 The four kinds of idiomatic expression types introduced in Section 1 are accounted for by the following combinations of subconstructions: 1. Intransitive with idiomatic noun ((1)): vblstruct, arg1-struct, arg2-idiom-struct 2. Intransitive with idiomatic PP ((2)): vblstruct, arg1-struct, prepsel-struct, arg4- idiom-struct 3. Transitive with idiomatic noun ((3)): 5 The representation presupposes semantically empty versions of the preposition på and the noun bane as assumed in Sag et al. (2003). Alternatively, we could use the regular words. We would then avoid double lexical entries for these words. Their relations på_p_rel and bane_n_rel would then be given the same handle as the idiom relation bringe*på*bane_rel. vbl-struct, arg1-struct, arg2-idiom-struct, prepsel-struct, arg4-struct 4. Transitive with idiomatic PP ((4)): vblstruct, arg1-struct, arg2-struct, prepselstruct, arg4-idiom-struct 4 Discussion and future work The hierarchy of linking types that results from this account is huge, but finite. This kind of hierarchy is interesting in that it reflects what kinds of phrasal subconstructions are needed in order to express all grammaticalized concepts in a given grammar. The grammar presented in this paper, so far only has a small hand-built type hierarchy. The aim is to generate a full hierarchy from the lexicon of the LFG grammar NorGram. References Copestake, A. (2001). Implementing Typed Feature Structure Grammars. CSLI Lecture Notes. Center for the Study of Language and Information, Stanford. Copestake, A., Flickinger, D., Pollard, C. J., and Sag, I. A. (2005). Minimal recursion semantics: an introduction. Research on Language and Computation, 3(4), 281 332. Dyvik, H. (2000). Nødvendige noder i norsk: Grunntrekk i en leksikalsk-funksjonell beskrivelse av norsk syntaks. In Ø. Andersen, K. Fløttum, and T. Kinn, editors, Menneske, språk og felleskap. Novus forlag. Haugereid, P. (2012). A grammar design accommodating packed argument frame information on verbs. International Journal of Asian Language Processing, 22(3), 87 106. Haugereid, P. and Morey, M. (2012). A leftbranching grammar design for incremental parsing. In S. Müller, editor, Proceedings of the 19th International Conference on Head- Driven Phrase Structure Grammar, Chungnam National University Daejeon, pages 181 194. Sag, I. A., Wasow, T., and Bender, E. M. (2003). Syntactic Theory: A Formal Introduction. CSLI Publications, Stanford, 2 edition.