Surface Realisation using Tree Adjoining Grammar. Application to Computer Aided Language Learning Claire Gardent CNRS / LORIA, Nancy, France (Joint work with Eric Kow and Laura Perez-Beltrachini) March 24, 2014
TAG-Based Surface Realisation (SR) maps data to text using a grammar which relates NL expressions, syntactic structures and meaning representations. [Diagram: parsing maps the sentence "John loves Mary" to the semantics l1:john(j), l2:mary(m), l3:love(e,j,m); generation maps the semantics back to the sentence, both mediated by the grammar, the lexicon and the algorithm]
Outline 1. Grammars: TAG, FB-LTAG and Implementation 2. Algorithms for Surface Realisation 3. Application to Computer Aided Language Learning
Tree Adjoining Grammar A Tree Adjoining Grammar (TAG) is a tuple G = (N, T, I, A, S) such that: T and N are sets of terminal and non-terminal categories; I is a finite set of initial trees; A is a finite set of auxiliary trees; S is a distinguished non-terminal (the axiom). The trees in I ∪ A are called elementary trees. The trees of G are combined using adjunction and substitution. Each derivation yields two structures: a derived tree and a derivation tree.
Example Elementary Trees Initial trees are elementary trees whose leaves are labelled with terminal or non-terminal categories. Leaf nodes labelled with a non-terminal are substitution nodes, marked with ↓. Auxiliary trees are elementary trees with a designated foot node, marked with *. The root and the foot node are labelled with the same category. [Tree diagrams: an initial NP tree for John, an initial S tree for runs with an NP↓ substitution node, and an auxiliary VP tree for quickly with a VP* foot node]
Example TAG Derivation Substitution inserts a derived or elementary tree at the substitution node of a TAG tree. Adjunction inserts an auxiliary tree into a tree (Adjunction is not allowed on substitution nodes)
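The two combination operations can be sketched in code. The following is a minimal, illustrative encoding (nested node objects; not the formalism's actual implementation): substitution replaces a substitution node by an initial tree, and adjunction splices an auxiliary tree in at an internal node, moving that node's subtree under the auxiliary tree's foot.

```python
# Minimal sketch of TAG substitution and adjunction (assumed representation).
class Node:
    def __init__(self, cat, children=None, subst=False, foot=False):
        self.cat = cat                   # syntactic category or word
        self.children = children or []   # ordered daughters
        self.subst = subst               # substitution node (marked with ↓)
        self.foot = foot                 # foot node of an auxiliary tree (marked with *)

    def yield_(self):
        """Left-to-right leaf string of the tree."""
        if not self.children:
            return [self.cat]
        return [w for c in self.children for w in c.yield_()]

def substitute(tree, initial):
    """Replace the first substitution node matching the initial tree's root category."""
    for i, child in enumerate(tree.children):
        if child.subst and child.cat == initial.cat:
            tree.children[i] = initial
            return True
        if substitute(child, initial):
            return True
    return False

def find_foot(tree):
    if tree.foot:
        return tree
    for c in tree.children:
        f = find_foot(c)
        if f:
            return f
    return None

def adjoin(tree, aux):
    """Adjoin aux at the first internal node matching its root category:
    that node's daughters move under aux's foot node."""
    for i, child in enumerate(tree.children):
        if child.cat == aux.cat and child.children and not child.subst:
            foot = find_foot(aux)
            foot.children = child.children
            foot.foot = False
            tree.children[i] = aux
            return True
        if adjoin(child, aux):
            return True
    return False

# "John runs" built by substitution, then "quickly" added by adjunction:
john = Node("NP", [Node("John")])
runs = Node("S", [Node("NP", subst=True),
                  Node("VP", [Node("V", [Node("runs")])])])
quickly = Node("VP", [Node("VP", foot=True), Node("ADV", [Node("quickly")])])

substitute(runs, john)
adjoin(runs, quickly)
print(" ".join(runs.yield_()))  # John runs quickly
```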
Feature-Based TAG Tree nodes are decorated with two feature structures called top and bottom. Unifications on these feature structures are performed during derivation, each time a substitution or an adjunction takes place. At the end of the derivation, the top and bottom feature structures of each node are unified.
Using Features to capture Subject/Verb Agreement [Tree diagrams: the verb tree for love carries [num:pl] on its subject NP substitution node, while the tree for John carries [num:sg]; unification fails, ruling out *John love Mary]
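The agreement check above reduces to unification of feature structures. Here is a minimal sketch of unification over flat feature structures (illustrative only; real FB-LTAG feature structures are recursive and may contain variables):

```python
# Minimal sketch: flat feature-structure unification for agreement checking.
def unify(fs1, fs2):
    """Unify two flat feature structures; return None on a feature clash."""
    result = dict(fs1)
    for feat, val in fs2.items():
        if feat in result and result[feat] != val:
            return None          # clash, e.g. num:sg vs num:pl
        result[feat] = val
    return result

# The subject slot of "love" requires a plural subject:
subject_slot = {"num": "pl"}
john = {"num": "sg"}             # *John love Mary: unification fails
we = {"num": "pl"}               # We love Mary: unification succeeds

print(unify(subject_slot, john))  # None (agreement failure)
print(unify(subject_slot, we))    # {'num': 'pl'}
```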
Unification-based Semantic Construction in FTAG Semantic representation language: unification-based flat underspecified formulae (aka MRS). Each elementary tree is associated with a formula φ representing its meaning. Missing semantic arguments are represented by unification variables. Elementary tree nodes are decorated with semantic indices occurring in φ. The meaning of a derived tree is the union of the meanings associated with the elementary trees, modulo the unifications made during processing.
Capturing the Interplay between Syntax and Semantics [Tree diagrams: the NP nodes of the loves tree carry indices X and Y; substituting John (index j) and Mary (index m) unifies X = j and Y = m. Semantics: l_j:name(j,john), l_l:love(x,y), l_m:name(m,mary) yields l_l:love(j,m), l_j:name(j,john), l_m:name(m,mary)]
Derived and Derivation Trees [Figure for "the tatoo speaks loudly": (a) derived tree, with feature structures such as idx=t, num=sg, gen=f on the NP and idx=e, tse=pst on the V; (b) derivation tree: α-speak dominating α-tatoo (which dominates β-the) and β-loudly]
Implementing a TAG A large-coverage TAG consists of several thousand trees. For each word type, there are as many trees as there are possible syntactic contexts for that word, but these trees often share subtrees. To implement an FB-LTAG, we use the XMG specification language and compiler. As a result, each TAG tree is associated with the set of XMG classes used to produce it.
Structure Sharing in TAG [Tree diagrams: the Canonical Subject fragment and the Active Verb fragment combine into the tree for a canonical subject with an active verb (e.g. the boy sleeps); the Relative Subject fragment (with NP* root) and the Active Verb fragment combine into the tree for a subject relative (e.g. the boy who sleeps)]
SemFRAG, a grammar for French An FB-LTAG for French with unification-based compositional semantics. Roughly 6,000 elementary trees. Coverage: 35 basic verbal subcategorisation frames. Argument redistributions: active, passive, middle, neuter, reflexivisation, impersonal, passive impersonal. Argument realisations: cliticisation, extraction, omission, permutations, etc. Predicative (adjectival, nominal and prepositional) and light verb constructions. Basic descriptions for adverbs, determiners and prepositions.
XXTAG, a grammar for English An FB-LTAG for English Roughly 1200 elementary trees Reimplementation of the XTAG grammar developed at UPenn Coverage : most of the Penn Treebank
Complexity Surface realisation from flat semantics is exponential in the length of the input (Brew 1992, Kay 1996). Unordered input: at least 2^n possible combinations, with n the number of literals in the input. Intersective modifiers: 2^(m+1) possible intermediate structures, with m the number of modifiers for a given structure. Lexical ambiguity: ∏_{i=1}^{n} |Lex_i| possible combinations, with Lex_i the set of lexical entries associated with literal l_i and n the number of literals in the input.
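The three sources of blow-up can be made concrete with a toy computation (the input sizes and lexical-entry counts below are hypothetical; the formulas are those cited above):

```python
# Toy computation of the three combinatorial factors (hypothetical sizes).
from math import prod

n = 4                              # number of input literals
lexical_entries = [3, 2, 1, 2]     # |Lex_i| for each literal (assumed)
m = 3                              # intersective modifiers on one structure

unordered = 2 ** n                 # subsets of the unordered input: 16
modifier_structs = 2 ** (m + 1)    # intermediate modifier structures: 16
lexical = prod(lexical_entries)    # lexical-ambiguity combinations: 12

print(unordered, modifier_structs, lexical)  # 16 16 12
```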
SR Algorithm 1: GenI 3 steps: 1. Lexical selection: select the trees whose semantics subsumes the input semantics. 2. Combination: perform substitutions and adjunctions between selected trees. 3. Extraction: return the trees which are syntactically complete and whose semantics matches the input semantics. Claire Gardent and Eric Kow. A Symbolic Approach to Near-Deterministic Surface Realisation using Tree Adjoining Grammar. ACL 2007, Prague.
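The three steps can be sketched in a highly simplified form. In this illustrative encoding (not GenI's actual data structures), semantics are ground sets of literal strings, trees are reduced to (name, semantics) pairs, and the combination and extraction steps are abstracted to finding entry sets whose unioned semantics exactly matches the input:

```python
# Highly simplified sketch of GenI's pipeline (assumed encoding).
from itertools import combinations

input_sem = {"l1:john(j)", "l2:mary(m)", "l3:love(e,j,m)"}

lexicon = [  # hypothetical lexical entries: (tree name, semantics)
    ("alpha-john",   {"l1:john(j)"}),
    ("alpha-mary",   {"l2:mary(m)"}),
    ("alpha-loves",  {"l3:love(e,j,m)"}),
    ("alpha-sleeps", {"l4:sleep(e2,j)"}),
]

# 1. Lexical selection: keep entries whose semantics subsumes
#    (here simply: is a subset of) the input semantics.
selected = [(name, sem) for name, sem in lexicon if sem <= input_sem]

# 2+3. Combination and extraction, abstracted: find sets of selected entries
#      whose combined semantics matches the input semantics exactly.
results = []
for k in range(1, len(selected) + 1):
    for combo in combinations(selected, k):
        combined = set().union(*(sem for _, sem in combo))
        if combined == input_sem:
            results.append(sorted(name for name, _ in combo))

print(results)  # [['alpha-john', 'alpha-loves', 'alpha-mary']]
```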
Example S NP X VP NP j V NP Y NP m John loves Mary l j :name(j,john) l l :love(x,y) l m :name(m,mary) l l :love(j,m),l j :name(j,john),l m :name(m,mary)
Optimisations Semantic indices used to limit the impact of unordered input Substitutions before Adjunctions to deal with intersective modifiers Polarity based filtering to reduce the initial search space and limit the impact of lexical ambiguity
Intersective modifiers For the input fierce(x), little(x), cat(x), black(x), 15 intermediate structures are built over the noun: F, L, B, FL, FB, BL, BF, LB, LF, FLB, FBL, BLF, BFL, LBF, LFB. These 15 structures are then multiplied by the context: the same 15 recur with "the ..." and again with "the ... runs", giving 45 structures built in total.
Substitutions before Adjunctions Adjunction is restricted to syntactically complete trees. The 2^(m+1) intermediate structures are not multiplied out by the context: the cat runs, the fierce cat runs, the black cat runs, the little cat runs, the fierce little cat runs, the fierce black cat runs, the black fierce cat runs, ... 16 structures built.
Polarity-based filtering (Perrier 2003) Polarity-based filtering filters out all combinations of lexical items which cannot result in a grammatical structure. The grammar trees are associated with polarities reflecting their syntactic resources and requirements. A combination of trees covering the input semantics but whose polarity is not zero is necessarily syntactically invalid and is therefore filtered out. A finite-state automaton is built which represents the possible choices (transitions) and the cumulative polarity (states). The paths leading to a state with polarity other than zero are deleted (automaton minimisation).
Polarity Filtering Many combinations are syntactically incompatible. Polarity filtering aims to detect these combinations and to filter them out. Input: john(j), drink(e,j,w), water(w). [Diagram: two trees for drinks, a finite one (S[mode:fin], requiring two NPs: −2np) and an infinitival one (S[mode:inf, controller:x], requiring one NP: −1np). Polarity count: with the finite tree, +1np −2np +1np = 0np (kept); with the infinitival tree, +1np −1np +1np = +1np (filtered out)]
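The polarity count for the example above can be sketched as follows (the encoding is an assumption for illustration: each tree carries signed counts per category, positive for resources offered and negative for resources required; the real filter tracks these counts in a finite-state automaton):

```python
# Sketch of polarity counting for lexical-combination filtering (assumed encoding).
from collections import Counter

trees = {
    "john":       Counter({"np": +1}),
    "water":      Counter({"np": +1}),
    "drinks_fin": Counter({"np": -2, "s": +1}),  # finite: needs subject and object NP
    "drinks_inf": Counter({"np": -1, "s": +1}),  # infinitival: subject is controlled
}

def total(combination):
    """Cumulative polarity of a combination of trees."""
    counts = Counter()
    for name in combination:
        counts.update(trees[name])
    return counts

fin = total(["john", "water", "drinks_fin"])
inf = total(["john", "water", "drinks_inf"])
# Non-zero np polarity means the combination cannot be grammatical:
print(fin["np"], inf["np"])  # 0 1
```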
SR Algorithm 2: RTGen Builds derivation rather than derived trees, using a conversion from FB-LTAG to Feature-Based Regular Tree Grammar (FB-RTG, Schmitz and Leroux 2009). Earley algorithm with packing and sharing. Claire Gardent, Benjamin Gottesman, and Laura Perez-Beltrachini. Using Regular Tree Grammars to enhance Sentence Realisation. Natural Language Engineering, 2011.
Derived and Derivation Tree
Converting a TAG to an RTG
Earley Algorithm
Axiom: [S → • S, ∅]
Goal: [S → S •, φ], where φ is the input semantics
Prediction: from [A → a(α • B_x β), ϕ], with B_0 → b(B_1, ..., B_n), ψ a grammar rule, σ = mgu(B, B_0), P[x] ∈ ψ and ϕ ∩ ψ = ∅, derive [σ(B_0 → • b(B_1, ..., B_n), ψ)]
Completion: from [A → a(α • B δ), ϕ] and [B_0 → b(β) •, ψ], with σ = mgu(B, B_0) and ϕ ∩ ψ = ∅, derive [σ(A → a(α (B, f) • δ), ϕ ∪ ψ)]
RTGen vs GenI All trees are taken into account while filtering (GenI's polarity filtering ignores auxiliary trees). All features can be used (GenI's polarity filtering can only use ground features, i.e. categories). All syntactic constraints are applied (not just counting). Intersective modifiers are handled using packing and sharing.
Comparison on two Automatically Generated Benchmarks Modifiers benchmark: modification differently distributed over the predicate argument structures + lexical ambiguity in modifiers. 1 789 input formulae. All benchmark: modification, varying number and type of verb arguments. 890 input formulae. Claire Gardent, Benjamin Gottesman, and Laura Perez-Beltrachini. Comparing the performance of two TAG-based surface realisers using controlled grammar traversal. COLING 2010: Posters, Beijing, China.
Generating Grammar Exercises Generate sentences. Use the detailed linguistic information output by the generator to select and build exercises. Three types of exercises: FIB, Shuffle and Reformulation. C. Gardent and L. Perez-Beltrachini. Using FB-LTAG Derivation Trees to Generate Transformation-Based Grammar Exercises. TAG+11: The 11th International Workshop on Tree Adjoining Grammars and Related Formalisms, Paris, France, September 2012. L. Perez-Beltrachini, C. Gardent and G. Kruszewski. Generating Grammar Exercises. The 7th Workshop on Innovative Use of NLP for Building Educational Applications, NAACL-HLT Workshop 2012, Montreal, Canada, June 2012.
Grammar Exercises Built from a single sentence.
[FIB] Complete with an appropriate personal pronoun.
(S) Elle adore les petits tatous (She loves the small armadillos)
(Q) ___ adore les petits tatous (gender=fem)
(K) elle
[Shuffle] Use the words below to make up a sentence.
(S) Tammy adore les petits tatous (Tammy loves the small armadillos)
(Q) tatous / les / Tammy / petits / adore
(K) Tammy adore les petits tatous.
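At the surface level, building a FIB or Shuffle exercise from one generated sentence is straightforward; the sketch below works on raw tokens for illustration (in the real system the blanked token and its hint come from the realiser's linguistic annotations, not from a fixed index):

```python
# Toy sketch: deriving FIB and Shuffle exercises from one sentence
# (token index for the blank is an assumption for illustration).
import random

sentence = "Tammy adore les petits tatous"
tokens = sentence.split()

# FIB: blank the target token (here the subject, index 0)
fib_q = " ".join("___" if i == 0 else tok for i, tok in enumerate(tokens))
fib_key = tokens[0]

# Shuffle: present the tokens in random order, "/"-separated as on the slide
shuffled = tokens[:]
random.shuffle(shuffled)
shuffle_q = " / ".join(shuffled)

print(fib_q)    # ___ adore les petits tatous
print(fib_key)  # Tammy
```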
Grammar Exercises Built from a pair of syntactically related sentences.
[Reformulation] Rewrite the sentence using passive voice.
(Q) C'est Tex qui a fait la tarte. (It is Tex who has baked the pie.)
(K) C'est par Tex que la tarte a été faite. (It is Tex by whom the pie has been baked.)
Transformations: Active/Passive, NP/Pronoun, Assertion/Wh-Question, Assertion/YN-Question.
The GramEx framework: generating and selecting sentences to build exercises
Creating a grammar exercise
Input semantics: aime(e,be,bi), bijou(bi), les(bi), betty(be)
Generated sentences: Bette aime le bijou. / C'est Bette qui aime les bijoux. / Bette aime les bijoux.
Goal: plural form of irregular nouns. Exercise type: fill-in-the-blank.
1. Select sentences: NP[num = pl ∧ plural = irreg] ∧ CanonicalOrder
2. Process the selected sentence: NP[num = pl] → blank; NP[lemma = bijou] → hint
[Tree diagram: derived tree for "Bette aime les bijoux", with num=pl on the object NP and its determiner, lemma=bijou on the noun, and tree properties {CanonicalObject, CanonicalSubject, ActiveVerb}]
(Q) Bette aime les ___ (bijou)
(K) bijoux
Selecting appropriate sentences GramEx's boolean constraint language ranges over linguistic constants: morpho-syntactic (feature structures) and syntactic (tree properties). [Tree diagram: derived tree for "Tammy a une voix douce", annotated with feature structures (tense=pst, mode=ind on the verb; num=sg, gen=f on the noun; flexion=irreg on the adjective) and tree properties ({CanSubj, CanObj, Active} on the verb, {Epith} on the adjective)]
Selecting appropriate sentences GramEx's boolean constraint language: syntax and use. The language supports conjunction, disjunction and negation of morpho-syntactic and syntactic properties. It describes the linguistic requirements imposed by pedagogical goals and permits retrieving appropriate sentences from the DB.
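The selection step can be sketched as filtering a sentence database with a boolean predicate over the realiser's output annotations. The database entries and their encoding below are hypothetical; only the constraint "Epith and flexion:irreg" comes from the slides:

```python
# Sketch of GramEx-style sentence selection (assumed database encoding).
sentences = [  # hypothetical database entries
    {"text": "Tammy a une voix douce",
     "props": {"Epith", "CanSubj", "CanObj", "Active"},
     "feats": {"flexion": "irreg"}},
    {"text": "Tammy a une jolie voix",
     "props": {"Epith", "CanSubj", "CanObj", "Active"},
     "feats": {"flexion": "reg"}},
]

# Pedagogical goal "pre/post-nominal irregular adjectives":
# conjunction of a tree property and a morpho-syntactic feature.
def goal(entry):
    return "Epith" in entry["props"] and entry["feats"].get("flexion") == "irreg"

matches = [e["text"] for e in sentences if goal(e)]
print(matches)  # ['Tammy a une voix douce']
```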
Selecting appropriate sentences Some examples.
Pedagogical goal: pre-/post-nominal irregular adjectives: Epith ∧ flexion:irreg
✓ Tammy a une voix douce (Tammy has a soft voice)
✗ Tammy a une jolie voix (Tammy has a nice voice)
Pedagogical goal: prepositions with infinitives: POBJinf ∧ CLAUSE, with POBJinf ≡ (DE-OBJinf ∨ A-OBJinf) and CLAUSE ≡ Vfin ∧ ¬Mod ∧ ¬CCoord ∧ ¬Sub
✓ Tammy refuse de chanter (Tammy refuses to sing)
✗ Jean dit que Tammy refuse de chanter (John says that Tammy refuses to sing)
Transformation-based grammar exercises Finding syntactically related sentences (e.g. active/passive).
(Q) C'est Tex qui a fait la tarte. (It is Tex who baked the pie.)
✗ (K) Tex a fait la tarte. (Tex baked the pie.)
✗ (K) La tarte a été faite par Tex. (The pie was baked by Tex.)
✗ (K) C'est par Tex que la tarte sera faite. (It is Tex who will bake the pie.)
✗ (K) Est-ce que la tarte a été faite par Tex ? (Has the pie been baked by Tex?)
✓ (K) C'est par Tex que la tarte a été faite. (It is Tex by whom the pie was baked.)
Creating transformation-based grammar exercises To identify pairs of sentences that are identical up to a single syntactic transformation, we: use the information contained in derivation trees; define tree filters on pairs of derivation trees; retrieve sentence pairs that match those tree filters.
Why Derivation Trees? [Derivation tree for "C'est Tex qui a fait la tarte": α-faire:{active,cleftsubj,canobj} (num:sg, tse:pst, mode:ind, pers:3) dominating α-tex:{propernoun} (fun:subj, gen:fem, num:sg, pers:3) and α-tarte:{noun} (fun:obj, gen:fem, num:sg), which dominates β-la:{defdet} (gen:fem, num:sg)] Derivation trees provide detailed syntactic information, are more compact than derived trees, and allow fewer and simpler filters.
Derivation vs Derived Trees [Figure: derived trees for "c'est Tex qui a fait la tarte" and "c'est par Tex que la tarte a été faite", alongside the corresponding derivation trees: α-faire:{active,cleftsubj,canobj} vs α-faire:{passive,cleftagent,cansubj}, each dominating α-tex, α-avoir, α-tarte and β-la]
Derivation Tree Filters Tree filter types:
Change of tree properties on the verb anchor: α_v{Ps} ↦ α_v{Pt}, e.g. active/passive: s{active,cleftsubj,canobj} ↦ t{passive,cleftagent,cansubj}
Change of tree properties plus replacement of a daughter tree: α_v{Ps} ↦ α_v{Pt} with α_NP replaced by α_pron, e.g. NP/Pronoun: s{cansubj} ↦ t{cliticsubj}
Addition of material: α_v ↦ α_v with an adjoined β_qm, e.g. Assertion/YN-Question: q{questionmark}
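A filter of the first type can be sketched as a check over two derivations. The flat encoding below (a derivation as a map from anchors to tree-property sets) is an assumption for illustration; the property sets themselves come from the active/passive filter on the slides:

```python
# Sketch of an active/passive derivation-tree filter (assumed flat encoding).
def active_passive_pair(d1, d2, verb):
    """True if d1 and d2 have the same anchors, agree everywhere except on
    the verb, and switch active -> passive tree properties on the verb."""
    same_anchors = d1.keys() == d2.keys()
    same_rest = same_anchors and all(d1[a] == d2[a] for a in d1 if a != verb)
    return (same_rest
            and {"active", "cleftsubj", "canobj"} <= d1[verb]
            and {"passive", "cleftagent", "cansubj"} <= d2[verb])

d_active = {"faire": {"active", "cleftsubj", "canobj"},
            "tex": {"propernoun"}, "tarte": {"noun"}, "la": {"defdet"}}
d_passive = {"faire": {"passive", "cleftagent", "cansubj"},
             "tex": {"propernoun"}, "tarte": {"noun"}, "la": {"defdet"}}

print(active_passive_pair(d_active, d_passive, "faire"))  # True
```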
Meaning Preserving Transformations Same core meaning (e.g. active/passive).
(Q) C'est Tex qui a fait la tarte. (It is Tex who has baked the pie)
(K) C'est par Tex que la tarte a été faite. (It is by Tex that the pie has been baked)
(K) La tarte a été faite par Tex. (The pie has been baked by Tex)
[Derivation trees: α-faire:{active,cleftsubj,canobj}, α-faire:{passive,cleftagent,cansubj} and α-faire:{passive,agent,cansubj}, each dominating α-tex, β-avoir, α-tarte and β-la. Filter: s{active,cleftsubj,canobj} ↦ t{passive,cleftagent,cansubj}]
Meaning Altering Transformations Related core meaning: content deleted, added or replaced (e.g. Assertion/Wh-Question).
[Derivation trees:
α-dort:{CanSubj} over α-tatou, β-chante, β-petit, β-le:{defdet}: Le petit tatou qui chantera dort. (The small armadillo that will sing sleeps.)
α-dort:{whsubj} over α-tatou, β-petit, β-quel:{WhDet}: Quel petit tatou dort ? (Which small armadillo sleeps?)
α-dort:{whsubj} over α-tatou, β-quel:{WhDet}: Quel tatou dort ? (Which armadillo sleeps?)
α-dort:{whsubj} over β-qui:{WhPron}: Qui dort ? (Who sleeps?)]
Correctness, Productivity, Integration Manual annotation of a sample of exercises generated using SemFRAG and a lexicon tailored to the vocabulary of Tex's French Grammar: around 80% of the automatically generated exercises are correct. 52 input formulae yield around 5,000 exercises. Exercises generated by GramEx are integrated in I-FLEG (a serious game) and WFLEG (a web interface).
WFLEG
Thanks!