LINKING VERB PATTERN DICTIONARIES OF ENGLISH AND SPANISH

Vít Baisa (Masaryk University, Brno, Czech Republic)
Sara Može (University of Wolverhampton, United Kingdom)
Irene Renau (Pontificia Universidad Católica de Valparaíso, Chile)

2 INTRODUCTION
- Verbs are complex.
- Aim: methodology and tools for creating a multilingual, corpus-driven lexical resource for verbs, using manual and automatic procedures.
- CPA-based monolingual pattern dictionaries: what are they?
- New multilingual resource: for researchers and language professionals?
- Preliminary study:
  I. Manual linking task: gold standard dataset
  II. Automatic linking task: algorithm, evaluated against the gold standard

3 CORPUS PATTERN ANALYSIS (CPA)
- Corpus Pattern Analysis (CPA) (Hanks, 2004): an empirical technique in corpus linguistics and lexicography.
- It maps word meaning onto word use through lexical analysis of phraseological patterns and collocations.
- Basis: Theory of Norms and Exploitations (TNE) (Hanks, 2013), a "double helix" of patterns of normal usage ("norms") vs. their exploitations.
- Pattern: a semantically motivated syntagmatic pattern.
  - Syntax: SPOCA (Subject, Predicator, Object, Complement, Adverbial) (Halliday)
  - Semantics: typical nominal slot fillers, represented by Semantic Types (STs), i.e. mnemonic semantic labels
- CPA shallow ontology (Ježek and Hanks, 2010): approx. 250 STs, shared by several projects.
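As a rough illustration of what such a pattern looks like as data, the sketch below parses a simple pattern string into a verb plus sets of Semantic Types per slot. It is only a minimal sketch: the `Pattern` class, the function names, the restriction to subject and object slots, and the assumption that alternative types inside a slot are separated by `|` are all illustrative choices, not part of the CPA tools.

```python
import re
from dataclasses import dataclass

# Matches the contents of one [[...]] slot.
SLOT_RE = re.compile(r"\[\[(.*?)\]\]")


@dataclass(frozen=True)
class Pattern:
    """Hypothetical container: a verb with subject and object semantic types."""
    verb: str
    subject: frozenset
    obj: frozenset = frozenset()  # empty set means "no object" (SV pattern)


def parse_slot(raw):
    """Split slot contents such as 'Eventuality 1 | Human' into type names."""
    types = set()
    for part in raw.split("|"):
        # Drop a trailing index such as the '1' in 'Eventuality 1'.
        name = re.sub(r"\s*\d+$", "", part.strip())
        if name:
            types.add(name)
    return frozenset(types)


def parse_pattern(text):
    """Parse '[[S]] verb' or '[[S]] verb [[O]]' into a Pattern."""
    slots = [parse_slot(s) for s in SLOT_RE.findall(text)]
    verb_tokens = SLOT_RE.sub(" ", text).split()
    if not slots or len(slots) > 2 or len(verb_tokens) != 1:
        raise ValueError(f"unsupported pattern string: {text!r}")
    obj = slots[1] if len(slots) == 2 else frozenset()
    return Pattern(verb_tokens[0], slots[0], obj)


if __name__ == "__main__":
    print(parse_pattern("[[Human]] admire [[Anything]]"))
```

Real pattern entries also carry complements, adverbials, implicatures and labels, which this sketch deliberately ignores.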

4 WHAT IS A PATTERN?
PDEV: harvest

5 CPA PATTERN DICTIONARIES
Pattern Dictionary of Italian Verbs (PDIV)
- Elisabetta Ježek, Pavia
Pattern Dictionary of English Verbs (PDEV)
- Public website; Prof. Hanks, University of Wolverhampton
- Over 1,700 English verbs completed
- Procedure: corpus samples (250/500/1000 lines) from the BNC (Leech, 1992); Sketch Engine word sketches (Kilgarriff et al., 2014), CPA Editor (Baisa et al., 2015) and CPA shallow ontology (Ježek and Hanks, 2010)
- Implicatures; register, domain and idiom/phrasal verb labels; links to FrameNet (Ruppenhofer et al., 2010)
- Percentages for each pattern
Pattern Dictionary of Spanish Verbs (PDSV)
- Public website: Verbario; Irene Renau, Pontificia Universidad Católica de Valparaíso
- 300 high-frequency Spanish verbs (currently only 100 publicly available online)
- Same methodology (CPA), guidelines, ontology and tools (SkE), but based on the Spanish Web Corpus

6 MANUAL LINKING: SP-EN PATTERN PAIRS
- Gold standard: 87 SP verbs with one or more EN equivalents (total: 126 EN verbs)
- Medium-frequency verbs, up to 15 patterns
- Manual cross-linguistic links between pattern pairs
  - semanto-syntactic similarity = tertium comparationis
  - linking procedure developed
  - dataset used in algorithm evaluation
- Issues: practical and theoretical
  - Coverage: PDEV and PDSV are work-in-progress resources with different coverage and limited overlap
  - Zero equivalence: cultural, social, cognitive and pragmatic reasons; idioms

7 INPUT: POTENTIALLY MATCHING EN PATTERN
1. Does it have the same basic syntactic structure as the SP pattern (i.e. SVO or SV [+no obj])? YES / NO
2. Do all semantic types in all obligatory syntactic slots match? YES / NO
   E.g.:
   EN: [[Human]] admire [[Anything]]
   SP: [[Human]] admirar [[Anything]]
   OUTPUT: PERFECT MATCH
3. Do the two patterns share at least ONE semantic type in the same obligatory syntactic slot? YES / NO
   For example:
   EN: [[Eventuality 1 | Human | Institution]] occasion [[Eventuality 2]]
   SP: [[Eventuality 1]] motivar [[Eventuality 2]]
4. Are the two semantic types in the same obligatory syntactic position related to each other in terms of inheritance in the CPA ontology (up to two nodes), e.g. [[Eventuality]] (supertype) vs. [[Activity]] and [[Plan]] (subtypes)? YES / NO
   EN: [[Eventuality 1 | Human]] spoil [[Eventuality 2]]
   SP: [[Eventuality | Human]] estropear [[Activity | Plan]]
OUTPUT: PARTIAL MATCH
OUTPUT: NO MATCH
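One possible reading of this decision procedure can be sketched in code. The sketch rests on several assumptions that the slide does not spell out: a differing syntactic structure leads to NO MATCH; a slot that shares at least one type, or whose types are related by inheritance within two nodes, contributes to a PARTIAL MATCH; and any slot with no relation at all yields NO MATCH. The `PARENT` table is a toy fragment built from the example on the slide, not the real CPA ontology, and all function names are hypothetical.

```python
# Toy child -> parent fragment, taken from the slide's own example only.
PARENT = {"Activity": "Eventuality", "Plan": "Eventuality"}


def ancestors(st):
    """Return [st, parent, grandparent, ...] following PARENT links."""
    chain = [st]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def related(a, b, max_nodes=2):
    """True if one type is an ancestor of the other within max_nodes steps."""
    up_a, up_b = ancestors(a), ancestors(b)
    if b in up_a:
        return up_a.index(b) <= max_nodes
    if a in up_b:
        return up_b.index(a) <= max_nodes
    return False


def slot_relation(en_types, sp_types):
    """Classify one slot as 'exact', 'shared', 'inherited' or 'none'."""
    if en_types == sp_types:
        return "exact"
    if en_types & sp_types:
        return "shared"
    if any(related(a, b) for a in en_types for b in sp_types):
        return "inherited"
    return "none"


def link(en, sp):
    """Compare two patterns given as dicts: slot name -> set of types."""
    if set(en) != set(sp):  # e.g. SVO against SV: different structure
        return "NO MATCH"
    relations = [slot_relation(en[slot], sp[slot]) for slot in en]
    if all(r == "exact" for r in relations):
        return "PERFECT MATCH"
    if any(r == "none" for r in relations):
        return "NO MATCH"
    return "PARTIAL MATCH"


if __name__ == "__main__":
    spoil = {"S": {"Eventuality", "Human"}, "O": {"Eventuality"}}
    estropear = {"S": {"Eventuality", "Human"}, "O": {"Activity", "Plan"}}
    print(link(spoil, estropear))
```

Under these assumptions the three example pairs on the slide come out as a perfect match (admire/admirar) and two partial matches (occasion/motivar, spoil/estropear).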

8 AUTOMATIC PATTERN LINKING: ALGORITHM
- Heuristic-based algorithm: automatic linking suggestions
- Similarity score
- 490 SP patterns and their translations into EN (statistical EN-SP dictionary derived from a parallel corpus)
- Comparison of STs in the S, DO and IO slots
  - Full match: 1 score pt (*Human = 0.5 pt); matching empty slots (e.g. DO): 0.5 pts
  - CPA ontology: similarity score = 0.5^N, calculated from the distance (N) in the CPA ontology tree
- Scores are summed up and the final score is assigned to the pair; the top-ranking EN pattern is the most likely candidate
Evaluation
- 50 SP-EN verb pairs
- Excluded: cases where the SP pattern cannot be matched against an EN pattern in the sample
- Final no. of candidate pattern pairs: 50 (gold standard)
- 40/50 suggested candidate pairs were correct: 80% precision
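The scoring heuristic can be illustrated with a short sketch. Several details here are assumptions rather than the authors' implementation: the ontology score is read as 0.5 raised to the power N, N is taken as the number of edges between the two types in the tree, a slot with several alternative types is scored by its best-scoring pair, unrelated types and a filled slot facing an empty one score 0, and the `PARENT` table is a toy fragment rather than the CPA ontology. All names are hypothetical.

```python
# Toy child -> parent fragment; the real CPA ontology has approx. 250 types.
PARENT = {"Activity": "Eventuality", "Plan": "Eventuality"}
SLOTS = ("S", "DO", "IO")


def path_to_root(st):
    """Return [st, parent, grandparent, ...] following PARENT links."""
    chain = [st]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def tree_distance(a, b):
    """Edges between a and b via their lowest common ancestor, or None."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    return None


def type_score(a, b):
    """1 pt for a full match (0.5 for Human), else 0.5 ** N, else 0."""
    if a == b:
        return 0.5 if a == "Human" else 1.0
    n = tree_distance(a, b)
    return 0.0 if n is None else 0.5 ** n


def slot_score(en_types, sp_types):
    """Score one slot; two empty slots count as a 0.5 pt match."""
    if not en_types and not sp_types:
        return 0.5
    if not en_types or not sp_types:
        return 0.0
    return max(type_score(a, b) for a in en_types for b in sp_types)


def pair_score(en, sp):
    """Sum the slot scores over S, DO and IO."""
    return sum(slot_score(en.get(s, set()), sp.get(s, set())) for s in SLOTS)


def best_candidate(sp, candidates):
    """Return the name of the top-ranking EN pattern for one SP pattern."""
    return max(candidates, key=lambda name: pair_score(candidates[name], sp))


if __name__ == "__main__":
    estropear = {"S": {"Eventuality", "Human"}, "DO": {"Activity", "Plan"}}
    spoil = {"S": {"Eventuality", "Human"}, "DO": {"Eventuality"}}
    print(pair_score(spoil, estropear))
```

Halving the score for each step in the tree keeps an exact match (N = 0) at 1 pt, which is consistent with the full-match rule on the slide.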

9 CONCLUSION
Future activities:
- Gold standard: more annotated data; refine the linking procedure (fine-grained distinctions? intralingual links)
- Algorithm: train, improve precision
- Software adaptation: feature for adding cross-linguistic links to the dictionaries/databases

10 REFERENCES
Baisa, V., El Maarouf, I., Rychlý, P. & Rambousek, A. (2015). Software and data for Corpus Pattern Analysis. In Horák, A., Rychlý, P. & Rambousek, A. (eds.), Ninth Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU.
Buyse, K. & Verlinde, S. (2013). Possible effects of free on line data driven lexicographic instruments on foreign language learning: The case of Linguee and the interactive language toolbox. Procedia: Social and Behavioral Sciences, volume 95. Elsevier BV.
Fillmore, C. J. & Baker, C. (2010). A frames approach to semantic analysis. The Oxford Handbook of Linguistic Analysis.
Halliday, M. A. K. (1994). An Introduction to Functional Grammar. Edward Arnold.
Hanks, P. (2004). Corpus Pattern Analysis. In G. Williams & S. Vessier (eds.), 11th Euralex International Congress. Proceedings. Lorient: Université de Bretagne-Sud.
Hanks, P. (2013). Lexical Analysis: Norms and Exploitations. Cambridge, MA: MIT Press.
Hlaváčková, D. & Horák, A. (2005). VerbaLex: new comprehensive lexicon of verb valencies for Czech. In Proceedings of the Slovko Conference.
Ježek, E. & Hanks, P. (2010). What lexical sets tell us about conceptual categories. Lexis: E-journal in English lexicology 4: Corpus Linguistics and the Lexicon. Université Lumière, Lyon.
Ježek, E., Magnini, B., Feltracco, A., Bianchini, A. & Popescu, O. (2014). T-PAS: A resource of corpus-derived typed predicate-argument structures for linguistic analysis and semantic processing. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 14).
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P. & Suchomel, V. (2014). The Sketch Engine: ten years on. Lexicography 1(1).
Leech, G. (1992). 100 million words of English: the British National Corpus (BNC). Language Research 28(1): 1-13.
Maarouf, I. E., Bradbury, J. & Hanks, P. (2014). PDEVlemon: a Linked Data implementation of the Pattern Dictionary of English Verbs based on the Lemon model. In Proceedings of the 3rd Workshop on Linked Data in Linguistics (LDL): Multilingual Knowledge Resources and Natural Language Processing at the Ninth International Conference on Language Resources and Evaluation (LREC 14), Reykjavik, Iceland.
Navigli, R. & Ponzetto, S. (2012). BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence 193.
Nazar, R. & Renau, I. (2015). Ontology Population Using Corpus Statistics. In O. Papini, S. Benferhat, L. Garcia et al. (eds.), Proceedings of the Joint Ontology Workshops 2015 co-located with the 24th International Joint Conference on Artificial Intelligence (IJCAI 2015). Buenos Aires, Argentina, July 25-27.
Ruppenhofer, J., Ellsworth, M., Petruck, M. R., Johnson, C. R. & Scheffczyk, J. (2010). FrameNet II: Extended Theory and Practice. Berkeley, CA: ICSI.
Vossen, P. (2002). WordNet, EuroWordNet and Global WordNet. Revue Française de Linguistique Appliquée 7(1).
Yong, H. & Peng, J. (1997). Bilingual lexicography from a communicative perspective. Amsterdam: John Benjamins.

11 USEFUL LINKS
- Pattern Dictionary of English Verbs
- VERBARIO (Pattern Dictionary of Spanish Verbs)
- PDEV-LEMON

THE END OF MEANING-DRIVEN DICTIONARIES?

THE END OF MEANING-DRIVEN DICTIONARIES? THE END OF MEANING-DRIVEN DICTIONARIES? Anca Cehan, Prof., PhD Alexandru Ioan Cuza University of Iași Abstract: The paper examines the evolution of the meaning-driven dictionaries based on the Hornby model,

More information

Bilingual Word Sketches: the translate Button

Bilingual Word Sketches: the translate Button Bilingual Word Sketches: the translate Button Vít Baisa, Miloš Jakubíček, Adam Kilgarriff, Vojtěch Kovář, Pavel Rychlý Lexical Computing Ltd, UK; Faculty of Informatics, Masaryk University, Czech Republic

More information

The Global WordNet Grid Software Design

The Global WordNet Grid Software Design The Global WordNet Grid Software Design Aleš Horák, Karel Pala, and Adam Rambousek Faculty of Informatics Masaryk University Botanická 68a, 602 00 Brno Czech Republic {hales,pala,xrambous}@fi.muni.cz Abstract.

More information

Finding Multiwords of More Than Two Words 1

Finding Multiwords of More Than Two Words 1 Finding Multiwords of More Than Two Words 1 Adam Kilgarriff, Pavel Rychlý, Vojtěch Kovář & Vít Baisa Lexical Computing Ltd., Brighton, United Kingdom NLP Centre, Faculty of Informatics, Masaryk University,

More information

Tickbox Lexicography. Adam Kilgarriff 1, Vojtěch Kovář 2, Pavel Rychlý 3

Tickbox Lexicography. Adam Kilgarriff 1, Vojtěch Kovář 2, Pavel Rychlý 3 Tickbox Lexicography Adam Kilgarriff 1, Vojtěch Kovář 2, Pavel Rychlý 3 Lexical Computing Ltd, Masaryk University Abstract Corpus lexicography involves, first, an analysis of a word, and then, copying

More information

SemEval-2015 Task 15: A Corpus Pattern Analysis Dictionary-Entry-Building Task

SemEval-2015 Task 15: A Corpus Pattern Analysis Dictionary-Entry-Building Task SemEval-2015 Task 15: A Corpus Pattern Analysis Dictionary-Entry-Building Task Vít Baisa Masaryk University xbaisa@fi.muni.cz Ismaïl El Maarouf University of Wolverhampton i.el-maarouf@wlv.ac.uk Jane Bradbury

More information

Terminology finding, parallel corpora and bilingual word sketches in the Sketch Engine

Terminology finding, parallel corpora and bilingual word sketches in the Sketch Engine Terminology finding, parallel corpora and bilingual word sketches in the Sketch Engine Adam Kilgarriff adam@lexmasterclass.com Lexical Computing Ltd., Brighton, UK The Sketch Engine is a leading corpus

More information

Improving interaction with the user in Cross-Language Question Answering through Relevant Domains and Syntactic Semantic Patterns

Improving interaction with the user in Cross-Language Question Answering through Relevant Domains and Syntactic Semantic Patterns Improving interaction with the user in Cross-Language Question Answering through Relevant Domains and Syntactic Semantic Patterns Borja Navarro, Lorenza Moreno, Sonia Vázquez, Fernando Llopis, Andrés Montoyo,

More information

LEXICOGRAMMATICAL PATTERNS OF LITHUANIAN PHRASES

LEXICOGRAMMATICAL PATTERNS OF LITHUANIAN PHRASES LEXICOGRAMMATICAL PATTERNS OF LITHUANIAN PHRASES Rūta Marcinkevičienė, Gintarė Grigonytė Vytautas Magnus University, Kaunas, Lithuania Abstract The paper overviews the process of compilation of the first

More information

Translating Action Verbs using a Dictionary of Images: the IMAGACT Ontology

Translating Action Verbs using a Dictionary of Images: the IMAGACT Ontology Translating Action Verbs using a Dictionary of Images: the IMAGACT Ontology Alessandro Panunzi, Irene De Felice*, Lorenzo Gregori, Stefano Jacoviello, Monica Monachini*, Massimo Moneglia, Valeria Quochi*,

More information

Overview of MT techniques. Malek Boualem (FT)

Overview of MT techniques. Malek Boualem (FT) Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,

More information

A Project for the Construction of an Italian Lexical Knowledge Base in the Framework of WordNet

A Project for the Construction of an Italian Lexical Knowledge Base in the Framework of WordNet A Project for the Construction of an Italian Lexical Knowledge Base in the Framework of WordNet Bernardo Magnini, Carlo Strapparava Fabio Ciravegna and Emanuele Pianta IRST, Istituto per la Ricerca Scientifica

More information

Structure Mapping for Jeopardy! Clues

Structure Mapping for Jeopardy! Clues Structure Mapping for Jeopardy! Clues J. William Murdock murdockj@us.ibm.com IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 Abstract. The Jeopardy! television quiz show asks natural-language

More information

. 1998a. "The Scopal Basis of Adverb Licensing," 1998 Annual Meeting of LSA, January, b. "Scope Based Adjunct Licensing," NELS 28, pp.

. 1998a. The Scopal Basis of Adverb Licensing, 1998 Annual Meeting of LSA, January, b. Scope Based Adjunct Licensing, NELS 28, pp. References Alexiadou, A. 1997. Adverb Placement ---------- A Case Study in Antisymmetric Syntax. Amsterdam: John Benjamins. Austin, J. L. 1956. "A plea for excuses," in Urmason & Warnock (eds.) Philosophical

More information

Semantic analysis of text and speech

Semantic analysis of text and speech Semantic analysis of text and speech SGN-9206 Signal processing graduate seminar II, Fall 2007 Anssi Klapuri Institute of Signal Processing, Tampere University of Technology, Finland Outline What is semantic

More information

Introduction: WHY, WHAT and HOW in phraseology

Introduction: WHY, WHAT and HOW in phraseology 6.1 (2005): 1-5 1 0 lanci Articles - Artikel Marija Omazi Faculty of Philosophy Josip Juraj Strossmayer University Osijek Introduction: WHY, WHAT and HOW in phraseology In the past 20 years there has been

More information

Simple maths for keywords

Simple maths for keywords Simple maths for keywords Adam Kilgarriff Lexical Computing Ltd adam@lexmasterclass.com Abstract We present a simple method for identifying keywords of one corpus vs. another. There is no one-sizefits-all

More information

Students workload: ECTS: 4; 1h lectures + 1h seminar (30 h) + 60 h of preparation and seminar papers

Students workload: ECTS: 4; 1h lectures + 1h seminar (30 h) + 60 h of preparation and seminar papers University of Zadar English Department COURSE SYLLABUS Course: Semantics Year: 3 rd Semester: 5 th Course prerequisites: Introduction to the Study of English Language and Linguistics Lecturer: Jadranka

More information

Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources

Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources Michelle

More information

The Oxford Learner s Dictionary of Academic English

The Oxford Learner s Dictionary of Academic English ISEJ Advertorial The Oxford Learner s Dictionary of Academic English Oxford University Press The Oxford Learner s Dictionary of Academic English (OLDAE) is a brand new learner s dictionary aimed at students

More information

What Makes a Good Online Dictionary? Empirical Insights from an Interdisciplinary Research Project

What Makes a Good Online Dictionary? Empirical Insights from an Interdisciplinary Research Project Proceedings of elex 2011, pp. 203-208 What Makes a Good Online Dictionary? Empirical Insights from an Interdisciplinary Research Project Carolin Müller-Spitzer, Alexander Koplenig, Antje Töpel Institute

More information

Difficulties that Arab Students Face in Learning English and the Importance of the Writing Skill Acquisition Key Words:

Difficulties that Arab Students Face in Learning English and the Importance of the Writing Skill Acquisition Key Words: Difficulties that Arab Students Face in Learning English and the Importance of the Writing Skill Acquisition Key Words: Lexical field academic proficiency syntactic repertoire context lexical categories

More information

Search Result Diversification Methods to Assist Lexicographers

Search Result Diversification Methods to Assist Lexicographers Search Result Diversification Methods to Assist Lexicographers Lars Borin Markus Forsberg Karin Friberg Heppin Richard Johansson Annika Kjellandsson Språkbanken, Department of Swedish, University of Gothenburg

More information

Capturing Syntactico-semantic Regularities among Terms: An Application of the FrameNet Methodology to Terminology

Capturing Syntactico-semantic Regularities among Terms: An Application of the FrameNet Methodology to Terminology Capturing Syntactico-semantic Regularities among Terms: An Application of the FrameNet Methodology to Terminology Marie-Claude L Homme, Janine Pimentel Observatoire de linguistique Sens-Texte (OLST) Université

More information

Master of Arts in Linguistics Syllabus

Master of Arts in Linguistics Syllabus Master of Arts in Linguistics Syllabus Applicants shall hold a Bachelor s degree with Honours of this University or another qualification of equivalent standard from this University or from another university

More information

Construction of Thai WordNet Lexical Database from Machine Readable Dictionaries

Construction of Thai WordNet Lexical Database from Machine Readable Dictionaries Construction of Thai WordNet Lexical Database from Machine Readable Dictionaries Patanakul Sathapornrungkij Department of Computer Science Faculty of Science, Mahidol University Rama6 Road, Ratchathewi

More information

Supporting FrameNet Project with Semantic Web technologies

Supporting FrameNet Project with Semantic Web technologies Supporting FrameNet Project with Semantic Web technologies Paulo Hauck 1, Regina Braga 1, Fernanda Campos 1, Tiago Torrent 2, Ely Matos 2, José Maria N. David 1 1 Pós Graduação em Ciência da Computação

More information

ASSOCIATING COLLOCATIONS WITH DICTIONARY SENSES

ASSOCIATING COLLOCATIONS WITH DICTIONARY SENSES ASSOCIATING COLLOCATIONS WITH DICTIONARY SENSES Abhilash Inumella Adam Kilgarriff Vojtěch Kovář IIIT Hyderabad, India Lexical Computing Ltd., UK Masaryk Uni., Brno, Cz abhilashi@students.iiit.ac.in adam@lexmasterclass.com

More information

Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy)

Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy) Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy) Multilingual Word Sense Disambiguation and Entity Linking on the Web based on BabelNet Roberto Navigli, Tiziano

More information

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features , pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of

More information

The validity of the Linguistic Fingerprint in forensic investigation Rebecca Crankshaw (Linguistics)

The validity of the Linguistic Fingerprint in forensic investigation Rebecca Crankshaw (Linguistics) The validity of the Linguistic Fingerprint in forensic investigation Rebecca Crankshaw (Linguistics) Abstract This article is concerned with the definition and validity of a linguistic fingerprint, a relatively

More information

Czech Verbs of Communication and the Extraction of their Frames

Czech Verbs of Communication and the Extraction of their Frames Czech Verbs of Communication and the Extraction of their Frames Václava Benešová and Ondřej Bojar Institute of Formal and Applied Linguistics ÚFAL MFF UK, Malostranské náměstí 25, 11800 Praha, Czech Republic

More information

Using DEB Services for Knowledge Representation within the KYOTO Project

Using DEB Services for Knowledge Representation within the KYOTO Project Using DEB Services for Knowledge Representation within the KYOTO Project Aleš Horák and Adam Rambousek Faculty of Informatics, Masaryk University Botanická 68a, 602 00 Brno, Czech Republic {hales,xrambous}@fi.muni.cz

More information

Meaning-Text-Theory and Lexical Frames

Meaning-Text-Theory and Lexical Frames Meaning-Text-Theory and Lexical Frames Bob Coyne Columbia University New York, NY, USA coyne@cs.columbia.edu Owen Rambow Columbia University New York, NY, USA rambow@ccls.columbia.edu Abstract We discuss

More information

DiCE in the web: An online Spanish collocation dictionary

DiCE in the web: An online Spanish collocation dictionary GRANGER, S.; PAQUOT, M. (EDS.). 2010. ELEXICOGRAPHY IN THE 21ST CENTURY: NEW CHALLENGES, NEW APPLICATIONS. PROCEEDINGS OF ELEX2009, LOUVAIN-LA-NEUVE, 22-24 OCTOBER 2009. CAHIERS DU CENTAL 7. LOUVAIN-LA-NEUVE,

More information

What Do We Teach: Applied Linguistics or Language Teaching Methodology?

What Do We Teach: Applied Linguistics or Language Teaching Methodology? Theory and Practice in English Studies 3 (2005): Proceedings from the Eighth Conference of British, American and Canadian Studies. Brno: Masarykova univerzita What Do We Teach: Applied Linguistics or Language

More information

Hybrid Strategies. for better products and shorter time-to-market

Hybrid Strategies. for better products and shorter time-to-market Hybrid Strategies for better products and shorter time-to-market Background Manufacturer of language technology software & services Spin-off of the research center of Germany/Heidelberg Founded in 1999,

More information

Word Polarity Detection Using a Multilingual Approach

Word Polarity Detection Using a Multilingual Approach Word Polarity Detection Using a Multilingual Approach Cüneyd Murad Özsert and Arzucan Özgür Department of Computer Engineering, Boğaziçi University, Bebek, 34342 İstanbul, Turkey muradozsert@gmail.com,

More information

THE SOLUTION OF MT LINGUISTIC PROBLEMS THROUGH LEXICOGRAPHY Erwin Reifler University of Washington

THE SOLUTION OF MT LINGUISTIC PROBLEMS THROUGH LEXICOGRAPHY Erwin Reifler University of Washington [Proceedings of the National Symposium on Machine Translation, UCLA February 1960] Session 7: THE SOLUTION OF MT LINGUISTIC PROBLEMS THROUGH LEXICOGRAPHY Erwin Reifler University of Washington I believe

More information

A Distributed Database System for Developing Ontological and Lexical Resources in Harmony

A Distributed Database System for Developing Ontological and Lexical Resources in Harmony A Distributed Database System for Developing Ontological and Lexical Resources in Harmony Aleš Horák 1, Piek Vossen 2, and Adam Rambousek 1 1 Faculty of Informatics Masaryk University Botanická 68a, 602

More information

14 Automatic language correction

14 Automatic language correction 14 Automatic language correction IA161 Advanced Techniques of Natural Language Processing J. Švec NLP Centre, FI MU, Brno December 21, 2015 J. Švec IA161 Advanced NLP 14 Automatic language correction 1

More information

of VerbNet against PropBank and Section 5 shows examples of preposition mismatches between the two resources. 2 VerbNet's components VerbNet is an on-

of VerbNet against PropBank and Section 5 shows examples of preposition mismatches between the two resources. 2 VerbNet's components VerbNet is an on- Using prepositions to extend a verb lexicon Karin Kipper, Benjamin Snyder, Martha Palmer University of Pennsylvania 200 South 33rd Street Philadelphia, PA 19104 USA fkipper,bsnyder3,mpalmerg@linc.cis.upenn.edu

More information

Translation of verbal idioms

Translation of verbal idioms Translation of verbal idioms Martine Smets, Joseph Pentheroudakis and Arul Menezes Microsoft Research One Microsoft Way Redmond, WA 98052, USA martines@microsoft.com josephp@microsoft.com arulm@microsoft.com

More information

Sorting out the Most Confusing English Phrasal Verbs

Sorting out the Most Confusing English Phrasal Verbs STARSEM-2012 Sorting out the Most Confusing English Phrasal Verbs Yuancheng Tu Department of Linguistics University of Illinois ytu@illinois.edu Dan Roth Department of Computer Science University of Illinois

More information

The Interplay between the Speaker s and the Hearer s Perspective Petra Hendriks, Helen de Hoop & Henriëtte de Swart

The Interplay between the Speaker s and the Hearer s Perspective Petra Hendriks, Helen de Hoop & Henriëtte de Swart The Interplay between the Speaker s and the Hearer s Perspective Petra Hendriks, Helen de Hoop & Henriëtte de Swart Suppose you witnessed the killing of Harry by Frank and wish to report on that. To express

More information

Link Type Based Pre-Cluster Pair Model for Coreference Resolution

Link Type Based Pre-Cluster Pair Model for Coreference Resolution Link Type Based Pre-Cluster Pair Model for Coreference Resolution Yang Song, Houfeng Wang and Jing Jiang Key Laboratory of Computational Linguistics (Peking University) Ministry of Education,China School

More information

Multi-Functional Software for Electronic Dictionaries

Multi-Functional Software for Electronic Dictionaries ELECTRONIC DICTIONARIES IN SECOND LANGUAGE COMPREHENSION Multi-Functional Software for Electronic Dictionaries Hiroaki SATO, Senshu University, Kanagawa, Japan Abstract I have been developing a computer

More information

A New Machine Translation System English to Portuguese Using NooJ

A New Machine Translation System English to Portuguese Using NooJ A New Machine Translation System English to Portuguese Using NooJ Anabela Barreiro anabela.barreiro@nyu.edu Universidade do Porto & Linguateca New York University Presentation Outline Structure 1. Introduction

More information

The Sem metrix Project: Scaling up the Profile-Based Measurement of Lexical Variation

The Sem metrix Project: Scaling up the Profile-Based Measurement of Lexical Variation Overview Profile-based Msm Build-up Synonyms First results The Sem metrix Project: Scaling up the Profile-Based Measurement of Lexical Variation Kris Heylen & Yves Peirsman KULeuven Quantitative Lexicology

More information

Syntax II: Issues in Syntax Spring Semester 2013

Syntax II: Issues in Syntax Spring Semester 2013 Syntax II: Issues in Syntax Spring Semester 2013 ENGL 627S / LING 522 T-TH 1:30-2:45pm, Heav 110 Instructor Dr. Elaine Francis Email: ejfranci@purdue.edu Office: Heav 408 Office hours: Tues-Thurs 9:45-10:15am

More information

2014/02/13 Sphinx Lunch

2014/02/13 Sphinx Lunch 2014/02/13 Sphinx Lunch Best Student Paper Award @ 2013 IEEE Workshop on Automatic Speech Recognition and Understanding Dec. 9-12, 2013 Unsupervised Induction and Filling of Semantic Slot for Spoken Dialogue

More information

D2.4: Two trained semantic decoders for the Appointment Scheduling task

D2.4: Two trained semantic decoders for the Appointment Scheduling task D2.4: Two trained semantic decoders for the Appointment Scheduling task James Henderson, François Mairesse, Lonneke van der Plas, Paola Merlo Distribution: Public CLASSiC Computational Learning in Adaptive

More information

DEVELOPMENT AND ANALYSIS OF HINDI-URDU PARALLEL CORPUS

DEVELOPMENT AND ANALYSIS OF HINDI-URDU PARALLEL CORPUS DEVELOPMENT AND ANALYSIS OF HINDI-URDU PARALLEL CORPUS Mandeep Kaur GNE, Ludhiana Ludhiana, India Navdeep Kaur SLIET, Longowal Sangrur, India Abstract This paper deals with Development and Analysis of

More information

Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

More information

Multi-source hybrid Question Answering system

Multi-source hybrid Question Answering system Multi-source hybrid Question Answering system Seonyeong Park, Hyosup Shim, Sangdo Han, Byungsoo Kim, Gary Geunbae Lee Pohang University of Science and Technology, Pohang, Republic of Korea {sypark322,

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,

More information

Semi-automatically Alignment of Predicates between Speech and OntoNotes Data

Semi-automatically Alignment of Predicates between Speech and OntoNotes Data Semi-automatically Alignment of Predicates between Speech and OntoNotes Data Niraj Shrestha, Marie-Francine Moens Department of Computer Science, KU Leuven, Belgium {niraj.shrestha, Marie-Francine.Moens}@cs.kuleuven.be

More information

Sorting out translation universals from specific source-language interference

Sorting out translation universals from specific source-language interference Sorting out translation universals from specific source-language interference The case of phrasal verbs in translated English Rudy Loock and Bert Cappelle University of Lille 3 & National Center for Scientific

More information

Overview of iclef 2008: search log analysis for Multilingual Image Retrieval

Overview of iclef 2008: search log analysis for Multilingual Image Retrieval Overview of iclef 2008: search log analysis for Multilingual Image Retrieval Julio Gonzalo Paul Clough Jussi Karlgren UNED U. Sheffield SICS Spain United Kingdom Sweden julio@lsi.uned.es p.d.clough@sheffield.ac.uk

More information

Name of the Course: M.A. in Translation: Theory and Application

Name of the Course: M.A. in Translation: Theory and Application Name of the Course: M.A. in Translation: Theory and Application Aims and Objectives India is a multi-lingual Country. It has also inherited a very old culture and cultural records. It is, therefore, imperative

More information
