Using WordNet.PT for translation: disambiguation and lexical selection decisions
- Cody Berry
- 10 years ago
INTERNATIONAL JOURNAL OF TRANSLATION Vol. XX, No. XX, XXXX XXXX

University of Lisbon, Portugal

ABSTRACT

Wordnets are extensively used in language engineering and computational linguistics applications. Princeton WordNet (henceforth also WordNet), the so-called mother of all WordNets, is said to be the lexical resource with probably the widest field of application in those areas. Concerning translation, wordnets can be used both directly as an auxiliary tool and as a lexical basis for machine translation. Despite its strong psychological motivation, WordNet reveals some shortcomings for computer-aided and automatic translation purposes. In this paper we show how such shortcomings are overcome in the Portuguese WordNet (WN.PT), focusing in particular on two machine translation subtasks: word-sense disambiguation and lexical selection. We show that WN.PT has enough language-descriptive richness to allow for interesting results in this domain. We also show that the nature of WN.PT linguistic specifications makes it a useful tool for computer-aided translation.

INTRODUCTION

WordNet is recognized as a revolutionary reference system which combines a thesaurus with an ontological database, cf. Hanks (2003), among others. Moreover, the WordNet model, a computational semantic network, has strong psychological motivation. It originates from the experiments on the organization of the mental lexicon conducted by George Miller in the early 1980s, which pointed out its relational nature, cf. Miller (1990) and Fellbaum (1998). Specifically, WordNet is structured as a network of conceptual relations among synsets (a synset is a set of roughly synonymous word forms representing the same concept). In other words, lexical units are defined by means of their relations to each other. Besides synonymy, WordNet encodes conceptual hierarchies and part-whole relations.
Further evidence supporting the relational nature of the lexicon has been offered by research in several domains, such as generative models of the lexicon, cf. Pustejovsky (1995), and lexicalist models of grammar, cf. Pollard and Sag (1994). Relational lexicon models have thus played a leading role in machine lexical knowledge representation. WordNet, in particular, is considered the most important lexical resource available and has been extensively adopted in applications such as machine translation, information retrieval and language learning systems, among many others.

The Portuguese WordNet (Marrafa 2001; Marrafa 2002) is a wordnet developed in the general EuroWordNet framework (Vossen 1998). EuroWordNet wordnets follow the Princeton WordNet model, but they are richer concerning both the number and the nature of conceptual relations. In addition, each synset of a EuroWordNet wordnet is linked to an Inter-Lingual Index (ILI) record representing the same concept. The ILI plays the role of an interlingua, and this direct linkage naturally facilitates the use of EuroWordNet wordnets in multilingual applications.

Although WN.PT is being developed in the general EuroWordNet framework, it makes more extensive use of relations encoding information related to the event and argument structure of lexical items. Moreover, basic research carried out on Portuguese in order to guarantee the accuracy of the database led us to some changes and new directions, cf. Marrafa et al. (2006), Amaro et al. (2006) and Marrafa and Mendes (2006), for instance, which have resulted, in particular, in the encoding of a set of new cross-pos relations. Accordingly, WN.PT is now a very rich lexical resource suitable for a wide range of applications. Further below we make apparent its usefulness for translation tasks.
In the first section we focus on word-sense disambiguation, showing how the criteria used in WN.PT for differentiating word senses avoid over-differentiation, a shortcoming often attributed to WordNet when it is used for machine translation. The second section presents a set of new relations that are crucial for accurate lexical selection, mainly when concepts are lexicalized in the source language but not in the target language. To conclude the paper, we emphasize the properties that make WN.PT a suitable lexical resource for machine and computer-aided translation.
DIFFERENTIATION OF SENSES AND WORD-SENSE DISAMBIGUATION

Distinguishing between multiple possible senses of a word is an important subtask of most NLP applications, machine translation among them. In fact, machine translation is one of the most direct applications of word-sense disambiguation: if we are able to identify the correct meaning of each word in the source language, we can determine more accurately the appropriate words that lexicalize it in the target language. In its standard formulation, the disambiguation task is specified via an ontology, and the Princeton WordNet has often been used to define this ontology. However, as pointed out by Marrafa (2002), decisions concerning the encoding of polysemy in wordnets are far from obvious. Splitting or collapsing multiple senses often has weak motivation. Moreover, the optimal degree of specification of meaning depends to a large extent on the goals of the application.

WordNet opts for very fine-grained distinctions on this matter. As a consequence, senses which are very similar and hard to distinguish are encoded separately in the network. Although the fine-grained nature of the WordNet specifications makes it an important lexical resource for several purposes, it seems to be too detailed and, as a consequence, not perfectly suited for word-sense disambiguation in machine translation. As noted by Vickrey et al. (2005), in a machine translation system the size of the solution space grows exponentially with the size of the set of candidate translations. Hence, having an unnecessarily large set of candidate translations has a negative impact on the word translation model and, thus, on the performance of the system. In WN.PT a more balanced approach is adopted, in order to avoid over-differentiation. Besides, decisions on this matter have linguistic support.
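To make the effect of sense granularity on the solution space concrete, the following minimal sketch counts the combinations a system would have to consider if it picked one candidate per word. The sense labels and inventories are hypothetical and serve only as an illustration; they are not taken from WordNet or WN.PT.

```python
from math import prod


def solution_space_size(candidates_per_word):
    """Count the combinations obtained by picking one candidate per word."""
    return prod(len(candidates) for candidates in candidates_per_word)


# Hypothetical fine-grained sense inventories for a four-word fragment.
fine_grained = [["a1", "a2"], ["b1", "b2", "b3"], ["c1", "c2"], ["d1"]]

# The same fragment after collapsing systematically related senses
# (again hypothetical: a1/a2 merged, b1/b2 merged, c1/c2 merged).
collapsed = [["a"], ["b1", "b3"], ["c"], ["d1"]]

print(solution_space_size(fine_grained))  # 2 * 3 * 2 * 1 = 12
print(solution_space_size(collapsed))     # 1 * 2 * 1 * 1 = 2
```

Because the count is a product over words, every sense that is collapsed divides the number of combinations rather than merely subtracting from it, which is why a less fine-grained inventory can matter for longer sentences.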
We concentrate here on logical polysemy and on causative-inchoative alternations (in both cases the different senses are systematically related). Let us first take the example of book. Among others, WordNet specifies the following senses for this word form:
(n) book (a written work or composition that has been published (printed on pages bound together)) "I am reading a good book on economics"
(n) book, volume (physical objects consisting of a number of pages bound together) "he used a large book as a doorstop"
WordNet 3.0

In contradistinction, WN.PT collapses the two senses mentioned above for the corresponding Portuguese word form, livro. Sentences like (1) provide empirical support to this option, since the interpretation of the relative pronoun que (which) necessarily involves both senses (facets of sense, to be more precise).

(1) Ele rasgou o livro que eu memorizei.
    He tore the book which I memorized

We argue that the behavior of lexical units like livro (book) is captured by the lexical conceptual paradigm (lpc) that, for a given term α which carries two senses, σ1 and σ2, creates a complex type σ1·σ2:

(2) $$\frac{\alpha : \sigma_1 \qquad \alpha : \sigma_2}{\mathrm{lpc}(\alpha) : \sigma_1 \cdot \sigma_2}$$

Being of a complex type, a lexical unit like livro is encoded in only one synset.

Although on a different basis, we give a similar treatment to certain verbal alternations. We take here the case of causative-inchoative alternations, exemplified below:

(3) a. Ele aqueceu a sopa lentamente.
       He warmed the soup slowly
    b. A sopa aqueceu lentamente.
       The soup warmed slowly

The senses of aquecer (warm) in (3a) and (3b) correspond, respectively, to the senses specified in the following synsets of WordNet:

(v) warm (make warm or warmer)
(v) warm, warm up (get warm or warmer)
WordNet 3.0
We argue that aquecer denotes the same event in the two sentences above, which has the following abbreviated event structure (EVENT_STR):

(4) aquecer
$$\mathrm{EVENT\_STR} = \begin{bmatrix} E_1 = \text{process} \\ E_2 = \text{state} \\ \mathrm{RESTR} = E_1 < E_2 \end{bmatrix}$$

The internal event structure of verbs like aquecer integrates two subevents, E1 and E2, with the precedence restriction codified in RESTR. Therefore, both (3a) and (3b) involve a process which implies a change of state of the same argument, soup. In other words, both the statement in (3a) and the statement in (3b) imply a state of affairs where the soup is warm ¹. In accordance with this evidence, WN.PT uses only one synset, instead of two, to encode this type of alternation.

¹ The fact that we have two arguments syntactically realized in (3a) and only one in (3b) is due to the fact that the global event is headless (we leave this discussion aside here).

As shown, over-differentiation is avoided in WN.PT on an empirically sustained basis. Without inducing under-differentiation or losing linguistic adequacy, the strategies adopted to encode polysemy reduce the number of possible choices to a minimum and, consequently, the size of the solution space when WN.PT is used for word-sense disambiguation.

LEXICAL-CONCEPTUAL RELATIONS AND LEXICAL SELECTION

In translation, both human and machine translation, the choice of lexical units to be used in the formulation of a target language sentence can be considerably facilitated if there is some kind of knowledge base to support it. This need for semantic knowledge about the lexical content of source language sentences is even stronger when a given concept is not lexicalized in the target language.

WordNet has often been used as a lexical knowledge base in NLP systems. Dorr and Olsen (1996) and Dorr et al. (1998), for instance, use WordNet-based information in order to link lexical units from source to target language, in the context of machine translation and of multilingual generation in general. These authors combine the information encoded in WordNet with a repository of verb Lexical Conceptual Structures (Jackendoff 1990), henceforth LCSs, in order to be able to distinguish between semantically close lexical items. This repository of verb LCSs is used to add semantic information to the knowledge base when the WordNet hierarchy is shallow, which is common for verbs. Hence, according to these authors, the richness of the argument structure provided by the LCSs compensates for shallow hierarchies in WordNet, the reverse also being true. Nouns, on the contrary, generally show a deep hierarchy in WordNet, but their argument structures in the LCSs are comparatively poor.

The lack of information referred to above is internally overcome in WN.PT, without the need to create a parallel database to encode non-hierarchical information. The WN.PT network is much denser than that of WordNet. On the one hand, the system, following the general framework of EuroWordNet, allows the encoding of non-hierarchical information. On the other hand, we have defined a small set of new relations, cross-pos relations included, to encode information which is not usually specified in wordnets.

As a matter of fact, in WN.PT, as in other EuroWordNet wordnets, the network includes several conceptual relations that allow for encoding information concerning the participants in the event denoted by a lexical item, as well as the role they play. As an example, the scheme below presents part of the network of relations of embark.

[Scheme: partial network of relations of {embark}]
{embark} has_hyperonym {move}
{embark} involved_agent {passenger}
{embark} involved_source_direction {quay}
{embark} involved_source_direction {station}
{embark} involved_target_direction {means of transportation}
{embark} is_subevent_of {travel}
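As an illustration of how such a local network could be queried, the sketch below stores the relations of embark from the scheme above as triples and collects the event participants by role. The data layout and the function names (`build_index`, `participants`) are assumptions made for this example and do not describe the actual WN.PT database or its interface.

```python
from collections import defaultdict

# Relations of {embark} as read off the scheme above, stored as
# (source synset, relation, target synset) triples.
EMBARK_RELATIONS = [
    ("embark", "has_hyperonym", "move"),
    ("embark", "involved_agent", "passenger"),
    ("embark", "involved_source_direction", "quay"),
    ("embark", "involved_source_direction", "station"),
    ("embark", "involved_target_direction", "means of transportation"),
    ("embark", "is_subevent_of", "travel"),
]


def build_index(triples):
    """Index triples as synset -> relation -> list of target synsets."""
    index = defaultdict(lambda: defaultdict(list))
    for source, relation, target in triples:
        index[source][relation].append(target)
    return index


def participants(index, synset):
    """Return role -> fillers for every involved_* relation of a synset."""
    prefix = "involved_"
    return {
        relation[len(prefix):]: targets
        for relation, targets in index[synset].items()
        if relation.startswith(prefix)
    }


index = build_index(EMBARK_RELATIONS)
print(participants(index, "embark"))
```

Keeping the role name inside the relation label, as EuroWordNet-style relations do, means the argument structure can be read directly from the network without consulting a separate repository.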
Dense local networks of relations, such as the one above, encode the information concerning verbs' argument structure. Hence, no parallel database is needed to make the semantic content of lexical items available, nor to distinguish between semantically close lexical items. The information needed in the context of machine translation for establishing the link between source and target language lexical items is found within WN.PT.

As mentioned above, WN.PT also encodes information not usually specified in this kind of lexical resource. This is the case of telicity, mostly considered a compositional property of meaning ². Contrary to what is commonly done in this domain, WN.PT encodes the telicity of verbs. This makes WN.PT all the more suitable for translation, since lexical items do not behave uniformly in this respect, either within the same language or cross-linguistically.

² Telicity is mostly considered a compositional property of meaning. Marrafa (2004), and further work, shows that it is also a lexical feature that, as a consequence, has to be represented in the lexicon.

As argued in Marrafa (2005) and previous work, by default, telic verbs have the following abbreviated lexical conceptual structure:

(5) [T [P act(x,y) and ~Q(y)], [e Q(y)]]
    T: transition, P: process, e: event, Q: atomic event (point state), x and y: participants in the event

With certain telic verbs, the telic state, Q, is obligatorily syntactically realized and generally corresponds to an adjectival constituent, while other subclasses incorporate the telic state, as exemplified below:

(6) a. O concurso tornou o João rico.
       The contest made John rich
    b. *O concurso tornou o João.
       The contest made John

(7) a. O concurso enriqueceu o João.
       The contest enriched John
    b. *O concurso enriqueceu o João rico.
       The contest enriched John rich

The grammaticality contrast above is due to the fact that enriquecer (enrich) incorporates the telic state. This explains why this verb can be paraphrased by tornar rico (make rich). In order to incorporate this information in WN.PT, we introduce new semantic relations to encode telic verbs in the database (on this issue see also Marrafa 2005; Amaro et al. 2006). We capture the telicity of verbs like tornar (make) by including a new relation in the set of internal relations of wordnets: the telic subevent relation, as exemplified below.

(8) {make} has_telic_subevent {state}
    {state} is_telic_subevent_of {make} (defeasible) ³

³ The relation is not obligatory in this direction.

By relating make to state by means of this relation, we capture the telic properties of the verb and leave the specific nature of the final state underspecified. This way, we also account for the weakness of the verb's selection restrictions. We also use this relation to encode telicity in the case of the subclass that incorporates the telic state. In these cases, we use the telic subevent relation to relate the verb to the expression corresponding to the incorporated telic information. The global solution is schematically presented below:

[Scheme: telic subevent relations for {make} and {enrich}]
{make} is_hyperonym_of {enrich}
{state} is_hyperonym_of {rich}
{make} has_telic_subevent {state}; {state} is_telic_subevent_of {make}
{enrich} has_telic_subevent {rich}; {rich} is_telic_subevent_of {enrich}

It should be noted that there is a direct correspondence between enriquecer and enrich in the examples presented above. However, this is not always the case. Portuguese verbs such as enlouquecer (make crazy) or emudecer (make dumb) cannot be directly translated by an English verb. Nonetheless, as stated above, these telic verbs can be paraphrased by an expression like 'to make X', where X is a state. Hence, lexical conceptual relations like has_telic_subevent allow us to encode the relation between telic verbs and their final state in the database, thus making it possible to straightforwardly generate this inference: if enlouquecer is a troponym of tornar (make) and has_telic_subevent louco (crazy), then enlouquecer can be translated by an expression like make crazy.
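The inference just described can be sketched as a simple fallback rule in lexical selection. The dictionaries below are a hypothetical miniature lexicon built from the examples in the text; the names `TROPONYM_OF`, `HAS_TELIC_SUBEVENT`, `PT_EN` and `translate_verb` are illustrative assumptions and not part of WN.PT.

```python
# Hypothetical fragment of the network: verb -> hyperonym verb.
TROPONYM_OF = {"enlouquecer": "tornar", "enriquecer": "tornar"}

# Hypothetical fragment: verb -> incorporated telic state.
HAS_TELIC_SUBEVENT = {"enlouquecer": "louco", "enriquecer": "rico"}

# Hypothetical Portuguese -> English correspondences. Note that
# enlouquecer has no direct English counterpart in this lexicon.
PT_EN = {
    "tornar": "make",
    "louco": "crazy",
    "rico": "rich",
    "enriquecer": "enrich",
}


def translate_verb(verb):
    """Prefer a direct equivalent; otherwise paraphrase as 'make X'."""
    if verb in PT_EN:
        return PT_EN[verb]
    hyperonym = TROPONYM_OF.get(verb)
    state = HAS_TELIC_SUBEVENT.get(verb)
    if hyperonym in PT_EN and state in PT_EN:
        return f"{PT_EN[hyperonym]} {PT_EN[state]}"
    return None  # neither a direct equivalent nor a paraphrase is available


print(translate_verb("enriquecer"))   # enrich
print(translate_verb("enlouquecer"))  # make crazy
```

The direct equivalent is tried first so that the paraphrase is only generated when the target language has no lexicalization of its own.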
This way we cope with the absence of a direct correspondence between certain expressions in different languages.

Extending WN.PT to all the main POS, which is needed for translation, involved a revision of certain commonly used relations and the specification of new cross-pos relations, despite the fact that important structural information can be extracted from the hierarchical organization of lexical items. Marrafa and Mendes (2006) focus on adjectives to discuss the problem of appropriately capturing the semantics of all POS via the specification of the appropriate cross-pos relations.

As pointed out by Fellbaum et al. (1993) and Miller (1998), the semantic organization of adjectives is unlike that of nouns and verbs. In fact, this POS generally does not show a hierarchical organization. Thus, encoding adjectives in wordnets calls for the specification of new cross-pos semantic relations, in order to mirror definitional features of this POS in the network. Ideally, the distinctive syntactic and semantic properties of lexical items would be encoded in lexical models, for almost all their applications, translation included.

Combining these new relations with others previously existing in the WordNet model allows us to encode the basic characteristics of the major adjective classes in WN.PT. The main characterizing relations are characterizes_with_regard_to/can_be_characterized_by and antonymy for descriptive adjectives, and is_related_to for relational adjectives.

We also use another cross-pos relation involving adjectives: is_characteristic/has_as_a_characteristic. This relation encodes salient characteristics of nouns expressed by adjectival expressions. Although the status of this relation is open to discussion, namely whether or not it concerns lexical knowledge, it encodes information that is crucial for many wordnet-based applications, particularly those using inference systems. It also allows for richer and clearer synsets.
The linguistic accuracy of the information encoded in WN.PT should be noted. Here, however, we focus our discussion on the relevance of these results in the context of machine translation.

As mentioned above, we use the is_related_to relation to encode relational adjectives. Relational adjectives are property-ascribing adjectives. However, they entail complex and diversified relations between the set of properties they introduce and the modified noun. Levi (1978) points out that the intrinsic meaning of these adjectives is something like 'of, relating/pertaining to, associated with' some noun. The way these adjectives are encoded both in WordNet and in WN.PT mirrors this observation, as it links relational adjectives to the nouns they relate to.
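A minimal sketch of how a translation component might exploit is_related_to links is given below: when a Portuguese noun is modified by a relational adjective, the component renders the adjective through its related noun, producing an English noun modifier. All lexicon entries and names (`IS_RELATED_TO`, `NOUN_PT_EN`, `ADJ_PT_EN`, `translate_noun_adjective`) are hypothetical, including the choice of which Portuguese noun each adjective is linked to.

```python
# Hypothetical is_related_to links (relational adjective -> noun).
IS_RELATED_TO = {"turístico": "turista", "marítimo": "mar"}

# Hypothetical bilingual noun lexicon (Portuguese -> English).
NOUN_PT_EN = {
    "turista": "tourist",
    "mar": "sea",
    "restaurante": "restaurant",
    "cais": "quay",
}

# Hypothetical adjective lexicon, used only as a fallback.
ADJ_PT_EN = {"marítimo": "maritime"}


def translate_noun_adjective(noun, adjective):
    """Translate a Portuguese 'noun + relational adjective' phrase."""
    head = NOUN_PT_EN[noun]
    related_noun = IS_RELATED_TO.get(adjective)
    if related_noun in NOUN_PT_EN:
        # Preferred pattern: English noun modifier + head noun.
        return f"{NOUN_PT_EN[related_noun]} {head}"
    if adjective in ADJ_PT_EN:
        # Fallback: a corresponding English adjective, if one is listed.
        return f"{ADJ_PT_EN[adjective]} {head}"
    return None


print(translate_noun_adjective("restaurante", "turístico"))  # tourist restaurant
print(translate_noun_adjective("cais", "marítimo"))          # sea quay
```

The noun-modifier pattern is tried before the adjective fallback because a target language adjective may exist and still not be the unmarked choice.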
Expressing this relation can be particularly useful for translation systems when there is no direct correspondence between a relational adjective in the source language and one in the target language. Let us look at (9) and (10):

(9) o restaurante turístico
(10) the tourist restaurant

In this kind of context, the expression corresponding to the Portuguese relational adjective (such as turístico) is usually an English noun (like tourist). Even when corresponding relational adjectives exist in Portuguese and English, marítimo and maritime, for instance, the standard lexicalization pattern in English for the kind of Portuguese syntactic structure presented in (9) often does not involve the relational adjective. In fact, the most common and unmarked translation of the Portuguese phrase o cais marítimo is the sea quay, and not the maritime quay. Thus, as the target language makes use of a nominal expression, and not of an adjective, WN.PT copes with this kind of difference by linking relational adjectives to the nouns that denote the set of properties they ascribe.

CONCLUSION

Although WordNet is still one of the main lexical resources used in this domain, both as an auxiliary tool in human translation and as a lexical database for machine translation, it has some shortcomings in the context of translation, as noted in the literature.

We have shown how WN.PT overcomes those shortcomings, making apparent that this lexical conceptual database has enough language-descriptive richness to allow for interesting results in the domain of translation. We focused on two major aspects: the strategies put to work in order to have an optimal degree of specification of meaning in the database, and the nature of the conceptual relations expressed in WN.PT.
With regard to meaning specification, it has been made apparent that the approach used in WN.PT, besides being linguistically motivated, avoids both over-differentiation and under-differentiation of senses. This aspect is crucial for the balance between the precision and the efficiency of machine translation systems, as it improves the performance of word-sense disambiguation modules.

Concerning the conceptual relations expressed in WN.PT, some of which are inherited from the general EuroWordNet framework while others emerged from basic research on modeling all main POS in computational relational lexica, it has been shown how the lack of information attributed to WordNet, which several authors compensate for with parallel databases encoding non-hierarchical information, is internally overcome in WN.PT.

The richness of the information encoded in WN.PT (information concerning event participants and their role, telicity, and distinctive syntactic and semantic properties of lexical items) makes it a useful lexical resource for machine lexical knowledge representation, which is crucial for any machine translation system. Also, the straightforward and clear representation of the semantic content of lexical items allows for a direct use of WN.PT as an auxiliary tool in computer-aided translation.
REFERENCES

Amaro, R., Chaves, R. P., Marrafa, P. & Mendes, S. 2006. Enriching wordnets with new Relations and with event and argument structures. Proceedings of CICLing 2006 Conferences on Computational Linguistics and Intelligent Text Processing, Mexico City, Mexico.

Dorr, B. J., Martí, A. & Castellón, I. 1998. Evaluation of EuroWordNet- and LCS-based lexical resources for machine translation. Proceedings of the First International Conference on Language Resources and Evaluation. Granada, Spain.

Dorr, B. J. & Olsen, M. B. 1996. Multilingual Generation: The Role of Telicity in Lexical Choice and Syntactic Realization. Machine Translation, 11 (1-3).

Fellbaum, C. 1998. A Semantic Network of English: The Mother of all WordNets. In P. Vossen (ed.) EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Dordrecht: Kluwer Academic Publishers.

Fellbaum, C., Gross, D. & Miller, K. 1993. Adjectives in WordNet. In G. Miller et al. (ed.) Five Papers on WordNet. Technical Report, Cognitive Science Laboratory, Princeton University.

Hanks, P. 2003. Lexicography. In R. Mitkov (ed.) The Oxford Handbook of Computational Linguistics. Oxford: Oxford University Press.

Jackendoff, R. 1990. Semantic Structures. Cambridge, MA: MIT Press.

Levi, J. N. 1978. The Syntax and Semantic of complex nominals. New York: Academic Press.

Marrafa, P. 2001. WordNet do Português: uma base de dados de conhecimento linguístico. Lisbon, Portugal: Instituto Camões.

Marrafa, P. 2002. Portuguese WordNet: general architecture and internal semantic relations. D.E.L.T.A.

Marrafa, P. 2004. Extending WordNets to Implicit Information. Proceedings of LREC 2004: International Conference on Language Resources and Evaluation, Lisbon, Portugal.

Marrafa, P. 2005. The representation of complex telic predicates in WordNets: the case of lexical-conceptual structure deficitary verbs. In J. C. Nosa, A. Gelbukh, & E. Tovar (eds.) Research on Computer Science, volume 12.

Marrafa, P., Amaro, R., Chaves, R. P., Lourosa, S., Martins, C. & Mendes, S. 2006. WordNet.PT new directions. Proceedings of GWC 06: 3rd International Wordnet Conference, Jeju Island, Korea.

Marrafa, P., Ribeiro, C., & Santos, R. Para o Processamento da Linguagem Natural: Reutilização de Recursos Lexicais. Actas da 3ª Conferencia Iberoamericana en Sistemas, Cibernética e Informática, Orlando, Florida, USA.

Marrafa, P. & Mendes, S. 2006. Modeling Adjectives in Computational Relational Lexica. Proceedings of COLING/ACL 2006, Sydney, Australia.
Pollard, C. & Sag, I. 1994. Head-driven Phrase Structure Grammar. Chicago: CSLI Publications.

Pustejovsky, J. 1995. The Generative Lexicon. Cambridge, MA: The MIT Press.

Miller, G. A. 1990. WordNet: an on-line Lexical Database. Special Issue of International Journal of Lexicography, volume 3, nº 4.

Miller, K. J. 1998. Modifiers in WordNet. In C. Fellbaum (ed.) WordNet: an online Lexical Database. Cambridge, MA: The MIT Press.

Vickrey, D., Biewald, L., Teyssier, M. & Koller, D. 2005. Word-sense disambiguation for machine translation. Proceedings of the HLT/EMNLP, Vancouver, BC.

Vossen, P. (ed.) 1998. EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Dordrecht: Kluwer Academic Publishers.

PALMIRA MARRAFA
DEPARTMENT OF LINGUISTICS, FACULTY OF ARTS, UNIVERSITY OF LISBON, ALAMEDA DA UNIVERSIDADE, LISBOA, PORTUGAL
AND GROUP FOR THE COMPUTATION OF LEXICAL GRAMMATICAL KNOWLEDGE, CENTRE OF LINGUISTICS, UNIVERSITY OF LISBON, AVENIDA PROFESSOR GAMA PINTO, LISBOA, PORTUGAL
<[email protected]>

SARA MENDES
GROUP FOR THE COMPUTATION OF LEXICAL GRAMMATICAL KNOWLEDGE, CENTRE OF LINGUISTICS, UNIVERSITY OF LISBON, AVENIDA PROFESSOR GAMA PINTO, LISBOA, PORTUGAL
<[email protected]>
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
Using Knowledge Extraction and Maintenance Techniques To Enhance Analytical Performance
Using Knowledge Extraction and Maintenance Techniques To Enhance Analytical Performance David Bixler, Dan Moldovan and Abraham Fowler Language Computer Corporation 1701 N. Collins Blvd #2000 Richardson,
Transaction-Typed Points TTPoints
Transaction-Typed Points TTPoints version: 1.0 Technical Report RA-8/2011 Mirosław Ochodek Institute of Computing Science Poznan University of Technology Project operated within the Foundation for Polish
Semantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati
Annotation Guidelines for Dutch-English Word Alignment
Annotation Guidelines for Dutch-English Word Alignment version 1.0 LT3 Technical Report LT3 10-01 Lieve Macken LT3 Language and Translation Technology Team Faculty of Translation Studies University College
Natural Language Processing. Part 4: lexical semantics
Natural Language Processing Part 4: lexical semantics 2 Lexical semantics A lexicon generally has a highly structured form It stores the meanings and uses of each word It encodes the relations between
D6 INFORMATION SYSTEMS DEVELOPMENT. SOLUTIONS & MARKING SCHEME. June 2013
D6 INFORMATION SYSTEMS DEVELOPMENT. SOLUTIONS & MARKING SCHEME. June 2013 The purpose of these questions is to establish that the students understand the basic ideas that underpin the course. The answers
Organizational Issues Arising from the Integration of the Lexicon and Concept Network in a Text Understanding System
Organizational Issues Arising from the Integration of the Lexicon and Concept Network in a Text Understanding System Padraig Cunningham, Tony Veale Hitachi Dublin Laboratory Trinity College, College Green,
Intro to Linguistics Semantics
Intro to Linguistics Semantics Jarmila Panevová & Jirka Hana January 5, 2011 Overview of topics What is Semantics The Meaning of Words The Meaning of Sentences Other things about semantics What to remember
Semantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
The Lois Project: Lexical Ontologies for Legal Information Sharing
The Lois Project: Lexical Ontologies for Legal Information Sharing Daniela Tiscornia Institute of Legal Information Theory and Techniques - Italian National Research Council Abstract. Semantic metadata
Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision
Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Viktor PEKAR Bashkir State University Ufa, Russia, 450000 [email protected] Steffen STAAB Institute AIFB,
An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials
ehealth Beyond the Horizon Get IT There S.K. Andersen et al. (Eds.) IOS Press, 2008 2008 Organizing Committee of MIE 2008. All rights reserved. 3 An Ontology Based Method to Solve Query Identifier Heterogeneity
From Business World to Software World: Deriving Class Diagrams from Business Process Models
From Business World to Software World: Deriving Class Diagrams from Business Process Models WARARAT RUNGWORAWUT 1 AND TWITTIE SENIVONGSE 2 Department of Computer Engineering, Chulalongkorn University 254
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
CREATING LEARNING OUTCOMES
CREATING LEARNING OUTCOMES What Are Student Learning Outcomes? Learning outcomes are statements of the knowledge, skills and abilities individual students should possess and can demonstrate upon completion
1 Basic concepts. 1.1 What is morphology?
EXTRACT 1 Basic concepts It has become a tradition to begin monographs and textbooks on morphology with a tribute to the German poet Johann Wolfgang von Goethe, who invented the term Morphologie in 1790
PS I TAM-TAM Aspect [20/11/09] 1
PS I TAM-TAM Aspect [20/11/09] 1 Binnick, Robert I. (2006): "Aspect and Aspectuality". In: Bas Aarts & April McMahon (eds). The Handbook of English Linguistics. Malden, MA et al.: Blackwell Publishing,
Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED
Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED 17 19 June 2013 Monday 17 June Salón de Actos, Facultad de Psicología, UNED 15.00-16.30: Invited talk Eneko Agirre (Euskal Herriko
Introduction to formal semantics -
Introduction to formal semantics - Introduction to formal semantics 1 / 25 structure Motivation - Philosophy paradox antinomy division in object und Meta language Semiotics syntax semantics Pragmatics
Paraphrasing controlled English texts
Paraphrasing controlled English texts Kaarel Kaljurand Institute of Computational Linguistics, University of Zurich [email protected] Abstract. We discuss paraphrasing controlled English texts, by defining
A Knowledge-based System for Translating FOL Formulas into NL Sentences
A Knowledge-based System for Translating FOL Formulas into NL Sentences Aikaterini Mpagouli, Ioannis Hatzilygeroudis University of Patras, School of Engineering Department of Computer Engineering & Informatics,
Topics in Computational Linguistics. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment
Topics in Computational Linguistics Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment Regina Barzilay and Lillian Lee Presented By: Mohammad Saif Department of Computer
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System
Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering
Using NLP and Ontologies for Notary Document Management Systems
Outline Using NLP and Ontologies for Notary Document Management Systems Flora Amato, Antonino Mazzeo, Antonio Penta and Antonio Picariello Dipartimento di Informatica e Sistemistica Universitá di Napoli
The Specification Group
SIMPLE Work Package 2 Linguistic Specifications Deliverable D2.1 March 2000 The Specification Group Alessandro Lenci, Federica Busa, Nilda Ruimy, Elisabetta Gola, Monica Monachini, Nicoletta Calzolari,
Using Text Mining and Natural Language Processing for Health Care Claims Processing
Using Text Mining and Natural Language Processing for Health Care Claims Processing Fred Popowich Axonwave Software Suite 873, 595 Burrard PO Box 49042 Vancouver, BC CANADA V7X 1C4 [email protected]
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
Distributed Database for Environmental Data Integration
Distributed Database for Environmental Data Integration A. Amato', V. Di Lecce2, and V. Piuri 3 II Engineering Faculty of Politecnico di Bari - Italy 2 DIASS, Politecnico di Bari, Italy 3Dept Information
From Logic to Montague Grammar: Some Formal and Conceptual Foundations of Semantic Theory
From Logic to Montague Grammar: Some Formal and Conceptual Foundations of Semantic Theory Syllabus Linguistics 720 Tuesday, Thursday 2:30 3:45 Room: Dickinson 110 Course Instructor: Seth Cable Course Mentor:
The study of words. Word Meaning. Lexical semantics. Synonymy. LING 130 Fall 2005 James Pustejovsky. ! What does a word mean?
Word Meaning LING 130 Fall 2005 James Pustejovsky The study of words! What does a word mean?! To what extent is it a linguistic matter?! To what extent is it a matter of world knowledge? Thanks to Richard
I. INTRODUCTION NOESIS ONTOLOGIES SEMANTICS AND ANNOTATION
Noesis: A Semantic Search Engine and Resource Aggregator for Atmospheric Science Sunil Movva, Rahul Ramachandran, Xiang Li, Phani Cherukuri, Sara Graves Information Technology and Systems Center University
Latin WordNet project
Latin WordNet project Stefano Minozzi Laboratorio di Informatica Umanistica Università degli Studi di Verona Latin WordNet project Laboratorio di Informatica Umanistica Università degli Studi di Verona
Empirical Machine Translation and its Evaluation
Empirical Machine Translation and its Evaluation EAMT Best Thesis Award 2008 Jesús Giménez (Advisor, Lluís Màrquez) Universitat Politècnica de Catalunya May 28, 2010 Empirical Machine Translation Empirical
Exploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization
Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization Alfio Gliozzo and Carlo Strapparava ITC-Irst via Sommarive, I-38050, Trento, ITALY {gliozzo,strappa}@itc.it
Comprendium Translator System Overview
Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION...3 2. WHAT IS MACHINE TRANSLATION?...3 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY...4 3.1 THE BEST MT TECHNOLOGY IN THE MARKET...4
Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang
Sense-Tagging Verbs in English and Chinese Hoa Trang Dang Department of Computer and Information Sciences University of Pennsylvania [email protected] October 30, 2003 Outline English sense-tagging
Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde
Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction
Comparing Ontology-based and Corpusbased Domain Annotations in WordNet.
Comparing Ontology-based and Corpusbased Domain Annotations in WordNet. A paper by: Bernardo Magnini Carlo Strapparava Giovanni Pezzulo Alfio Glozzo Presented by: rabee ali alshemali Motive. Domain information
Human Language Technology Research and the Development of the Brazilian Portuguese Wordnet
1 Human Language Technology Research and the Development of the Brazilian Portuguese Wordnet Bento Carlos DIAS-DA-SILVA Faculdade de Ciências e Letras, Universidade Estadual Paulista, Rodovia Araraquara-Jau
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
ISSN: 2278-5299 365. Sean W. M. Siqueira, Maria Helena L. B. Braz, Rubens Nascimento Melo (2003), Web Technology for Education
International Journal of Latest Research in Science and Technology Vol.1,Issue 4 :Page No.364-368,November-December (2012) http://www.mnkjournals.com/ijlrst.htm ISSN (Online):2278-5299 EDUCATION BASED
Clustering of Polysemic Words
Clustering of Polysemic Words Laurent Cicurel 1, Stephan Bloehdorn 2, and Philipp Cimiano 2 1 isoco S.A., ES-28006 Madrid, Spain [email protected] 2 Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe,
Computers and the Creative Process
Computers and the Creative Process Kostas Terzidis In this paper the role of the computer in the creative process is discussed. The main focus is the investigation of whether computers can be regarded
Processing: current projects and research at the IXA Group
Natural Language Processing: current projects and research at the IXA Group IXA Research Group on NLP University of the Basque Country Xabier Artola Zubillaga Motivation A language that seeks to survive
Taxonomies in Practice Welcome to the second decade of online taxonomy construction
Building a Taxonomy for Auto-classification by Wendi Pohs EDITOR S SUMMARY Taxonomies have expanded from browsing aids to the foundation for automatic classification. Early auto-classification methods
Psychology G4470. Psychology and Neuropsychology of Language. Spring 2013.
Psychology G4470. Psychology and Neuropsychology of Language. Spring 2013. I. Course description, as it will appear in the bulletins. II. A full description of the content of the course III. Rationale
Draft Martin Doerr ICS-FORTH, Heraklion, Crete Oct 4, 2001
A comparison of the OpenGIS TM Abstract Specification with the CIDOC CRM 3.2 Draft Martin Doerr ICS-FORTH, Heraklion, Crete Oct 4, 2001 1 Introduction This Mapping has the purpose to identify, if the OpenGIS
The Use of Terminological Knowledge Bases in Software Localisation
The Use of Terminological Knowledge Bases in Software Localisation E.A. Karkaletsis, C.D. Spyropoulos, G. Vouros Institute of Informatics & Telecommunications, N.C.S.R. "Demokritos", 15310 Aghia Paraskevi,
Key words related to the foci of the paper: master s degree, essay, admission exam, graders
Assessment on the basis of essay writing in the admission to the master s degree in the Republic of Azerbaijan Natig Aliyev Mahabbat Akbarli Javanshir Orujov The State Students Admission Commission (SSAC),
Ling 201 Syntax 1. Jirka Hana April 10, 2006
Overview of topics What is Syntax? Word Classes What to remember and understand: Ling 201 Syntax 1 Jirka Hana April 10, 2006 Syntax, difference between syntax and semantics, open/closed class words, all
