Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base
|
|
|
- Tyrone Stone
- 10 years ago
- Views:
Transcription
1 Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base Aitor González Agirre, Egoitz Laparra, German Rigau Basque Country University Donostia, Basque Country {aitor.gonzalez, egoitz.laparra, german.rigau}@ehu.es Abstract This paper describes the upgrading process of the Multilingual Central Repository (MCR). The new MCR uses WordNet 3.0 as Interlingual-Index (ILI). Now, the current version of the MCR integrates in the same EuroWordNet framework wordnets from five different languages: English, Spanish, Catalan, Basque and Galician. In order to provide ontological coherence to all the integrated wordnets, the MCR has also been enriched with a disparate set of ontologies: Base Concepts, Top Ontology, WordNet Domains and Suggested Upper Merged Ontology. We also suggest a novel approach for improving some of the semantic resources integrated in the MCR, including a semiautomatic method to propagate domain information. The whole content of the MCR is freely available. 1 Introduction Building large and rich knowledge bases is a very costly effort which involves large research groups for long periods of development. For instance, hundreds of person-years have been invested in the development of wordnets for various languages (Fellbaum, 1998; Vossen, 1998; Tufis et al., 2004; K. et al., 2010). In the case of the English Word- Net, in more than ten years of manual construction (from 1995 to 2006, that is, from version 1.5 to 3.0), WordNet grew from 103,445 to 235,402 semantic relations 1, which represents a growth of around one thousand new relations per month. The Multilingual Central Repository (MCR) 2 (Atserias et al., 2004b) follows the model proposed by the European project EuroWordNet (LE- 1 Symmetric relations are counted only once ) (Vossen, 1998). The MCR is the result of the European MEANING project (IST ) (Rigau et al., 2002), as well as projects KNOW (TIN C03) 3 (Agirre et al., 2009), KNOW 2 (TIN C04) 4 and several complementary actions associated to the KNOW 2 project. The original MCR was aligned to the 1.6 version of WordNet. In the framework of the KNOW 2 project, we decided to upgrade the MCR to be aligned to a most recent version of WordNet. The previous version of the MCR, aligned to the English 1.6 WordNet version, also integrated the extended WordNet project (Mihalcea and Moldovan, 2001), large collections of selectional preferences acquired from SemCor (Agirre and Martinez, 2001) and different sets of named entities (Alfonseca and Manandhar, 2002). It was also enriched with semantic and ontological properties as Top Ontology (Álvez et al., 2008), SUMO (Pease et al., 2002) or WordNet Domains (Magnini and Cavaglià, 2000). The new MCR integrates wordnets of five different languages, including English, Spanish, Catalan, Basque and Galician. This paper presents the work carried out to upgrade the MCR to new versions of these resources. By using technology to automatically align wordnets (Daudé et al., 2003), we have been able to transport knowledge from different WordNet versions. Thus, we can maintain the compatibility between all the knowledge bases that use a particular version of Word- Net as a sense repository. However, most of the ontological knowledge have not been directly ported from the previous version of the MCR. Furthermore, WordNet Domains 5 was generated semi-automatically and has never been verified completely. Additionaly, it was aligned to
2 WordNet 1.6. Thus, one goal of this work is the automatic construction of a new semantic resource derived from WordNet Domains and aligned to WordNet 3.0. To assist in the correction and maintenance of the integrated resources in the MCR, we also adapted and enhanced the Web EuroWordNet Interface (WEI) in both consult and edit modes. 2 Multilingual Central Repository 3.0 The first version of the MCR was built following the model proposed by the EuroWordNet project. The EuroWordNet architecture includes the Inter- Lingual Index (ILI), a Domain Ontology and a Top Ontology (Vossen, 1998). Initially most of the knowledge uploaded into the MCR was aligned to WordNet 1.6 and the Spanish, Catalan, Basque and Italian WordNet and the MultiWordNet Domains, were using WordNet 1.6 as ILI (Bentivogli et al., 2002; Magnini and Cavaglià, 2000). Thus, the original MCR used Princeton WordNet 1.6 as ILI. This option also minimized side efects with other European initiatives (Balkanet, EuroTerm, etc.) and wordnet developments around Global WordNet Association. Thus, the Spanish, Catalan and Basque wordnets as well as the EuroWordNet Top Ontology and the associated Base Concepts were transported from its original WordNet 1.5 to WordNet 1.6 (Atserias et al., 1997; Benítez et al., 1998; Atserias et al., 2004a). The release of new free versions of Spanish and Galician wordnets aligned to Princeton WordNet 3.0 (Fernández-Montraveta et al., 2008; Xavier et al., 2011) brought with it the need to update the MCR and transport all its previous content to a new version using WordNet 3.0 as ILI. Thus, as a first step, we decided to transport Catalan and Basque wordnets and the ontological knowledge: Base Concepts, SUMO, WordNet Domains and Top Ontology. 2.1 Upgrading from 1.6 to 3.0 This section describes the process carried out for adapting the MCR to ILI 3.0. Due to its size and complexity, all this process have been mainly automatic. To perform the porting between the wordnets 1.6 and 3.0 we have followed a similar process to the one used to port the Spanish and Catalan versions from 1.5 to 1.6 (Atserias et al., 2004a). Figure 1: Example of a multiple intersection in the mapping between two versions of WordNet. Upgrading ILI: The algorithm to align wordnets (Daudé et al., 2000; Daudé et al., 2001; Daudé et al., 2003) produces two mappings for each POS, one in each direction (from 1.6 to 3.0, and from 3.0 to 1.6). To upgrade the ILI, different approaches were applied depending on the POS. For nouns, those synsets having multiple mappings from 1.6 to 3.0 were checked manually (Pociello et al., 2008). For verbs, adjectives and adverbs, for those synsets having multiple mappings, we took the intersection between the two mappings (from 1.6 to 3.0, and from 3.0 to 1.6). Upgrading WordNets: Finally, using the previous mapping, we transported from ILI 1.6 to ILI 3.0 the Basque (Pociello et al., 2008) and Catalan (Benítez et al., 1998) wordnets. The English WordNet was uploaded directly from the source files while the Spanish (Fernández- Montraveta et al., 2008) and Galician (Xavier et al., 2011) wordnets were directly uploaded from their database dumps. It is possible to have multiple intersections for a source synset. When multiple intersections colapsed into the same target synset, we decided to join the set of variants from the source synsets to the target synset. Figure 1 shows an example of this particular case (the intersections are displayed as dot lines). Therefore, the variants of the synsets A, B and C of WordNet 1.6 will be placed together in the synset Z of WordNet 3.0. Upgrading Base Concepts: We used Base Concepts directly generated for WN
3 Figure 2: Example of a multiple intersection in the mapping between two versions of WordNet. (Izquierdo et al., 2007). Upgrading SUMO: SUMO has been directly ported from version 1.6 using the mapping. Those unlabelled synsets have been filled through inheritance. The ontology of the previous version is a modified version of SUMO, trimmed and polished, to allow the use of first-order theorem provers (like E-prover or Vampire) for formal reasoning, called AdimenSUMO 7. The next step is to update AdimenSUMO using the latest version of SUMO for WordNet 3.0 (available on the website of SUMO 8 ). Upgrading WordNet Domains: As SUMO, what is currently in the MCR has been transported directly from version 1.6 using the mapping. Again, those unlabelled synsets have been filled through inheritance. In addition, new versions have been generated using graph techniques (see Section 4 for a detailed description of the process). Upgrading the Top Ontology: Similar to SUMO and WordNet Domains, what is currently available in the MCR has been transported directly from version 1.6 using the mapping. Once more, those unlabelled synsets have been filled through inheritance. It remains to check the incompatibilities between labels following (Álvez et al., 2008). An example of how to perform the process of inheritance used for SUMO, WordNet Domains and Top Ontology is shown in Figure 2. The example is presented for domains, but it can be applied to the other two cases. Figure 2 shows a sample hierarchy where each node represents a synset. The domains are displayed on the sides. The inherited domain labels are highlighted using dot lines. In this specific example synset soccer player inherits labels play and athlete from its hypernyms player and athlete, respectively. Note that synset hockey player does not inherit any label form its hypernyms because of it owns a domain (hockey). Similarly, synset goalkeeper does not inherit domains coming from the synset soccer player. Finally, synset titular goalkeeper inherits hockey domain (but neither play nor athlete domains). Thus, some of the current content of the MCR will require a future revision. Fortunately, by cross-checking its ontological knowledge most of these errors can be easily detected. 2.2 Web EuroWordNet Interface WEI is a web application that allows consulting and editing the data contained in the MCR and navigating throw them. Consulting refers to exploring the content of the MCR by accessing words, a synsets, a variants or ILIs. The interface presents different searching parameters and displays the query results. The different searching parameters are: Item: a value to search for, it can be a Word, a Synset a Variant or an ILI. Item type: the type of item to search for: Word, a Synset a Variant or an ILI. PoS: the item s gramatical cathegory or Part of Speech: Nouns, Verbs, Adjectives, Adverbs. Search: the type of search and subsearches (which are dinamically loaded from the database): Synonyms, Hyponyms, etc. WordNet Source: the WordNet from which navigate. Navigation WordNet: the WordNet to which navigate. Gloss: if selected it shows the glosses of the Synsets. Score: if selected, it shows the confidence factor. Rels: if selected, it shows information about the relations that each Synset has in all the target languages.
4 Full: if selected, makes a recursive search. Target WordNets: the target WordNets of our search. 2.3 Automatic translations The new version of WEI is able to use Automatic Translation Web Services for translating automatically the glosses and examples from other wordnets. This new feature helps users to complete and/or improve the gloss or examples of a given WordNet more quicky. Both glosses and examples are taken from the original English WordNet and translated to the target language. Suggestions for glosses and/or examples will appear below the existing ones, and may choose the most appropriate. In the current version, the translations of the glosses and examples are translated only from English (despite the possibility of translating from any available source). 2.4 Marks for synsets and variants In the new version of WEI it is possible to assign a mark to a variant or synset to indicate special properties. We can also write a small note or comment to explain better the reason to assign that mark. The available marks are the following: Variant marks: DUBLEX: For those variant with dubious lexicalization. INFL: Indicates that the variant is a inflected one. RARE: Old fashioned or rarely used variant. SUBCAT: Subcategorization. VULG: For those variants that are vulgar, rude, or offensive. Synset marks: GENLEX: Non-lexicalized general concepts that are introduced to better organize the hierarchy. HYPLEX: Indicates that the hypernym has identical lexicalization. SPECLEX: Domain specific terms that should be checked. 2.5 User management We also included a new user access control to WEI. The previous user access control to WEI was carried out using Apache, in the Operating System. This implies that the access control and user management was done outside WEI. The MCR is being edited in a distributed way. Several research groups are editing the MCR in some of the languages. Each group has different users. Thus, the responsibility of managing the users is also distributed. 3 Current state of the MCR In this section provide some information about the current state of the MCR, including the progress over the English WordNet. Tables 1, 2 and 3 present respectively the current number of synsets and variants, the number of glosses and the number of examples of each wordnet per PoS. 4 A proposal for upgrading WordNet Domains WordNet Domains 9 (WND) is a lexical resource developed at ITC-IRST where synsets have been semi-automatically annotated with one or more domain labels from a set of 165 hierarchically organized domains (Magnini and Cavaglià, 2000). WND allow to reduce the polysemy degree of the words, grouping those senses that belong to the same domain (Magnini et al., 2002). But the semi-automatic method used to develop this resource was not free of errors and inconsistencies. By cross-checking the ontological content of the MCR it is possible to find some of these problems. For instance, noun synset <diver 1 frogman 1 underwater diver 1> defined as someone who works underwater has domain history because it inherits from its hypernym <explorer 1 adventurer 2> Domain inheritance WND was developed using WordNet 1.6. One consequence of the automatic mapping that we used to upgrade version 1.6 to 3.0 is that many synsets were left unlabeled (because there are new synsets, changes in the structure, etc.). One of the first tasks undertaken has been to fill these gaps. For them, we has carried out a propagation of the labels by inheritance for nominal and verbal synsets. The inherent structure of Word- Net for adjectives and adverbs makes this spread 9
5 WordNet Nouns Verbs Adjectives Adverbs Synsets WN % EngWN ,360 25,051 30,004 5, , % SpaWN3.0 40,009 11,107 7,005 1,106 59,227 50% CatWN3.0 51,598 11,577 7, ,027 39% EusWN3.0 41,071 9, ,615 26% GalWN3.0 9,114 1,413 4, ,320 8% Table 1: Current number of synsets and variants of each WN. WordNet Nouns Verbs Adjectives Adverbs Synsets WN % EngWN3.0 82,379 13,767 18,156 3, , % SpaWN3.0 13,014 3,469 1, ,135 16% CatWN3.0 6, ,174 6% EusWN3.0 2, ,932 2% GalWN3.0 4, , ,111 7% Table 2: Current number of glosses of each WN. not trivial. Therefore, this simple process has been carried out only for nouns and verbs. We have worked exclusively on those synsets that had no labels at all. We inherited the labels from its hypernyms. If a synset has more than one hypernym, the domain labels are taken from all of them. We used a small list of incompatible labels to detect incompatibilities. Therefore, the same synset can not be both factotum and biology, or animals and plants. This process increased our domain information by nearly a 18-19%, as shown in Tables 4 and 5: PoS Before After Increase Nouns 66,595 83, % Verbs 12,219 14, % All 100, , % Table 4: Number of synsets with domain labels. PoS Before After Increase Nouns 87, , % Verbs 13,026 15, % All 124, , % Table 5: Total number of domain labels. However, this process may also have propagated innapropriate domain labels to unlabeled synsets. It remains for future research an accurate evaluation of this new resource. In the next section we present some examples using a new graph-based method for propagating domain labels through WordNet. Additionaly, the method can also be used to detect anomalies in the original WND labels A new graph based method UKB 10 algorithm (Agirre and Soroa, 2009) applies personalized PageRank on a graph derived from a wordnet. This algorithm has proven to be very competitive on Word Sense Disambiguation tasks and it is easily portable to other languages that have a wordnet (Agirre et al., 2010). Now, we present a novel use of the UKB algorithm for propagating information through a wordnet structure. Given an input context, ukb ppv (Personalized PageRank Vector) algorithm outputs a ranking vector over the nodes of a graph, after applying a Personalized PageRank over it. We just need to use a wordnet as a knowledge base and pass to the application the contexts we want to process, performing a kind of spreading activation through the WordNet structure. As context we use those synsets labelled with a particular domain. Thus, for each of the domain labels included in the MCR we generate a context. Each file contains the list of offsets corresponding to those synsets with a particular domain label. After creating the context file, we just need to execute ukb ppv that returns a ranking of weights for each WordNet synset with respect to that particular domain Excluding factotum labels.
6 WordNet Nouns Verbs Adjectives Adverbs Synsets WN % EngWN3.0 10,433 11,583 15,615 3,674 41, % SpaWN % CatWN3.0 2, ,517 6% EusWN3.0 2, ,377 6% GalWN , ,563 11% Table 3: Current number of examples of each WN. Once made the process for all domains we have weights for each synset and for each of the domains. Therefore, we know which are the highest weights for each domain and the highest weights for each synset. This allows us to estimate which synsets are more representative of each domain and which domains are best for each synset. Basically, what we do is to mark some synsets with a domain (using the labels we already know from the original porting process) and use the wordnet graph to propagate the new labelling. We work on the assumption that a synset directly related to several synsets labelled with a particular domain (i.e biology) would itself possibly be also related somehow to that domain (i.e. biology). Therefore, it makes no sense to use the domain factotum for this technique. Table 6 shows the first ten domains and weights resulting from the application of this method on synset <diver 1 frogman 1 underwater diver 1>. The suggestions of the algorithm seems to improve the current labeling because it suggests sub (possibly the best one) and diving (possibly, the second best option). Moreover, the method suggests the wrong label with a much lower weight. Weight Domain : sub : diving : swimming : history : nautical : fashion : jewellery : ethnology : archaeology : gas Table 6: PPV weight rankings for sense diver 1 n. Table 7 shows the first ten domains and weights resulting from the application of this method on synset <pornography 1 porno 1 porn 1 erotica 1 smut 5> defined as creative activity (writing or pictures or films etc.) of no literary or artistic value other than to stimulate sexual desire and labelled with the domain law. The suggestions of the algorithm seems to improve the current labeling because it suggests sexuality (possibly the best one) and cinema (possibly, the second best option). Moreover, the wrong label dissapears. Method 3 Weight Domain : sexuality : cinema : theatre : painting : telecommunication : publishing : psychological features : photography : artisanship : graphic arts Table 7: PPV weight rankings for sense porno 1 n. 5 Concluding Remarks and Future Directions As a result of this work, the current version of the MCR consistently maintains new wordnet versions for five languages (English, Spanish, Catalan, Basque and Galician), and the ontological knowledge from WordNet Domains, Top Ontology and SUMO. In particular, the main contributions of our work can be summarized as follows: We have created a new version of the MCR using WordNet 3.0 as ILI. We have improved the existing Web EuroWord- Net Interface (WEI) (both consult and edit interfaces) to work with the new version of the MCR. Now, the interface includes automatic translation
7 facilities, making it easier and faster the development of the resources integrated into the MCR. We also added new editing facilities for recording new linguistic information associated to the variants and synsets. We have uploaded into the new version of the MCR the English WordNet 3.0, the new Spanish WordNet 3.0 (Fernández-Montraveta et al., 2008) and a new Galician WordNet 3.0. We have used a complete mapping from Word- Net 1.6 to WordNet 3.0 (covering not only nouns, but verbs, adjectives and adverbs) to transport the Basque and Catalan wordnets and the ontological knowledge from the existing version of the MCR (using WordNet 1.6 as ILI) to the new MCR version (using WordNet 3.0 as ILI). We have applied a very simple estrategy to complete the ontological information by exploiting basic inheritance mechanisms. This process has been applied to WordNet Domains, Top Ontology and SUMO. We have also investigated a new approach for consistently propagating domain formation through the WordNet structure by exploiting a well-known graph algorithm using UKB. Although an exhaustive empirical evaluation should be addressed in a near future, a preliminary review of the new resources created using this process presents very interesting insights for future research. Obviously, further investigation is needed to assess the quality of the new labelling of WordNet Domains. We plan to evaluate the quality of these new resources indirectly by comparing their performance on a common Word Sense Disambiguation task. We would also like to continue studing different ways for selecting the most appropriate set of domain labels per synset. We also plan to derive domain information from Wikipedia by exploiting WordNet++ (Navigli and Ponzetto, 2010). The whole content of the MCR and the new WEI is freely available 12. Moreover, the maintenance of this type of resources is continuous, and all the integrated knowledge should be constantly updated and revised. Acknowledgments We thank the IXA NLP group from the Basque Country University. This work was been pos sible thanks to the support of the FP7 PATHS project (ICT ) and the Spanish national projects KNOW (TIN C03) and KNOW2 (TIN C04-04). References Agirre, E., Cuadros, M., Rigau, G., and Soroa, A. (2010). Exploring Knowledge Bases for Similarity. In Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC10). European Language Resources Association (ELRA). ISBN: Pages Agirre, E. and Martinez, D. (2001). Knowledge sources for Word Sense Disambiguation. In Proceedings of the Fourth International Conference TSD 2001, Plzen (Pilsen), Czech Republic. Published in the Springer Verlag Lecture Notes in Computer Science series. Václav Matousek, Pavel Mautner, Roman Moucek, Karel Tauser (eds.) Copyright Springer-Verlag. ISBN: Agirre, E., Rigau, G., Castellón, I., Alonso, L., Padró, L., Cuadros, M., Climent, S., and Coll-Florit, M. (2009). KNOW: Developing large-scale multilingual technologies for language understanding. Procesamiento del Lenguaje Natural, (43): Agirre, E. and Soroa, A. (2009). Personalizing PageRank for Word Sense Disambiguation. In Proceedings of the 12th conference of the European chapter of the Association for Computational Linguistics (EACL-2009), Athens, Greece. Alfonseca, E. and Manandhar, S. (2002). Distinguishing concepts and instances in WordNet. In Proceedings of the first International Conference of Global WordNet Association, Mysore, India. Álvez, J., Atserias, J., Carrera, J., Climent, S., Laparra, E., Oliver, A., and Rigau, G. (2008). Complete and Consistent Annotation of WordNet using the Top Concept Ontology. In Proceedings of the the 6th Conference on Language Resources and Evaluation (LREC 2008), Marrakech (Morocco). Atserias, J., Climent, S., Farreres, J., Rigau, G., and Rodríguez, H. (1997). Combining Multiple Methods for the Automatic Construction of Multilingual WordNets. In Proceedings of International Conference on Recent Advances in Natural Language Processing (RANLP 97), Tzigov Chark, Bulgaria. Atserias, J., Rigau, G., and Villarejo, L. (2004a). Spanish WordNet 1.6: Porting the Spanish Wordnet across Princeton versions. In LREC 04. Atserias, J., Villarejo, L., Rigau, G., Agirre, E., Carroll, J., Vossen, P., and Magnini, B. (2004b). The MEANING Multilingual Central Respository. In Proceedings of the Second International Global WordNet Conference (GWC 04).
8 Bentivogli, L., Pianta, E., and Girardi, C. (2002). MultiWordNet: developing an aligned multilingual database. In Proceedings of the First International Conference on Global WordNet, Mysore, India. Benítez, L., Cervell, S., Escudero, G., López, M., Rigau, G., and Taulé, M. (1998). Methods and Tools for Building the Catalan WordNet. In Proceedings of ELRA Workshop on Language Resources for European Minority Languages, Granada, Spain. Daudé, J., Padró, L., and Rigau, G. (2000). Mapping WordNets Using Structural Information. In 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000)., Hong Kong. Daudé, J., Padró, L., and Rigau, G. (2001). A Complete WN1.5 to WN1.6 Mapping. In Proceedings of NAACL Workshop WordNet and Other Lexical Resources: Applications, Extensions and Customizations (NAACL 2001)., Pittsburg, PA, USA. Daudé, J., Padró, L., and Rigau, G. (2003). Making Wordnet Mappings Robust. In Proceedings of the 19th Congreso de la Sociedad Española para el Procesamiento del Lenguage Natural, SEPLN, Universidad Universidad de Alcalá de Henares. Madrid, Spain. Fellbaum, C. (1998). WordNet. An Electronic Lexical Database. Language, Speech, and Communication. The MIT Press. Fernández-Montraveta, A., Vázquez, G., and Fellbaum, C. (2008). Text Resources and Lexical Knowledge, volume 33 of Text, Translation, Computational Processing, chapter The Spanish Version of WordNet 3.0, pages Mouton de Gruyter. Navigli, R. and Ponzetto, S. P. (2010). Building a very large multilingual semantic network. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages , Uppsala, Sweden. Pease, A., Niles, I., and Li, J. (2002). The Suggested Upper Merged Ontology: A Large Ontology for the Semantic Web and its Applications. In AAAI Pociello, E., Gurrutxaga, A., Agirre, E., Aldezabal, I., and Rigau, G. (2008). WNTERM: Combining the Basque WordNet and a Terminological Dictionary. In 6th international conference on Language Resources and Evaluation, LREC 08. Rigau, G., Magnini, B., Agirre, E., Vossen, P., and Carroll, J. (2002). MEANING: A Roadmap to Knowledge Technologies. In Proceedings of COLING Workshop A Roadmap for Computational Linguistics, Taipei, Taiwan. Tufis, D., Cristea, D., and Stamou, S. (2004). Balka- Net: Aims, Methods, Results and Perspectives. A General Overview. Romanian Journal on Science Technology of Information. Special Issue on Balkanet, 7(3 4):9 44. Vossen, P. (1998). EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers. Xavier, G. G., Clemente, X. M. G., Pereira, A. G., and Lorenzo, V. T. (2011). Galnet: WordNet 3.0 do galego. Linguamática, 3(1): Izquierdo, R., Suárez, A., and Rigau, G. (2007). Exploring the Automatic Selection of Basic Level Concepts. In Proceedings of RANLP. K., R., S., T., T., C., V., S., and H., I. (2010). WNMS: Connecting the Distributed WordNet in the Case of Asian WordNet. In Proceedings of the 5th Global WordNet Conference (GWC 10), Mumbai (India). Magnini, B. and Cavaglià, G. (2000). Integrating subject field codes into WordNet. In Proceedings of the Second Internatgional Conference on Language Resources and Evaluation (LREC), Athens. Greece. Magnini, B., Satrapparava, C., Pezzulo, G., and Gliozzo, A. (2002). The Role of Domains Informations. In In Word Sense Disambiguation, Treto, Cambridge. Mihalcea, R. and Moldovan, D. (2001). extended WordNet: Progress Report. In Proceedings of NAACL Workshop WordNet and Other Lexical Resources: Applications, Extensions and Customizations, pages , Pittsburg, PA, USA.
Construction of Thai WordNet Lexical Database from Machine Readable Dictionaries
Construction of Thai WordNet Lexical Database from Machine Readable Dictionaries Patanakul Sathapornrungkij Department of Computer Science Faculty of Science, Mahidol University Rama6 Road, Ratchathewi
Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets
Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets Maria Ruiz-Casado, Enrique Alfonseca and Pablo Castells Computer Science Dep., Universidad Autonoma de Madrid, 28049 Madrid, Spain
An Efficient Database Design for IndoWordNet Development Using Hybrid Approach
An Efficient Database Design for IndoWordNet Development Using Hybrid Approach Venkatesh P rabhu 2 Shilpa Desai 1 Hanumant Redkar 1 N eha P rabhugaonkar 1 Apur va N agvenkar 1 Ramdas Karmali 1 (1) GOA
Comparing Ontology-based and Corpusbased Domain Annotations in WordNet.
Comparing Ontology-based and Corpusbased Domain Annotations in WordNet. A paper by: Bernardo Magnini Carlo Strapparava Giovanni Pezzulo Alfio Glozzo Presented by: rabee ali alshemali Motive. Domain information
Knowledge-Based WSD on Specific Domains: Performing Better than Generic Supervised WSD
Knowledge-Based WSD on Specific Domains: Performing Better than Generic Supervised WSD Eneko Agirre and Oier Lopez de Lacalle and Aitor Soroa Informatika Fakultatea, University of the Basque Country 20018,
Latin WordNet project
Latin WordNet project Stefano Minozzi Laboratorio di Informatica Umanistica Università degli Studi di Verona Latin WordNet project Laboratorio di Informatica Umanistica Università degli Studi di Verona
Package wordnet. January 6, 2016
Title WordNet Interface Version 0.1-11 Package wordnet January 6, 2016 An interface to WordNet using the Jawbone Java API to WordNet. WordNet () is a large lexical database
Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval
Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval Yih-Chen Chang and Hsin-Hsi Chen Department of Computer Science and Information
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy)
Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy) Multilingual Word Sense Disambiguation and Entity Linking on the Web based on BabelNet Roberto Navigli, Tiziano
Multilingual and Localization Support for Ontologies
Multilingual and Localization Support for Ontologies Mauricio Espinoza, Asunción Gómez-Pérez and Elena Montiel-Ponsoda UPM, Laboratorio de Inteligencia Artificial, 28660 Boadilla del Monte, Spain {jespinoza,
Romanian WordNet: Current State, New Applications and Prospects
Romanian WordNet: Current State, New Applications and Prospects Dan Tufiş, Radu Ion, Luigi Bozianu, Alexandru Ceauşu, and Dan Ştefănescu Romanian Academy Research Institute for Artificial Intelligence
Exploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization
Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization Alfio Gliozzo and Carlo Strapparava ITC-Irst via Sommarive, I-38050, Trento, ITALY {gliozzo,strappa}@itc.it
Understanding Text with Knowledge-Bases and Random Walks
Understanding Text with Knowledge-Bases and Random Walks Eneko Agirre ixa2.si.ehu.es/eneko IXA NLP Group University of the Basque Country MAVIR, 2011 Agirre (UBC) Knowledge-Bases and Random Walks MAVIR
Multilingual Event Detection using the NewsReader Pipelines
Multilingual Event Detection using the NewsReader Pipelines Rodrigo Agerri, Itziar Aldabe, Egoitz Laparra, German Rigau Antske Fokkens 3, Paul Huijgen 3, Ruben Izquierdo 3, Marieke van Erp 3, Piek Vossen
WordNet Website Development And Deployment Using Content Management Approach
WordNet Website Development And Deployment Using Content Management Approach N eha R P rabhugaonkar 1 Apur va S N ag venkar 1 Venkatesh P P rabhu 2 Ramdas N Karmali 1 (1) GOA UNIVERSITY, Taleigao - Goa
Word Sense Disambiguation as an Integer Linear Programming Problem
Word Sense Disambiguation as an Integer Linear Programming Problem Vicky Panagiotopoulou 1, Iraklis Varlamis 2, Ion Androutsopoulos 1, and George Tsatsaronis 3 1 Department of Informatics, Athens University
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
The Lois Project: Lexical Ontologies for Legal Information Sharing
The Lois Project: Lexical Ontologies for Legal Information Sharing Daniela Tiscornia Institute of Legal Information Theory and Techniques - Italian National Research Council Abstract. Semantic metadata
Kybots, knowledge yielding robots German Rigau IXA group, UPV/EHU http://ixa.si.ehu.es
KYOTO () Intelligent Content and Semantics Knowledge Yielding Ontologies for Transition-Based Organization http://www.kyoto-project.eu/ Kybots, knowledge yielding robots German Rigau IXA group, UPV/EHU
A Software Tool for Thesauri Management, Browsing and Supporting Advanced Searches
J. Nogueras-Iso, J.A. Bañares, J. Lacasta, J. Zarazaga-Soria 105 A Software Tool for Thesauri Management, Browsing and Supporting Advanced Searches J. Nogueras-Iso, J.A. Bañares, J. Lacasta, J. Zarazaga-Soria
SEMANTIC RESOURCES AND THEIR APPLICATIONS IN HUNGARIAN NATURAL LANGUAGE PROCESSING
SEMANTIC RESOURCES AND THEIR APPLICATIONS IN HUNGARIAN NATURAL LANGUAGE PROCESSING Doctor of Philosophy Dissertation Márton Miháltz Supervisor: Gábor Prószéky, D.Sc. Multidisciplinary Technical Sciences
Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources
Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources Michelle
English-Russian WordNet for Multilingual Mappings
English-Russian WordNet for Multilingual Mappings Sergey Yablonsky 1 1 St. Petersburg State University, Volkhovsky Per. 3, St. Petersburg, 199004, Russia {[email protected]} Abstract. This paper
Using WordNet.PT for translation: disambiguation and lexical selection decisions
INTERNATIONAL JOURNAL OF TRANSLATION Vol. XX, No. XX, XXXX XXXX Using WordNet.PT for translation: disambiguation and lexical selection decisions University of Lisbon, Portugal ABSTRACT Wordnets are extensively
Central and South-East European Resources in META-SHARE
Central and South-East European Resources in META-SHARE Tamás VÁRADI 1 Marko TADIĆ 2 (1) RESERCH INSTITUTE FOR LINGUISTICS, MTA, Budapest, Hungary (2) FACULTY OF HUMANITIES AND SOCIAL SCIENCES, ZAGREB
Natural Language Processing. Part 4: lexical semantics
Natural Language Processing Part 4: lexical semantics 2 Lexical semantics A lexicon generally has a highly structured form It stores the meanings and uses of each word It encodes the relations between
Work: Lecturer in Applied and Computational Linguistics Faculty of Arts, Cairo University Email: [email protected] Cell phone: +2 010 10 50 498
Doaa Samy 1 Doaa A. Samy Work: Lecturer in Applied and Computational Linguistics Faculty of Arts, Cairo University Email: [email protected] Cell phone: +2 010 10 50 498 EDUCATION Ph.D. in Computational
Methods and Tools for Encoding the WordNet.Br Sentences, Concept Glosses, and Conceptual-Semantic Relations
Methods and Tools for Encoding the WordNet.Br Sentences, Concept Glosses, and Conceptual-Semantic Relations Bento C. Dias-da-Silva 1, Ariani Di Felippo 2, Ricardo Hasegawa 3 Centro de Estudos Lingüísticos
Visualizing WordNet Structure
Visualizing WordNet Structure Jaap Kamps Abstract Representations in WordNet are not on the level of individual words or word forms, but on the level of word meanings (lexemes). A word meaning, in turn,
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
UBY A Large-Scale Unified Lexical-Semantic Resource Based on LMF
UBY A Large-Scale Unified Lexical-Semantic Resource Based on LMF Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer and Christian Wirth Ubiquitous Knowledge Processing
Semantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
Enriching the Crosslingual Link Structure of Wikipedia - A Classification-Based Approach -
Enriching the Crosslingual Link Structure of Wikipedia - A Classification-Based Approach - Philipp Sorg and Philipp Cimiano Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe, Germany {sorg,cimiano}@aifb.uni-karlsruhe.de
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,
Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED
Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED 17 19 June 2013 Monday 17 June Salón de Actos, Facultad de Psicología, UNED 15.00-16.30: Invited talk Eneko Agirre (Euskal Herriko
Effective Mentor Suggestion System for Collaborative Learning
Effective Mentor Suggestion System for Collaborative Learning Advait Raut 1 U pasana G 2 Ramakrishna Bairi 3 Ganesh Ramakrishnan 2 (1) IBM, Bangalore, India, 560045 (2) IITB, Mumbai, India, 400076 (3)
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering
Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision
Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Viktor PEKAR Bashkir State University Ufa, Russia, 450000 [email protected] Steffen STAAB Institute AIFB,
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
Building Ontology Networks: How to Obtain a Particular Ontology Network Life Cycle?
See discussions, stats, and author profiles for this publication at: http://www.researchgate.net/publication/47901002 Building Ontology Networks: How to Obtain a Particular Ontology Network Life Cycle?
LABERINTO at ImageCLEF 2011 Medical Image Retrieval Task
LABERINTO at ImageCLEF 2011 Medical Image Retrieval Task Jacinto Mata, Mariano Crespo, Manuel J. Maña Dpto. de Tecnologías de la Información. Universidad de Huelva Ctra. Huelva - Palos de la Frontera s/n.
DATA QUALITY DATA BASE QUALITY INFORMATION SYSTEM QUALITY
DATA QUALITY DATA BASE QUALITY INFORMATION SYSTEM QUALITY The content of those documents are the exclusive property of REVER. The aim of those documents is to provide information and should, in no case,
Identifying free text plagiarism based on semantic similarity
Identifying free text plagiarism based on semantic similarity George Tsatsaronis Norwegian University of Science and Technology Department of Computer and Information Science Trondheim, Norway [email protected]
A Knowledge-based System for Translating FOL Formulas into NL Sentences
A Knowledge-based System for Translating FOL Formulas into NL Sentences Aikaterini Mpagouli, Ioannis Hatzilygeroudis University of Patras, School of Engineering Department of Computer Engineering & Informatics,
On the general structure of ontologies of instructional models
On the general structure of ontologies of instructional models Miguel-Angel Sicilia Information Engineering Research Unit Computer Science Dept., University of Alcalá Ctra. Barcelona km. 33.6 28871 Alcalá
Linked Data Interface, Semantics and a T-Box Triple Store for Microsoft SharePoint
Linked Data Interface, Semantics and a T-Box Triple Store for Microsoft SharePoint Christian Fillies 1 and Frauke Weichhardt 1 1 Semtation GmbH, Geschw.-Scholl-Str. 38, 14771 Potsdam, Germany {cfillies,
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR
NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati
How To Use Networked Ontology In E Health
A practical approach to create ontology networks in e-health: The NeOn take Tomás Pariente Lobo 1, *, Germán Herrero Cárcel 1, 1 A TOS Research and Innovation, ATOS Origin SAE, 28037 Madrid, Spain. Abstract.
CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
ISSN: 2278-5299 365. Sean W. M. Siqueira, Maria Helena L. B. Braz, Rubens Nascimento Melo (2003), Web Technology for Education
International Journal of Latest Research in Science and Technology Vol.1,Issue 4 :Page No.364-368,November-December (2012) http://www.mnkjournals.com/ijlrst.htm ISSN (Online):2278-5299 EDUCATION BASED
ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU
ONTOLOGIES p. 1/40 ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU Unlocking the Secrets of the Past: Text Mining for Historical Documents Blockseminar, 21.2.-11.3.2011 ONTOLOGIES
Semantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
Natural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
Application of ontologies for the integration of network monitoring platforms
Application of ontologies for the integration of network monitoring platforms Jorge E. López de Vergara, Javier Aracil, Jesús Martínez, Alfredo Salvador, José Alberto Hernández Networking Research Group,
Comprendium Translator System Overview
Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION...3 2. WHAT IS MACHINE TRANSLATION?...3 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY...4 3.1 THE BEST MT TECHNOLOGY IN THE MARKET...4
Requirements Ontology and Multi representation Strategy for Database Schema Evolution 1
Requirements Ontology and Multi representation Strategy for Database Schema Evolution 1 Hassina Bounif, Stefano Spaccapietra, Rachel Pottinger Database Laboratory, EPFL, School of Computer and Communication
The Prolog Interface to the Unstructured Information Management Architecture
The Prolog Interface to the Unstructured Information Management Architecture Paul Fodor 1, Adam Lally 2, David Ferrucci 2 1 Stony Brook University, Stony Brook, NY 11794, USA, [email protected] 2 IBM
Processing: current projects and research at the IXA Group
Natural Language Processing: current projects and research at the IXA Group IXA Research Group on NLP University of the Basque Country Xabier Artola Zubillaga Motivation A language that seeks to survive
Analyzing survey text: a brief overview
IBM SPSS Text Analytics for Surveys Analyzing survey text: a brief overview Learn how gives you greater insight Contents 1 Introduction 2 The role of text in survey research 2 Approaches to text mining
A terminology model approach for defining and managing statistical metadata
A terminology model approach for defining and managing statistical metadata Comments to : R. Karge (49) 30-6576 2791 mail [email protected] Content 1 Introduction... 4 2 Knowledge presentation...
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure
Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure Fuminori Kimura Faculty of Culture and Information Science, Doshisha University 1 3 Miyakodani Tatara, Kyoutanabe-shi,
Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde
Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction
On the Evolution of Wikipedia: Dynamics of Categories and Articles
Wikipedia, a Social Pedia: Research Challenges and Opportunities: Papers from the 2015 ICWSM Workshop On the Evolution of Wikipedia: Dynamics of Categories and Articles Ramakrishna B. Bairi IITB-Monash
Clustering of Polysemic Words
Clustering of Polysemic Words Laurent Cicurel 1, Stephan Bloehdorn 2, and Philipp Cimiano 2 1 isoco S.A., ES-28006 Madrid, Spain [email protected] 2 Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe,
Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu
Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!
Using Semantic Data Mining for Classification Improvement and Knowledge Extraction
Using Semantic Data Mining for Classification Improvement and Knowledge Extraction Fernando Benites and Elena Sapozhnikova University of Konstanz, 78464 Konstanz, Germany. Abstract. The objective of this
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1
Cross-Lingual Concern Analysis from Multilingual Weblog Articles
Cross-Lingual Concern Analysis from Multilingual Weblog Articles Tomohiro Fukuhara RACE (Research into Artifacts), The University of Tokyo 5-1-5 Kashiwanoha, Kashiwa, Chiba JAPAN http://www.race.u-tokyo.ac.jp/~fukuhara/
