JaVaLI!: understanding real questions
|
|
|
- Christal Crawford
- 10 years ago
- Views:
Transcription
1 JaVaLI!: understanding real questions Luísa Coheur L 2 F INESC-ID/GRIL Fernando Batista L 2 F INESC-ID/ISCTE Joana Paulo L 2 F INESC-ID/IST Spoken Languages Systems Laboratory R. Alves Redol, 9, Lisbon, Portugal {luisa.coheur, fernando.batista, joana.paulo}@l2f.inesc-id.pt Abstract This paper presents a linguistically motivated natural language processing system called JaVaLI! 1, that transforms unrestricted text into logical forms. Special focus is given to ambiguity and linguistic variation problems, which can be handled in different steps of the system processing chain. Our system has been tested in the question interpretation domain and some preliminary results over a corpus of 680 questions in Portuguese are presented. 1 Introduction Ambiguity contributes significantly to the complexity of linguistically motivated natural language (NL) systems. In fact, traditionally, if a word is ambiguous, the system has to deal with all possible values from the beginning of the processing chain, even if disambiguation only takes place some steps ahead. Linguistic variations the phenomenon in which similar semantic content may be expressed in different surface forms (Katz and Lin, 2000) also increase the difficulty of translating text into logical forms, as different formulas are obtained for semantically equivalent sequences. This paper has been partially supported by FCT (Fundação para a Ciência e Tecnologia). 1 JaVaLI! stands for Já vamos na linguagem de interpretação! This paper describes JaVaLI!, a linguistically motivated natural language interface to a database of tourist resources, that translates unrestricted text into a formal language. As expected, the above mentioned problems ambiguity and linguistic variations contributed significantly to JaVaLI! s complexity. In order to deal with these problems, special treatment of particular elements such as interrogative pronouns and some special verbs are applied in each step of the processing chain, simplifying our goal. In fact, authors such as (Katz et al., 1998) and (Milward, 1999) already highlighted the importance of handling structures such as dates and compound nouns in preprocessing modules, simplifying the processing chain. This document is organized as follows: we start by presenting the different modules that compose JaVaLI!. A case-study about questions in the tourist resources domain is presented in section 3. Finally, a brief evaluation is conducted in section 4. The document ends with some concluding remarks and future work. 2 General architecture JaVaLI! embodies several well compartmentalized sub-systems which, as Blitz s components (Katz et al., 1998), can be easily interchanged and switched on or off (Matos et al., 2003). Morphossyntactic analysis is the first step performed using an external dictionary. The resulting text is then passed to a post-morphological processor that detects and forms special groups according to recomposition and correspondence
2 rules. This module groups the words into sentences and the resulting text is sent to a syntactic analyzer that, using a surface grammar, slips the sentences into nuclear phrases (structures from the chunk s family (Abney, 1995)). Chunks are then connected. Finally, and before creating the formulas in the representation language, the structures resulting from the previous step are transformed into a graph. The overall process (Figure 1) is described below in more detail. 2.2 Post-Lemmatization Process PAsMo (Paulo, 2001) is responsible for the Post- Lemmatization Process. Based on recomposition and correspondence rules, it processes sequences of words, such as dates, compound nouns, consecutive unknown words, numbers, and so on. It may also be used to translate tags, thus facilitating the interface with the syntactic analyzer. PAsMo was rebuilt from an old system (Faiza, 1999), improving its efficiency and expressivity. 2.3 Syntactic Analysis The syntactic analysis is performed by SuSAna (Batista and Mamede, 2002), which produces a surface analysis where chucks are identified. Then, chunks are connected by arrows, which is done by AlgAS (Coheur, 2003). Finally, the resulting structure is converted into a graph. Arrowing relations are somehow related with dependencies, but, contrary to the main dependency theories, arrows go from dependents to the head (Hagège, 2000). Moreover, the motivation behind an arrow relation is simply to connect two elements, because the established relations are needed to reach the desired semantic representation (details about this concept are given, e.g., in (Bès, 1999; Bès and Hagège, November 2001)). To conclude, we state that each element is the source of at most one arrow and that no arrow s crossing is allowed. Figure 1: JaVaLI s architecture. 2.1 Morphossyntactic analysis The morphossyntactic module enriches each word with its morphological characterization. For this task we use SMorph (Aït-Mokhtar, 1998) that also allows the construction of large dictionaries. The user declares the dictionary which is converted by SMorph into a compact binary file containing the correspondent finite state automata. 2.4 Semantic Analysis ASdeCopas 2 (Coheur and Mamede, 2003) is the module responsible for the syntax-semantic interface. Ideas, formalisms and data from the 5P paradigm (Bès, 1999; Bès and Hagège, November 2001; Hagège, 2000) are followed/used to reach ASdeCopas input: a text with the associated graph. ASdeCopas uses hierarchically organized, intrinsically independent rules, that can be applied in any order, allowing information to be added incrementally. Also, partial results can be naturally produced. A formal presentation of the subsumption relation between rules is presented in (Coheur and Mamede, 2003). 2 ASdeCopas stands for Análise Semântica depois de Completada a análise sintáctica.
3 3 A practical application This section starts by showing how to disambiguate the word onde (Portuguese word for where), which can be either a relative (onde rel) or an interrogative adverb (onde int). 3 This section also shows how to deal with linguistic variations. Finally, semantic rules dealing with interrogative elements are presented. 3.1 Dealing with ambiguity Traditionally, morphological analysis shoots the system with both hypotheses, and posterior processing modules must deal with them, before disambiguation occurs. Instead, JaVaLI! through SMorph introduces a single ambiguous category onde covering both situations (dealing with ambiguous values is also suggested in (Poesio, 1994)). Then, during the processing chain, different levels of information are reached, allowing new disambiguation possibilities. This strategy prevents the complexity of dealing with (at least) two different categories from the beginning. Notice that the disambiguation process cannot be done in a single magic moment: PAsMo resolves some ambiguity situations; then, after Su- SAna s analysis, information about chunks is produced and other cases can be resolved; finally, AlgAS tries to disambiguate the remaining cases. At the end, reaching ASdeCopas, two things can be done to the remaining ambiguous values: either we treat them statistically, or we start to consider both categories. The observation of the Edite corpus (see 4.1 for details) gave us the following heuristics to disambiguate the word onde: 4 H1 if onde starts the question, it is a onde int ex: Onde fica a serra mais alta de Portugal? 5 ex: Onde se situam os parques naturais das Montanhas? 6 3 In JaVaLI!, Categories are sets of attribute/value pairs, i.e., feature structures hierarchically organized (see (Bès, 1999) for details). For exposure reasons we use a unique label to identify these sets. 4 As we will see there is an order associated with these heuristics. 5 Where is the highest mountain in Portugal? 6 Where are the national parks in the mountains? H2 if onde ends the question, it is a onde int ex: O aqueduto das águas livres começa onde? 7 ex: O museu de Arte Antiga é onde? 8 H3 if onde is followed by the sequence é que, it is a onde int ex: Onde é que há coretos? 9 ex: Onde é que se toca música folclórica? 10 H4 if another interrogative element quem, quais,... was detected and onde is not coordinate with it, then it is a onde rel ex: Qual é o maior Lago de Trás-os-Montes onde se pode andar de barco à vela? 11 ex: Quais os lagos de Portugal onde posso praticar windsurf? 12 H5 if we have a exist-type question, it is a onde rel ex: Existem campos de golfe onde possa ter lições em Alemão? 13 ex: Existem parques de campismo no Gerês onde não seja necessário apresentar a carta de campista? 14 Finally, we present a last heuristic to disambiguate the word onde. It works fine in the observed corpus, but situations exist, where it won t apply. As so, this heuristic is presented as a last choice and it should be applied only if the previous heuristics had no success: H6 if before the occurrence of onde, there are only prepositional phrases, it is a onde int ex: Em Aveiro onde posso comprar peças de artesanato? 15 7 Where does the águas livres aqueduct begins? 8 The Museum of Ancient Art, is where? 9 Where can one find a bandstand? 10 Where is folkloric music played? 11 Which is the biggest lake of Trás-os-Montes where one can sail? 12 In which Portuguese lakes one can practice Windsurf? 13 Is there any golf club where I can take lessons in German? 14 Is there any camping park in Gerês where camping card is not necessary? 15 Where can I buy handicrafts in Aveiro?
4 ex: Em Setúbal onde posso jogar Andebol? 16 Following the process chain, we have that: PAsMo is able to apply H1, H2 and H3; After SuSAna s analysis, H4, H5 and H6 can be applied. Disambiguation results are shown in section Linguistic variations In this section we show how we deal with linguistic variations. Notice that we are not saying that we solve paraphrases. We limit ourselves in applying special treatments to particular sets of words, that we want to make converge to a certain formula. Also, once again there is no gold moment to apply these treatments. For example, porquê que and porque razão are two ways of asking the same thing: why? PAsMo takes care of this constructions, grouping these elements together and transforming them into a single porque. However, this is not so simple: sometimes more or less complex sequences of words can occur between the elements that we want to group. This is the case of sequences as Como se chama... 17, where a prepositional phrase can occur between como and se chama (ex: Como, em Lisboa, se chama..) 18. In these situations, only after detecting prepositional phrases that is, after SuSAna s analysis these elements can be grouped. The arrowing concept also helps to handle linguistic variations as the same arrowing structures can capture several different syntactic structures. That is, different syntactic structures can be captured by the same arrowing relations, allowing to reach the same formulas. For example: Onde fica o hotel Ritz?, O hotel Ritz fica onde?, O hotel Ritz onde fica? and Fica onde o hotel Ritz, they are all captured by the same arrowing relations. 19 To conclude this section, the semantic rules itself can also ease the convergence process. For 16 Where can I play handball in Setúbal? 17 What is the name... 18?What in Lisbon is the name Where is hotel Ritz? example, verbs existir and haver 20, in the forms existe(m) and há, respectively, may have different values. They may introduce an exist-question 21 ex: Há alojamentos no Gerês? 22 ex: Existem praias com água quente na Costa Verde? 23 They can mean to have ex: Em que praias do Alentejo há nadador salvador? 24 ex: Em que praias do Alentejo existem nadadores salvadores? 25 They can appear in contexts in which they don t have any relevant semantic value ex: Que casas do século 15 existem em Sintra? 26 ex: Onde que há coretos? 27 By analyzing the involving context, we are able to decide which of the above situation we are dealing with. With this information we are able to converge há and existe into the same formal representation in the context of exist-questions. In addition, we can treat them as the word com (with) in the second situation. Finally, we can ignore them in the last case. 3.3 Semantic rules In this section we briefly describe the semantic rules responsible for detecting the question type where, when, what, exist-question... and its target an hotel, an event,... We start with a rule for the interrogative pronouns que 28, qual, quais and quem. They have in common the fact that they all arrow a name 29, and 20 Roughly, to exist and to have. 21 Does X exists? 22 Is there any accommodation in Gers? 23 In there any beach with warm water in Costa Verde? 24 Which are the beaches in Alentejo having lifeguard? 25 Which are the beaches in Alentejo having lifeguard? 26 Which houses from the 15th century exist in Sintra? 27 Where can one find bandstands? 28 It has a behavior similar of the onde. 29 As Quais são os hotéis... is equivalent to Quais os hotéis..., we ignore the word são, and Quais arrows the noun hotéis in both cases.
5 they share the same semantic representation 30. For example, both and and... qual a montanha que montanha translates into?x 166, montanha(x 166 ) In the same way, both... qual é o responsável quem é o responsável translates into?x 656, responsável(x 656 ) We also have a rule for onde int, that can arrow either a noun or a verb: Onde são os melhores hotéis 89 do Algarve? 31 Onde é que se pode 124 nadar na Costa Verde? 32 and The obtained representation is?local(x 89 )?local(x 124 ) respectively. In the first situation we search the localization of the element identified by variable x 89, and in the second situation, a place where we can do something (in this case swimming). 30 We put some order in the variables generation: the position of the word in the text is the associated variable index. 31 Where are located the best hotels in Algarve. 32 Where can one swim in the Green Coast?. In addition, we have rules for quanto 33, and for the family of the elements questioning time, such as A que horas, Em que dias, Em que anos. 34 Notice that these sequences where previously grouped by PAsMo. Finally, we have rules that identify existquestions, depending on the position occupied by the forms of the verbs existir and haver. Before ending this section let us take a look at an example generated by JaVaLI!. Being given Em que praias do Alentejo, com bandeira azul, há nadador salvador? 35 the following formula is obtained: R20:?x145 R1: praias(x145) R17: de(x145, x148) R23: NAME(x148, Alentejo) R17: com(x145, x151) R1: bandeira(x151) R9: AM(x151, x152) id(x152)=azul R35: com(x145, x155) R1: nadador-salvador(x155) Each R i identifies the applied rule. The predicate NAME associates a variable (representing an entity) with its name (as in (Allen, 1995)), and AM associates a variable (also representing an entity) with an adjective modifying it (as in (Jurafsky and Martin, 2000)). Binary predicates came from prepositions, except com(x145, x155). This is originated because in this particular syntactic context há means com (with), as stated in section Evaluation 4.1 Corpus Edite The Edite corpus was collected after the Edite s project (da Silva, 1997; Reis et al., 1997), containing 680 questions about 68 tourist resources from hotels to restaurants, golf fields, etc. It was 33 How much. 34 Respectively, What time, In which days, In which year. 35 which are the beaches in Alentejo, with a blue flag and having life-guard?.
6 built by a group of ten people, being each one responsible for 68 questions concerning each tourist resource. Edite is not a corpus naturally built, i.e., it was not created by tourists in a real situation of information request. Nevertheless, it was a starting point for the presented translating system. In this way, whenever was important to distinguish semantic behavior about a given element or group of elements, we used this corpus. In fact, this corpus was the basis for the observations that allowed us to determine if there were different syntactic contexts associated to those different semantic behaviors. 4.2 Onde disambiguation results The Edite corpus contains 122 occurrences of the word onde, distributed as follows: 72 occurrences at the beginning of the question that is, applying H1 allows to obtain 72 onde int; 2 occurrences at the end of the question as so, applying H2 allows to obtain 2 additional onde int; 3 occurrences of onde (that do not occur at the beginning or at the end of the question) are followed by é que meaning that we have more 3 onde int; Moreover, H4 and H5 allows to detect 37 onde rel; H6 detects 7 additional onde int. Therefore, we were able to detect (correctly) the category of the word onde in 121 of the 122 cases. The remaining case is Igreja Nossa Senhora do Rosário, onde fica? 36 Enriching H2 that is, adding that onde followed by fica(m), encontra(m) or é (são) in the last position of the statement is a onde int allows to disambiguate the remaining case. It should be noticed that the order of application of these heuristics is relevant. Consider, for 36 Nossa Senhora do Rosrio church, where is it? example, the sentence Onde é que há lagos onde possa fazer ski aquático? 37 The first onde is disambiguated by H1. Only the fact that category onde int is given to this onde, allows H3 to be applied and to disambiguate the second onde. 4.3 Identifying what is asked 100 questions were extracted from Edite s corpus and applied to JaVaLI! chain. Then, these sentences were manually treated, assuming correct arrowing relations. ASdeCopas was then applied and able to correctly identify 95% of what was asked. That is, in 95% of the cases, ASdeCopas was able to identify if we were asking for a location of an hotel, for the existence of a lake, for the possibility of visiting a museum, and so on. The remaining cases result from the incomplete treatment of the interrogative adverb como (ex: Como que se pode ir para o Castelo de Palmela? 38 ). A final note goes to the possibility of evaluating ASdeCopas performance incrementally: as rules are independent from each other, one can evaluate the results produced by a given set of rules for example, rules regarding interrogative particles without checking if the system already has rules for verbs, adverbs, etc. 5 Conclusions We presented JaVaLI!, a system that tries to translate unrestricted text into a formal language. Special focus was given to linguistic clues that can help to disambiguate words and also to make semantically equivalent sequences converge into the same formula. We showed some preliminary results of applying JaVaLI! to a corpus of 680 Portuguese questions. Hypothetically, JaVaLI! could be applied to the semantic web as a translator, trying to transform existing HTML sources, for example, into XML/RDF. However, until now, we made no effort in that direction. In fact, more promising is to use JaVaLI! as a translator of NL queries into 37 Where are the lakes where one can ski? 38 How can one go to Palmela s Castle?
7 a semantic web query language as for example TRIPLE (Sintek and Decker, 2001). References Steven Abney Chunks and Dependendencies: Bringing Processing Evidence to Bear on Syntax. CSLI. Salah Aït-Mokhtar L analyse Présyntaxique en une seule étape. Ph.D. thesis, Université Blaise Pascal, Feb. James Allen Natural Language Understanding (second edition). The Benjamin Cummings Publishing Company, Inc. Fernando Batista and Nuno Mamede SuSAna: Módulo multifuncional da análise sintáctica de superfície. In Julio Gonzalo, Anselmo Peñas, and Antonio Ferrández, editors, Proc. Multilingual Information Access and Natural Language Processing Workshop (IBERAMIA 2002), pages 29 37, Sevilla, Spain, November. Gabriel Bès and Caroline Hagège. November, Properties in 5P (soon in the GRIL pages). Gabriel G. Bès La phrase verbal noyau en français. In in Recherches sur le français parlé, 15, pages Université de Provence, France. Luísa Coheur and Nuno Mamede ASdeCopas: a syntactic-semantic interface. paper submited to EPIA Luísa Coheur AlgAS, um algoritmo de préanálise semântica. Technical Report RT/008/02- CDIL, L 2 F-Laboratório de Sistemas de Língua Falada, Inesc-id, Lisboa, Portugal, Maro. Boris Katz and Jimmy Lin Rextor: A system for generating relations from natural language. In Proceedings of the ACL 2000 Workshop of Natural Language Processing and Information Retrieval (NLP&IR), October. Boris Katz, Deniz Yuret, Jimmy Lin, Sue Felshin, Rebecca Schulman, and Adnan Ilik Blitz: A preprocessor for detecting context-independent linguistic structures. In Proceedings of the 5th Pacific Rim Conference on Artificial Intelligence (PRICAI 98), November. David Matos, Joana Paulo, and Nuno Mamede Managing linguistic resources and tools. In 6th PROPOR Workshop, Faro, Portugal, June. David Milward Towards a robust semantics for dialogue using flat structures. In Proceedings of Amstelogue. Joana Lúcio Paulo PAsMo - pós-análise morfológica. Relatório técnico, Instituto Superior Técnico, Lisboa. M. Poesio Ambiguity, Underspecification and Discourse Interpretation. ITK, Tilburg University. Paulo Reis, J. Matias, and Nuno Mamede Edite - a natural language interface to databases: a new dimension for an old approach. In Proceeding of the Fourth International Conference on Information and Communication Technology in Tourism (EN- TER 97), Edinburgh, Escócia. Springer-Verlag. Michael Sintek and Stefan Decker Triple a query language for the semantic web. In ICEC 2001, Workshop on Semantic Web-based E-Commerce and Rules Markup Languages, November. Luísa Marques da Silva Edite, um sistema de acesso a base de dados em linguagem natural, análise morfológica, sintáctica e semântica (master thesis). Master s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, Portugal. Abbaci Faiza Développement du module Post- SMorph. Master s thesis, Mémoire de DEA de linguistique et informatique, GRIL, Université Blaise Pascal, Clermont-Ferrand. Caroline Hagège Analyse Syntatic Automatique du Portugais. Ph.D. thesis, Université Blaise Pascal, Clermont-Ferrand, France. Daniel Jurafsky and James Martin, Speech and Language Processing, chapter 15. Prentice Hall.
Towards a flexible syntax/semantics interface
Towards a flexible syntax/semantics interface Luísa Coheur 1, Fernando Batista 2, and Nuno J. Mamede 3 Spoken Language Systems Lab Rua Alves Redol 9, 1000-029 Lisboa, Portugal 1 L 2 F INESC-ID Lisboa/IST/GRIL,
Terminology mining with ATA and Galinha
Terminology mining with ATA and Galinha Joana L. Paulo, David M. de Matos, and Nuno J. Mamede L 2 F Spoken Language System Laboratory INESC-ID Lisboa/IST, Rua Alves Redol 9, 1000-029 Lisboa, Portugal http://www.l2f.inesc-id.pt/
Natural Language Interfaces to Databases: simple tips towards usability
Natural Language Interfaces to Databases: simple tips towards usability Luísa Coheur, Ana Guimarães, Nuno Mamede L 2 F/INESC-ID Lisboa Rua Alves Redol, 9, 1000-029 Lisboa, Portugal {lcoheur,arog,nuno.mamede}@l2f.inesc-id.pt
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
How to integrate data from different sources
How to integrate data from different sources Ricardo Ribeiro, David M. de Matos, Nuno J. Mamede L 2 F Spoken Language Systems Laboratory INESC ID Lisboa Rua Alves Redol 9, 1000-029 Lisboa, Portugal {ricardo.ribeiro,david.matos,nuno.mamede}@l2f.inesc-id.pt
CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
Natural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
Presented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software
Semantic Research using Natural Language Processing at Scale; A continued look behind the scenes of Semantic Insights Research Assistant and Research Librarian Presented to The Federal Big Data Working
[email protected] IST/INESC-ID. http://fenix.tecnico.ulisboa.pt/homepage/ist14264 R. Alves Redol 9 Sala 132 1000-029 Lisboa PORTUGAL
Sérgio Miguel Fernandes [email protected] IST/INESC-ID http://fenix.tecnico.ulisboa.pt/homepage/ist14264 R. Alves Redol 9 Sala 132 1000-029 Lisboa PORTUGAL Curriculum Vitae Personal Data
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
Special Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
How To Train At Lisboa E Benfica
ELITE CAMPS EXPERTISE IN FOOTBALL - SUMMER 2014 SPORT LISBOA E BENFICA Founded on February 28th, 1904, Sport Lisboa e Benfica is the Biggest Club in the World. With 14 million fans around the world, it
COURSE OBJECTIVES SPAN 100/101 ELEMENTARY SPANISH LISTENING. SPEAKING/FUNCTIONAl KNOWLEDGE
SPAN 100/101 ELEMENTARY SPANISH COURSE OBJECTIVES This Spanish course pays equal attention to developing all four language skills (listening, speaking, reading, and writing), with a special emphasis on
Overview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
Value of IEEE s Online Collections
Value of IEEE s Online Collections Judy H. Brady, IEEE Aveiro, Portugal February 2013 About the IEEE A not-for-profit society World s largest technical membership association with over 400,000 members
Portuguese as Foreign Language Level A.1
Portuguese as Foreign Language Level A.1 SYLLABUS NON-DEGREE GRANTING 1. CURRICULAR UNIT Portuguese as Foreign Language Level A.1 2. MAIN SCIENTIFIC AREA, according to Portaria nº 256/2005, of March 16
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
Prosodic Phrasing: Machine and Human Evaluation
Prosodic Phrasing: Machine and Human Evaluation M. Céu Viana*, Luís C. Oliveira**, Ana I. Mata***, *CLUL, **INESC-ID/IST, ***FLUL/CLUL Rua Alves Redol 9, 1000 Lisboa, Portugal [email protected], [email protected],
AIR CARGO CHALLENGE 09. Organized by Aeroubi EUROAVIA Covilhã
AIR CARGO CHALLENGE 09 Organized by Aeroubi EUROAVIA Covilhã Welcome! Dear ACC 09 Participants, We are very happy to have you as a participant of Air Cargo Challenge 09 held by Aeroubi-EUROAVIA- Covilhã
Europass Curriculum Vitae
Europass Curriculum Vitae Personal information First name(s) / Surname(s) Address Rua Direita nº36, Penedo, 155-3460 Lageosa do Dão - Tondela Mobile +351 916244743 E-mail(s) [email protected];
The XLDB Group at CLEF 2004
The XLDB Group at CLEF 2004 Nuno Cardoso, Mário J. Silva, and Miguel Costa Grupo XLDB - Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa {ncardoso, mjs, mcosta} at xldb.di.fc.ul.pt
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
Domain Knowledge Extracting in a Chinese Natural Language Interface to Databases: NChiql
Domain Knowledge Extracting in a Chinese Natural Language Interface to Databases: NChiql Xiaofeng Meng 1,2, Yong Zhou 1, and Shan Wang 1 1 College of Information, Renmin University of China, Beijing 100872
ELITE TRAINING CAMPS
CAMPS 2013 ELITE CAMPS EXPERTISE IN FOOTBALL - SUMMER 2013 CAMPS 2013 SPORT LISBOA E BENFICA Founded on February 28th, 1904, Sport Lisboa e Benfica is the Biggest Club in the World. With 14 million fans
CURRICULUM VITAE FERNANDO LUÍS TODO-BOM FERREIRA DA COSTA
CURRICULUM VITAE FERNANDO LUÍS TODO-BOM FERREIRA DA COSTA Full Name: Fernando Luís Todo-Bom Ferreira da Costa Living Address: R. Tomás da Fonseca 36, 7-B, 1600-275 Lisboa Cell Phone: 91 4426281 E-mail
Short Curriculum Vitæ
Short Curriculum Vitæ July 2012 Personal details Name: António Maria Lobo César Alarcão Ravara Date and Place of birth: April 6, 1968, Lisbon, Portugal Nationality: Portuguese Affiliation: Research Center
Classification of Natural Language Interfaces to Databases based on the Architectures
Volume 1, No. 11, ISSN 2278-1080 The International Journal of Computer Science & Applications (TIJCSA) RESEARCH PAPER Available Online at http://www.journalofcomputerscience.com/ Classification of Natural
THINK SUCCESS MAKE IT HAPPEN ANNA NOT MISSING HER ENGLISH CLASS. myclass AN ENGLISH COURSE THAT FITS YOUR LIFE
THINK SUCCESS MAKE IT HAPPEN ANNA NOT MISSING HER ENGLISH CLASS myclass AN ENGLISH COURSE THAT FITS YOUR LIFE Porquê myclass Why myclass? A importância do Inglês é fundamental tanto na construção da sua
D2.4: Two trained semantic decoders for the Appointment Scheduling task
D2.4: Two trained semantic decoders for the Appointment Scheduling task James Henderson, François Mairesse, Lonneke van der Plas, Paola Merlo Distribution: Public CLASSiC Computational Learning in Adaptive
The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge
The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge White Paper October 2002 I. Translation and Localization New Challenges Businesses are beginning to encounter
Guest Reputation Indexes to Analyze the Hotel s Online Reputation Using Data Extracted from OTAs
Guest Reputation Indexes to Analyze the Hotel s Online Reputation Using Data Extracted from OTAs RUI CHOUPINA, MARISOL B. CORREIA, CÉLIA M.Q. RAMOS Escola Superior de Gestão, Hotelaria e Turismo University
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
Europass Curriculum Vitae
Europass Curriculum Vitae Personal information Surname(s) / First name(s) Ventura, Artur David Felix Address(es) Rua Visconde de Santarem, N o 4 5-esq, 1000 Lisboa Telephone(s) +351 91 967 39 16 Email(s)
How To Build A Portuguese Web Search Engine
The Case for a Portuguese Web Search Engine Mário J. Gaspar da Silva FCUL/DI and LASIGE/XLDB [email protected] Community Web Community webs have distint social patterns (Gibson, 1998) Community webs can
Comprendium Translator System Overview
Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION...3 2. WHAT IS MACHINE TRANSLATION?...3 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY...4 3.1 THE BEST MT TECHNOLOGY IN THE MARKET...4
Ricardo Dias. Contacts. About me. (Born August 28th 1986, Portugal)
Ricardo Dias (Born August 28th 1986, Portugal) Contacts URL: http://web.ist.utl.pt/ricardo.dias Email: [email protected] Alternative Email: [email protected] Twitter: http://twitter.com/rspdias
Towards Unsupervised Word Error Correction in Textual Big Data
Towards Unsupervised Word Error Correction in Textual Big Data Joao Paulo Carvalho 1 and Sérgio Curto 1 1 INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol 9, Lisboa, Portugal
Semantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
The Book of Grammar Lesson Six. Mr. McBride AP Language and Composition
The Book of Grammar Lesson Six Mr. McBride AP Language and Composition Table of Contents Lesson One: Prepositions and Prepositional Phrases Lesson Two: The Function of Nouns in a Sentence Lesson Three:
Overview of the TACITUS Project
Overview of the TACITUS Project Jerry R. Hobbs Artificial Intelligence Center SRI International 1 Aims of the Project The specific aim of the TACITUS project is to develop interpretation processes for
Europass Curriculum Vitae
Europass Curriculum Vitae Personal information Surname(s) / First name(s) Address(es) Custódio, Jorge Filipe Telephone(s) +351 919687707 Email(s) Personal website(s) Nationality(-ies) Rua Francisco Pereira
ANALEC: a New Tool for the Dynamic Annotation of Textual Data
ANALEC: a New Tool for the Dynamic Annotation of Textual Data Frédéric Landragin, Thierry Poibeau and Bernard Victorri LATTICE-CNRS École Normale Supérieure & Université Paris 3-Sorbonne Nouvelle 1 rue
E UROPEAN PERSONAL INFORMATION JOÃO ROMÃO ACADEMIC ACTIVITIES
E UROPEAN CURRICULUM VITAE FORMAT PERSONAL INFORMATION Name E-mail Nationality JOÃO ROMÃO [email protected] Portuguese Date of birth 12-05-1968 ACADEMIC ACTIVITIES Since April 2014 Universidade do Algarve
Ling 201 Syntax 1. Jirka Hana April 10, 2006
Overview of topics What is Syntax? Word Classes What to remember and understand: Ling 201 Syntax 1 Jirka Hana April 10, 2006 Syntax, difference between syntax and semantics, open/closed class words, all
No Evidence. 8.9 f X
Section I. Correlation with the 2010 English Standards of Learning and Curriculum Framework- Grade 8 Writing Summary Adequate Rating Limited No Evidence Section I. Correlation with the 2010 English Standards
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
Multi language e Discovery Three Critical Steps for Litigating in a Global Economy
Multi language e Discovery Three Critical Steps for Litigating in a Global Economy 2 3 5 6 7 Introduction e Discovery has become a pressure point in many boardrooms. Companies with international operations
How the decision making process is made in Radiology: a TA C lapproach
III PhD Conference on Technology Assessment How the decision making process is made in Radiology: a TA C lapproach i c k to e d i t c o m p a n y s l o g a n. Maria João Maia FCT / UNL Superviosor: Prof.
Semantic Analysis of Natural Language Queries Using Domain Ontology for Information Access from Database
I.J. Intelligent Systems and Applications, 2013, 12, 81-90 Published Online November 2013 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2013.12.07 Semantic Analysis of Natural Language Queries
Luís Carlos dos Santos Marujo
Luís Carlos dos Santos Marujo Language Technologies Institute School of Computer Science Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213 [email protected] [email protected] Education
Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives
Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives Ramona Enache and Adam Slaski Department of Computer Science and Engineering Chalmers University of Technology and
Terminology Extraction from Log Files
Terminology Extraction from Log Files Hassan Saneifar 1,2, Stéphane Bonniol 2, Anne Laurent 1, Pascal Poncelet 1, and Mathieu Roche 1 1 LIRMM - Université Montpellier 2 - CNRS 161 rue Ada, 34392 Montpellier
Practice Profile. Rui Castro Practice Profile 1
Practice Profile Rui Castro Practice Profile 1 Table of contents 1. Presentation 2. Curriculum Curriculum vitae 3. Works The practice work Rui Castro Practice Profile 2 1. Presentation Rui Castro is an
Text-To-Speech Technologies for Mobile Telephony Services
Text-To-Speech Technologies for Mobile Telephony Services Paulseph-John Farrugia Department of Computer Science and AI, University of Malta Abstract. Text-To-Speech (TTS) systems aim to transform arbitrary
POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition
POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
13 melhores extensões Magento melhorar o SEO da sua loja
Lojas Online ou Lojas Virtuais Seleção das melhores lojas para comprar online em Portugal. Loja virtual designa uma página na Internet com um software de gerenciamento de pedidos (carrinho de compras)
How to Get to Lisbon Experience Apartments Príncipe Real
How to Get to Lisbon Experience Apartments Príncipe Real Page 1 Adress: Rua da Palmeira 33, 1200 Lisboa (Next to Príncipe Real Square Garden) 1 Getting to the Apartment from the Airport. page 2 2 Getting
The University of Lisbon at CLEF 2006 Ad-Hoc Task
The University of Lisbon at CLEF 2006 Ad-Hoc Task Nuno Cardoso, Mário J. Silva and Bruno Martins Faculty of Sciences, University of Lisbon {ncardoso,mjs,bmartins}@xldb.di.fc.ul.pt Abstract This paper reports
Semantic analysis of text and speech
Semantic analysis of text and speech SGN-9206 Signal processing graduate seminar II, Fall 2007 Anssi Klapuri Institute of Signal Processing, Tampere University of Technology, Finland Outline What is semantic
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
12 FIRST QUARTER. Class Assignments
August 7- Go over senior dates. Go over school rules. 12 FIRST QUARTER Class Assignments August 8- Overview of the course. Go over class syllabus. Handout textbooks. August 11- Part 2 Chapter 1 Parts of
The PALAVRAS parser and its Linguateca applications - a mutually productive relationship
The PALAVRAS parser and its Linguateca applications - a mutually productive relationship Eckhard Bick University of Southern Denmark [email protected] Outline Flow chart Linguateca Palavras History
O que é WinRDBI O WinRDBI (Windows Relational DataBase Interpreter) é uma ferramenta educacional utilizada pela Universidade do Estado do Arizona, e que fornece uma abordagem ativa para entender as capacidades
Portuguese Research Institutions in History
José Luís Cardoso Technical University of Lisbon [email protected] Luís Adão da Fonseca University of Porto [email protected] The growth in historical research in Portugal over the last two decades
CS 6740 / INFO 6300. Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage
CS 6740 / INFO 6300 Advanced d Language Technologies Graduate-level introduction to technologies for the computational treatment of information in humanlanguage form, covering natural-language processing
A Machine Translation System Between a Pair of Closely Related Languages
A Machine Translation System Between a Pair of Closely Related Languages Kemal Altintas 1,3 1 Dept. of Computer Engineering Bilkent University Ankara, Turkey email:[email protected] Abstract Machine translation
A Chart Parsing implementation in Answer Set Programming
A Chart Parsing implementation in Answer Set Programming Ismael Sandoval Cervantes Ingenieria en Sistemas Computacionales ITESM, Campus Guadalajara [email protected] Rogelio Dávila Pérez Departamento de
Portugal. Sub-Programme / Key Activity. HOSPITAL GARCIA DE ORTA PAR 2007 ERASMUS Erasmus Network European Pathology Assessment & Learning System
HOSPITAL GARCIA DE ORTA PAR 2007 ERASMUS Erasmus Network European Pathology Assessment & Learning System UNIVERSIDADE DE AVEIRO PAR 2007 ERASMUS BeFlex Plus: Progress on Flexibility in the Bologna Reform
Curso académico 2015/2016 INFORMACIÓN GENERAL ESTRUCTURA Y CONTENIDOS HABILIDADES: INGLÉS
Curso académico 2015/2016 INFORMACIÓN GENERAL ESTRUCTURA Y CONTENIDOS HABILIDADES: INGLÉS Objetivos de Habilidades: inglés El objetivo de la prueba es comprobar que el alumno ha adquirido un nivel B1 (dentro
Knowledge Discovery from Data Bases Proposal for a MAP-I UC
Knowledge Discovery from Data Bases Proposal for a MAP-I UC P. Brazdil 1, João Gama 1, P. Azevedo 2 1 Universidade do Porto; 2 Universidade do Minho; 1 Knowledge Discovery from Data Bases We are deluged
ARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH
Journal of Computer Science 9 (7): 922-927, 2013 ISSN: 1549-3636 2013 doi:10.3844/jcssp.2013.922.927 Published Online 9 (7) 2013 (http://www.thescipub.com/jcs.toc) ARABIC PERSON NAMES RECOGNITION BY USING
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
Tel +351 213 628 863 Fax +351 213 645 818 [email protected] www.lgls.pt Lisbon Portugal
Tel +351 213 628 863 Fax +351 213 645 818 [email protected] www.lgls.pt Lisbon Portugal The architectural office LGLS was founded in 1999 by its current three managing partners two of which have been teaching
The Specific Text Analysis Tasks at the Beginning of MDA Life Cycle
SCIENTIFIC PAPERS, UNIVERSITY OF LATVIA, 2010. Vol. 757 COMPUTER SCIENCE AND INFORMATION TECHNOLOGIES 11 22 P. The Specific Text Analysis Tasks at the Beginning of MDA Life Cycle Armands Šlihte Faculty
Processing: current projects and research at the IXA Group
Natural Language Processing: current projects and research at the IXA Group IXA Research Group on NLP University of the Basque Country Xabier Artola Zubillaga Motivation A language that seeks to survive
Conceptual Schema Approach to Natural Language Database Access
Conceptual Schema Approach to Natural Language Database Access In-Su Kang, Seung-Hoon Na, Jong-Hyeok Lee Div. of Electrical and Computer Engineering Pohang University of Science and Technology (POSTECH)
M LTO Multilingual On-Line Translation
O non multa, sed multum M LTO Multilingual On-Line Translation MOLTO Consortium FP7-247914 Project summary MOLTO s goal is to develop a set of tools for translating texts between multiple languages in
Maskinöversättning 2008. F2 Översättningssvårigheter + Översättningsstrategier
Maskinöversättning 2008 F2 Översättningssvårigheter + Översättningsstrategier Flertydighet i källspråket poäng point, points, credit, credits, var verb ->was, were pron -> each adv -> where adj -> every
Livingston Public Schools Scope and Sequence K 6 Grammar and Mechanics
Grade and Unit Timeframe Grammar Mechanics K Unit 1 6 weeks Oral grammar naming words K Unit 2 6 weeks Oral grammar Capitalization of a Name action words K Unit 3 6 weeks Oral grammar sentences Sentence
José M. F. Moura, Director of ICTI at Carnegie Mellon Carnegie Mellon Victor Barroso, Director of ICTI in Portugal www.cmu.
José M. F. Moura, Director of ICTI at Victor Barroso, Director of ICTI in Portugal www.cmu.edu/portugal Portugal program timeline 2005: Discussions and meeting with Ministry of Science Technology, Higher
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
Reengineering a domain-independent framework for Spoken Dialogue Systems
Reengineering a domain-independent framework for Spoken Dialogue Systems Filipe M. Martins, Ana Mendes, Márcio Viveiros, Joana Paulo Pardal, Pedro Arez, Nuno J. Mamede and João Paulo Neto Spoken Language
Embedding Defeasible Argumentation in the Semantic Web: an ontology-based approach
Embedding Defeasible Argumentation in the Semantic Web: an ontology-based approach Sergio Alejandro Gómez 1 Carlos Iván Chesñevar Guillermo Ricardo Simari 1 Laboratorio de Investigación y Desarrollo en
