The University of Amsterdam at QA@CLEF 2005

David Ahn  Valentin Jijkoun  Karin Müller  Maarten de Rijke  Erik Tjong Kim Sang

Informatics Institute, University of Amsterdam
Kruislaan 403, 1098 SJ Amsterdam, The Netherlands

Abstract

We describe the official runs of our team for the CLEF 2005 question answering track. We took part in the monolingual Dutch task, and focused most of our efforts on refining our existing multi-stream architecture and porting parts of it to an XML-based platform.

Categories and Subject Descriptors: H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]: Languages, Query Languages

General Terms: Measurement, Performance, Experimentation

Keywords: Question answering, Questions beyond factoids

1 Introduction

In previous participations in question answering tracks at both CLEF and TREC, we developed a multi-stream question answering architecture which implements multiple ways of identifying candidate answers, complemented with elaborate filtering mechanisms to weed out incorrect candidates [1, 2, 8, 9]. Part of our effort for the 2005 edition of the QA@CLEF track was aimed at improving this architecture, in particular the so-called table stream (see Section 2.2). Also, to accommodate the new questions with temporal restrictions, a dedicated module was developed. The bulk of our effort, however, was aimed at porting one of the streams to a pure QA-as-XML-retrieval setting, where the target collection is automatically annotated with linguistic information at indexing time, incoming questions are converted to semistructured queries, and evaluation of these queries gives a ranked list of candidate answers.

Our main findings this year are the following. While our system provides wrong answers for less than 40% of the test questions, we identified some obvious areas for improvement.
First, we should work on definition extraction so that both questions asking for definitions and questions requiring the resolution of definitions can be answered in a better way. Second, we should examine the inheritance of document links in the answer tiling process to make sure that the associated module does not cause unnecessary unsupported answers. And third, and most importantly, we should improve our answer filtering module to make sure that the semantic class of the generated answer corresponds with the class required by the question.
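The third improvement, the semantic-class check, amounts to a filter of roughly the following shape. This is a hypothetical sketch, not our implementation: the type inventory and the classify_entity stand-in are illustrative assumptions.

```python
# Hypothetical sketch of the semantic-class check: a candidate answer is
# kept only if its class matches the class required by the question.
# EXPECTED_TYPE and classify_entity are illustrative stand-ins.

EXPECTED_TYPE = {
    "location": "LOC",
    "person": "PER",
    "date": "TIMEX",
}

def classify_entity(answer):
    """Toy stand-in for a named-entity classifier."""
    if answer[:1].isdigit():
        return "TIMEX"
    return "PER" if answer[:1].isupper() else "LOC"

def filter_candidates(question_class, candidates):
    """Drop candidates whose semantic class contradicts the question."""
    required = EXPECTED_TYPE.get(question_class)
    if required is None:          # unknown question class: keep everything
        return candidates
    return [c for c in candidates if classify_entity(c) == required]

print(filter_candidates("date", ["1947", "Luc Jouret"]))  # ['1947']
```

In a real system the classifier would be the named-entity tagger already used at extraction time; the point is only that a mismatch between the required and the observed class is grounds for discarding a candidate.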
Figure 1: Quartz-2005: the University of Amsterdam's Dutch Question Answering System.

The paper is organized as follows. In Section 2, we describe the architecture of our QA system. In Section 3, we describe the new XQuesta stream, i.e., the stream that implements QA-as-XML-retrieval. In Section 4, we detail our official runs. In Section 5, we discuss the results we have obtained and give a preliminary analysis of the performance of different components of the system. We conclude in Section 6.

2 System Overview

2.1 Architecture

Many QA systems share the following pipeline architecture. A question is first associated with a question type from a predefined set, such as Date-of-birth or Currency. Then, a query is formulated based on the question, and an information retrieval engine is used to identify a list of documents that are likely to contain the answer. Those documents are sent to an answer extraction module, which identifies candidate answers, ranks them, and selects the final answer. On top of this basic architecture, numerous add-ons have been devised, ranging from logic-based methods [12] to ones that rely heavily on the redundancy of information available on the World Wide Web [4].

Essentially, our system architecture implements multiple copies of the standard architecture, each of which is a complete standalone QA system that produces ranked answers, though not necessarily for all types of questions. The overall system's answer is then selected from the combined pool of candidates through a combination of merging and filtering techniques. For a reasonably detailed discussion of our QA system architecture we refer to [2]. A general overview of the system is given in Figure 1.

Question Processing.
The first stage of processing, question processing, is common to all streams. Each of the 200 questions is tagged, parsed, and assigned a question class by our question classification module. Finally, the expected answer type is determined. See Section 3.2 for more details about question processing for the XQuesta stream in particular.

There are seven streams in our system this year, four of which use the CLEF corpus to answer questions and three of which use external sources of information. We now provide a brief description of these seven streams. Note that, except for the table stream and XQuesta, which we discuss in Sections 2.2 and 3 respectively, these streams remain unchanged from our system for last year's evaluation [2].

Streams that consult the Dutch CLEF corpus. Four streams generate candidate answers from the Dutch CLEF corpus: Table Lookup, Pattern Match, Ngrams, and XQuesta. The XQuesta stream, which is based on the idea of QA-as-XML-retrieval, is completely new in this year's system; for more details about it, see Section 3.

The Table Lookup stream uses specialized knowledge bases constructed by preprocessing the collection, exploiting the fact that certain types of information (such as country capitals, abbreviations, and names of political leaders) tend to occur in the document collection in a small number of fixed patterns. When a question type indicates that the question might have an answer in these tables, a lookup is performed in the appropriate knowledge base, and answers found there are assigned high confidence. For a detailed overview of this stream, see [7]; for information about the improvements made to it this year, see Section 2.2.

In the Pattern Match stream, zero or more regular expressions are generated for each question according to its type and structure. These patterns, which indicate strings that are highly likely to contain the answer, are then matched against the entire document collection.

The Ngram stream, similar in spirit to [5], constructs a weighted list of queries for each question using a shallow reformulation process, similar to the Pattern Match stream. These queries are fed to a retrieval engine (we use the open-source Lucene, with a standard vector-space model [11]), and the top retrieved documents are used for harvesting word n-grams. The n-grams are ranked according to the weight of the query that generated them, their frequency, their NE type, their proximity to the query keywords, and other parameters; the top-ranking n-grams are considered candidate answers.

Streams that consult the Web. Quartz has two streams that attempt to locate answers on the web: Ngram mining and Pattern Match. For Web searches, we use Google, and n-grams are harvested from the returned snippets. Pattern matching, by contrast, is done against full documents returned by Google.

Wikipedia stream. This stream also uses an external corpus, the Dutch Wikipedia (http://nl.wikipedia.org), an open-content encyclopedia in Dutch (and other languages). However, since this corpus is much cleaner than newspaper text, the stream operates in a different manner.
First, the focus of the question is identified; this is usually the main named entity in the question. Then, this entity's encyclopedia entry is looked up; since Wikipedia is standardized to a large extent, this information has a template-like nature. Finally, using knowledge about the templates used in Wikipedia, information such as Date-of-death and First-name can easily be extracted.

After the streams have produced candidate answers, these answers must be processed in order to choose a single answer to return. Also, for those streams that do not extract answers directly from documents in the CLEF collection, documents in the corpus that support or justify their answers must be found. The modules that perform these tasks are described next.

Answer Justification. As some of our streams obtain candidate answers outside the Dutch CLEF corpus, and as answers need to be supported, or justified, by a document in the Dutch CLEF corpus, we need to find justification for externally found candidate answers. To this end, we construct a query with keywords from a given question and candidate answer, and consider the top-ranking document for this query to be the justification. We use an Okapi model here, as it tends to do well on early high precision in our experience. Additionally, we use some retrieval heuristics, such as marking the answer terms in the query as boolean (requiring them to appear in retrieved documents). Note that, in addition to answers from the three streams that use outside data sources, answers from the corpus-based Ngram stream also need to be justified, since these answers are selected by the stream on the basis of support from multiple documents.

Filtering, Tiling, and Type Checking. We use a final answer selection module (similar to that described in [6]) with heuristic candidate answer filtering and merging, and with stream voting.
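The justification step described above can be sketched roughly as follows. This is a simplified stand-in, not our implementation: plain term overlap replaces the Okapi model, and the document collection is a toy dictionary.

```python
# Sketch of answer justification: pick the collection document that best
# supports an externally found answer. Term overlap with the question
# stands in for an Okapi score; the answer terms are boolean-required.

def justify(question, answer, docs):
    """Return the id of the best supporting document, or None."""
    q_terms = set(question.lower().split())
    a_terms = set(answer.lower().split())
    best_id, best_score = None, -1
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if not a_terms <= words:       # answer terms must all appear
            continue
        score = len(q_terms & words)   # stand-in for a retrieval score
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id

docs = {
    "d1": "George Weah ontving De Gouden Bal in 1995",
    "d2": "Ruud Gullit ontving De Gouden Bal in 1987",
}
print(justify("Welke voetballer ontving De Gouden Bal in 1995?", "Weah", docs))
# d1
```

The boolean requirement on the answer terms mirrors the heuristic in the text: a document that does not mention the candidate answer at all cannot serve as its justification, however well it matches the question.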
To compensate for named entity errors made during answer extraction, our type checking module (see [14] for details) uses several geographical knowledge bases to remove candidates of the incorrect type for location questions. Furthermore, we take advantage of the temporally restricted questions in this year's evaluation to re-rank candidate answers for such questions using temporal information; see Section 2.3 for more details.

2.2 Improvements to the table stream

To the tables used in 2004, we added a table containing definitions extracted offline with two rules: one to extract definitions from appositions, and another to create definitions by combining proper nouns with preceding common nouns. This table was used in parallel with the existing roles table, which contained definitions only for people. The new table contained more than three times as many entries (611,077) as the existing one.

In contrast to earlier versions of the table module, all tables are now stored in SQL format and made available in a MySQL database. The type of an incoming question is converted to sets of tuples containing three elements: table, source field, and target field. The table code searches the source field of the specified table for a pattern and, in cases where a match is found, keeps the contents of the corresponding target field as a candidate answer. Ideally, the search pattern would be computed by the question analysis module, but currently we use separate code for this task.

The table fields only contain noun phrases that are present in the text. This means that they can be used for answering questions such as:

(1) Wie is de president van Servië?
Who is the President of Serbia?

because the phrase president van Servië can normally be found in the text. However, in general, this stream cannot be used for answering questions such as:

(2) Wie was de president van Servië in 1999?
Who was the President of Serbia in 1999?

because the modifier in 1999 does not necessarily follow the profession. The table code contains backoff strategies (case insensitive vs. case sensitive, exact vs. inexact match) in case a search returns no matches.

2.3 Temporal restrictions

Twenty-six questions in this year's QA track are tagged as temporally restricted. As the name implies, such questions ask for information relevant to a particular time; the time in question may be given explicitly by a temporal expression (or timex), as in:

(3) Q0094: Welke voetballer ontving De Gouden Bal in 1995?
Which footballer won the European Footballer of the Year award in 1995?
or it may be given implicitly, with respect to another event, as in:

(4) Q0160: Hoe oud was Nick Leeson toen hij tot gevangenisstraf werd veroordeeld?
How old was Nick Leeson when he was sentenced to prison?

Furthermore, the temporal restriction may be not to a point in time but to a temporal interval, as in:

(5) Q0156: Hoeveel medicinale produkten met tijgeringredienten gingen er tussen 1990 en 1992 over de toonbank?
How many medicinal products containing tiger ingredients were sold between 1990 and 1992?

(6) Q0008: Wie speelde de rol van Superman voordat hij verlamd werd?
Who played the role of Superman before he was paralyzed?

In our runs this year, we took advantage of these temporal restrictions to re-rank candidate answers for temporally restricted questions. Because we had already built a module to identify and normalize timexes (see Section 3.1), we limited ourselves to explicit temporal restrictions (i.e., those signalled by timexes). Handling event-based restrictions would require identifying (and possibly temporally locating) events, which is a much more difficult problem.

For each temporally restricted question, the temporal re-ranker tries to identify the temporal restriction by looking for temporal prepositions (such as in, op, tijdens, voor, na) and timexes
in the question. If it succeeds in identifying an explicit temporal restriction, it proceeds to re-rank the candidate answers. For each candidate answer, the justification document is retrieved, and sentences containing the answer and, if there is one, the question focus are extracted from it. If there are any timexes in these sentences, the re-ranker checks whether they are compatible with the restriction. For each compatible timex, the score of the candidate answer is boosted; for each incompatible timex, the score is lowered. The timestamp of the document is also checked for compatibility with the restriction, and the score is adjusted accordingly. The logic involved in checking the compatibility of a timex with a temporal restriction is relatively straightforward; the only complications come in handling times of differing granularities.

3 XQuesta

The XQuesta stream implements a QA-as-XML-retrieval approach [10, 13]. The target collection is automatically annotated with linguistic information offline. Then, incoming questions are converted to semistructured queries, and evaluation of these queries gives a ranked list of candidate answers. We describe the three stages in detail.

3.1 Offline annotation

We automatically processed the Dutch QA collection, identifying sentences and annotating them syntactically and semantically. For reasons of processing time, we switched from the full parses used in 2004 to a Dutch part-of-speech tagger and a text chunker. Both were trained on the CGN corpus [15] in the same way as the tagger and chunker described in [17], but using the TnT tagger [3] rather than a general machine learner. Two extra layers of annotation were added: named entities, by the TnT tagger trained on CoNLL-2002 data [16], and temporal expressions, by a hand-coded rule-based system, because these were not included in the previously mentioned training data.
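A token-level annotation layer of this kind can be serialized in the inline XML style of the annotated collection. The following is a minimal sketch under our own assumptions (the function name is illustrative; escaping uses the standard library), not the actual annotation code:

```python
# Sketch: rendering one token-level annotation layer (here POS tags) as
# inline XML elements, one <TAG>token</TAG> element per token.
from xml.sax.saxutils import escape

def pos_layer_to_xml(tagged_tokens):
    """Render (token, tag) pairs as inline XML."""
    return " ".join(
        f"<{tag}>{escape(tok)}</{tag}>" for tok, tag in tagged_tokens
    )

print(pos_layer_to_xml([("de", "LID"), ("machtige", "ADJ"), ("burgemeester", "N")]))
# <LID>de</LID> <ADJ>machtige</ADJ> <N>burgemeester</N>
```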
The temporal expression annotation system not only identified temporal expressions but also, where possible, normalized them to a standard format (ISO 8601). Here are some examples of the annotations. Example 7 shows a tokenized sentence in which each token has received an abbreviated Dutch part-of-speech tag.

(7) <LID>de</LID> <ADJ>machtige</ADJ> <N>burgemeester</N> <VZ>van</VZ> <N>Moskou</N> <LET>,</LET> <N>Joeri</N> <N>Loezjkov</N> <LET>,</LET> <WW>veroordeelt</WW> <VNW>dat</VNW>
the powerful mayor of Moscow, Joeri Loezjkov, condemns that

In Example 8, the sentence is divided into syntactic chunks. The three most frequently occurring chunk types are noun phrases (NP), verb phrases (VP), and prepositional phrases (PP). Some tokens, nearly always punctuation signs, are not part of any chunk. By definition, chunks cannot be embedded.

(8) <NP>de machtige burgemeester</NP> <PP>van</PP> <NP>Moskou</NP>, <NP>Joeri Loezjkov</NP>, <VP>veroordeelt</VP> <NP>dat</NP>

Example 9 shows the named entities in the text. These are similar to text chunks: they can contain sequences of words and cannot be embedded. Four different named entity types may be identified: persons (PER), organizations (ORG), locations (LOC), and miscellaneous entities (MISC).

(9) de machtige burgemeester van <NE type="LOC">Moskou</NE>, <NE type="PER">Joeri Loezjkov</NE>, veroordeelt dat

The next two examples show temporal expressions which have been both identified and normalized to ISO 8601 format. In Example 10, normalization is straightforward: the ISO 8601 value of the year 1947 is simply 1947.
(10) Luc Jouret werd in <TIMEX val="1947">1947</TIMEX> geboren
Luc Jouret was born in 1947

Normalization is more complicated in Example 11: in order to determine which date donderdagmorgen refers to, the system uses the document timestamp and some simple heuristics to compute the reference.

(11) Ekeus verliet Irak <TIMEX val=" ">donderdagmorgen</TIMEX>
Ekeus left Iraq Thursday morning

The four annotation layers of the collection were converted to an XML format and stored in separate XML files, to simplify maintenance. Whenever the XQuesta stream requested a document from the collection, all annotations were automatically merged into a single XML document, providing full access to the extracted information. In order to produce well-formed XML mark-up after merging, we used a mix of inline and stand-off (with character offsets referring to the original collection text) annotation schemes.

3.2 Question analysis

The current question analysis module consists of two parts. The first part determines possible question classes, such as location for the question shown in Example (12).

(12) Q0065: Uit welk land komt Diego Maradona?
What country does Diego Maradona come from?

We use 31 different question types, some of which belong to a more general class: for example, date birth and date death describe dates of birth and death and are subtypes of the class date. The assignment of the classes is based on manually compiled patterns.

The second part of our question analysis module is new. Depending on the predicted question class, an expected answer type is assigned. Our new system design allows us to search for structural and semantic information. The expected answer types describe syntactic, lexical, or surface requirements which have to be met by possible answers. The restrictions are formulated as XPath queries, which are used to extract specific information from our preprocessed documents.
For instance, possible answers to questions such as Examples (13) and (14) require a named entity as an answer (a person and a timex, respectively). Besides the named entity information, the answer to the question in Example (14) has to start with a number. The corresponding XPath queries are NE[@type="PER"] and TIMEX[@val=~/^\d/], respectively.

(13) Q0149: Wie is de burgemeester van Moskou in 1994?
Who was the mayor of Moscow in 1994?

(14) Q0014: Wanneer is Luc Jouret geboren?
When was Luc Jouret born?

Table 1 displays the manually developed rules for mapping the question classes to the expected answer types.

3.3 Extracting and ranking answers

As described in Section 3.2, incoming questions are automatically mapped to retrieval queries (simply the question texts) and XPath expressions corresponding to the types of expected answers. Retrieval queries are used to locate relevant passages in the collection. For retrieval, we use non-overlapping passages of at least 400 characters, starting and ending at paragraph boundaries. Then, the question's XPath queries are evaluated on the top 20 retrieved passages, giving lists of XML elements corresponding to candidate answers. For example, for the question in Example 14 above, with the generated XPath query TIMEX[@val=~/^\d/], the value 1947 is extracted from the annotated text in Example 10.
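This extraction step can be sketched with the standard library's limited XPath support. A simplified illustration, not our actual code: plain attribute tests run directly, while the regex-valued predicate from our query notation is approximated here by a post-filter.

```python
# Sketch: evaluating an expected-answer XPath on an annotated passage.
import re
import xml.etree.ElementTree as ET

passage = ET.fromstring(
    '<p>Luc Jouret werd in <TIMEX val="1947">1947</TIMEX> geboren</p>'
)

# Plain attribute tests such as NE[@type="PER"] work directly, e.g.:
#   passage.findall('.//NE[@type="PER"]')

# The regex predicate TIMEX[@val=~/^\d/] is approximated by a post-filter:
candidates = [
    el.text
    for el in passage.findall(".//TIMEX")
    if re.match(r"\d", el.get("val", ""))
]
print(candidates)  # ['1947']
```

Each extracted element is a candidate answer; scoring then proceeds over the passages in which the candidate occurs, as described below.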
Table 1: Overview of the mapping rules from question classes to answer types.
Question classes: abbreviation; age; cause reason; city capital; color; date death, date birth, date; definition person; definition; distance; distinction; expansion; height; language; length; location; manner; monetary unit; name; number people; number organization; person; score, size, speed, sum of money; synonym name; temperature, time period.
Restrictions on the type of answer include: word in capital letters; possible word "jarige"; sentence; LOC; adjective; TIMEX; digital number; noun phrase or sentence; MISC or ORG; noun phrase; MISC; named entity; ORG; PER.

The score of each candidate is calculated as the sum of the retrieval scores of all passages containing the candidate. Furthermore, the scores are normalized using web hit counts, producing the final ranked list of XQuesta's answer candidates.

4 Runs

We submitted two Dutch monolingual runs. The run uams051nlnl used the full system with all streams and final answer selection. The run uams052nlnl used, on top of this, an additional stream: the XQuesta stream with paraphrased questions. As a simple way of paraphrasing questions, we double-translated them (from Dutch to English and then back to Dutch) using Systran, an automatic MT system. Question paraphrases were only used for query formulation at the retrieval step; question analysis (identification of question types, expected answer types, and corresponding XPath queries) was performed on the original questions. Our idea was to see whether simple paraphrasing of retrieval queries would help to find different relevant passages and lead to more correctly answered questions.

5 Results and analysis

Our question classifier assigned a question type to 187 questions. All classified questions also received an expected answer type. Eleven of the unclassified questions are What/Which NP questions, such as Example 15.
The remaining two unclassified questions are of the form Name an NP.

(15) Q0024: Welke ziekte kregen veel Amerikaanse soldaten na de Golfoorlog?
Which disease did many American soldiers contract after the Gulf War?

Table 2 lists the assessment counts for the two University of Amsterdam question answering runs for the CLEF 2005 monolingual Dutch (NL-NL) task. The two runs had 13 different answers, but the assessors evaluated only two pairs differently.

Run          Right  Unsupp.  Inexact  Wrong
uams051nlnl
uams052nlnl

Table 2: Assessment counts for the 200 answers in the two Amsterdam runs submitted for Dutch monolingual Question Answering (NL-NL) in CLEF 2005. Thirteen answers of the two runs were different, but only two of these were assessed differently.

We were surprised by the large number of inexact answers. When we examined the inexact answers of the first run, we found that a disproportionate number of these were generated for definition questions: 85% (while only 30% of the questions ask for definitions). Almost half of the errors (13 out of 28) were caused by the same information extraction problem: determining where a noun phrase starts. Here is an example:

(16) Q0094: Wat is Eyal?
A: leider van de extreem-rechtse groep
(A: leader of the extreme-right group)

Here the answer extreem-rechtse groep (extreme-right group) would be correct as an answer to the question What is Eyal?. Unfortunately, the system generated the answer leider van de extreem-rechtse groep (leader of the extreme-right group). This extraction problem also affected the answers to questions that provided a definition and asked for a name. We expect that this problem can be solved by a check of the semantic class of the question focus word and the head noun of the answer, both in the answer extraction process and in answer postprocessing. Such a check would also have prevented seven of the 15 other incorrect answers in this group.

When we examined the assessments of the answers to the three different question types, we noticed that the proportion of correct answers was nearly the same for definition questions (45%) and factoid questions (47%), but that temporally restricted questions seemed to cause problems (27% correct). Of the 18 incorrect answers in the latter group, four involved an answer which would have been correct in another time period (questions Q0078, Q0084, Q0092, and Q0195). If these questions had been answered correctly, the score for this category would have risen to an acceptable 46% (including the incorrectly assessed answer to question Q0149).

The temporal re-ranking module described in Section 2 did make a positive contribution, however. For the following two temporally restricted questions:

(17) Q0007: Welke voetballer ontving De Gouden Bal in 1995?
Which footballer won the European Footballer of the Year award in 1995?

(18) Q0159: Wie won de Nobelprijs voor medicijnen in 1994?
Who won the Nobel Prize for medicine in 1994?

the highest-ranking candidate answer before the temporal re-ranking module was applied was incorrect (Ruud Gullit and Martin Luther King, respectively), but the application of the temporal re-ranking module boosted the correct answer (Weah and Martin Rodbell, respectively) to the top position. Moreover, the temporal re-ranking module never demoted a correct answer from the top position.

The answers to the temporally restricted questions also indicate the overall problems of the system. Of the remaining 14 incorrect answers, only five were of the expected answer category, while nine were of a different category. Among the 62 answers to factoid and definition questions which were judged to be wrong, the majority (58%) had an incorrect answer class. An extra answer postprocessing filter which compares the semantic category of the answer with the one expected by the question would prevent such mismatches.

Our system produced five answers which were judged to be unsupported. One of these was wrong and another was right. A third answer was probably combined from different answers, and a link to a document containing a part rather than the whole answer was kept. The remaining two errors were probably also caused by a document link which should not have been kept, but the reason for this is unknown.

This error analysis suggests three ways to improve our QA system. First, we should work on definition extraction so that both questions asking for definitions and questions requiring the resolution of definitions can be answered in a better way. Second, we should examine the inheritance
of document links in the answer tiling process to make sure that the associated module does not cause unnecessary unsupported answers. And third, and most importantly, we should improve answer filtering to make sure that the semantic class of the generated answer corresponds with the class required by the question.

6 Conclusions

The bulk of our efforts for the 2005 edition of the QA@CLEF track was aimed at implementing a pure QA-as-XML-retrieval stream as part of our multi-stream question answering architecture; here, the target collection is automatically annotated with linguistic information at indexing time, incoming questions are converted to semistructured queries, and evaluation of these queries gives a ranked list of candidate answers. The overall system provides wrong answers for less than 40% of the questions. Our ongoing work is aimed at addressing the main sources of error discovered: definition extraction, inheritance of document links in answer tiling, and semantically informed answer filtering.

Acknowledgments

This research was supported by various grants from the Netherlands Organization for Scientific Research (NWO). Valentin Jijkoun, Karin Müller, and Maarten de Rijke were supported under project number . Erik Tjong Kim Sang and Maarten de Rijke were supported under project number . David Ahn and Maarten de Rijke were supported under project number . In addition, Maarten de Rijke was also supported by grants from NWO under project numbers , , , , , and .

References

[1] D. Ahn, V. Jijkoun, G. Mishne, K. Müller, M. de Rijke, and S. Schlobach. Using Wikipedia at the TREC QA track. In E. Voorhees and L. Buckland, editors, The Thirteenth Text REtrieval Conference (TREC 2004), Gaithersburg, Maryland.

[2] D. Ahn, V. Jijkoun, K. Müller, M. de Rijke, S. Schlobach, and G. Mishne. Making stone soup: Evaluating a recall-oriented multi-stream question answering system for Dutch. In C. Peters, P. Clough, G. Jones, J. Gonzalo, M. Kluck, and B.
Magnini, editors, Multilingual Information Access for Text, Speech and Images: Results of the Fifth CLEF Evaluation Campaign, LNCS. Springer Verlag.

[3] T. Brants. TnT: A Statistical Part-of-Speech Tagger. Saarland University.

[4] E. Brill, S. Dumais, and M. Banko. An analysis of the AskMSR question-answering system. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP 2002).

[5] S. Dumais, M. Banko, E. Brill, J. Lin, and A. Ng. Web question answering: Is more always better? In P. Bennett, S. Dumais, and E. Horvitz, editors, Proceedings of SIGIR'02.

[6] V. Jijkoun and M. de Rijke. Answer selection in a multi-stream open domain question answering system. In S. McDonald and J. Tait, editors, Proceedings of the 26th European Conference on Information Retrieval (ECIR'04), volume 2997 of LNCS. Springer.

[7] V. Jijkoun, G. Mishne, and M. de Rijke. Preprocessing documents to answer Dutch questions. In Proceedings of the 15th Belgian-Dutch Conference on Artificial Intelligence (BNAIC'03), 2003.
[8] V. Jijkoun, G. Mishne, and M. de Rijke. How frogs built the Berlin Wall. In Proceedings of CLEF 2003, LNCS. Springer.

[9] V. Jijkoun, G. Mishne, C. Monz, M. de Rijke, S. Schlobach, and O. Tsur. The University of Amsterdam at the TREC 2003 Question Answering Track. In Proceedings of TREC 2003.

[10] K. Litkowski. Use of metadata for question answering and novelty tasks. In Proceedings of the Twelfth Text REtrieval Conference (TREC 2003).

[11] Lucene. The Lucene search engine. URL:

[12] D. Moldovan, M. Pasca, S. Harabagiu, and M. Surdeanu. Performance issues and error analysis in an open-domain question answering system. ACM Transactions on Information Systems, 21(2).

[13] P. Ogilvie. Retrieval using structure for question answering. In V. Mihajlovic and D. Hiemstra, editors, Proceedings of the First Twente Data Management Workshop (TDM'04), pages 15-23.

[14] S. Schlobach, M. Olsthoorn, and M. de Rijke. Type checking in open-domain question answering. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004).

[15] I. Schuurman, M. Schouppe, H. Hoekstra, and T. van der Wouden. CGN, an annotated corpus of spoken Dutch. In Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora (LINC-03), Budapest, Hungary.

[16] E. F. Tjong Kim Sang. Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition. In Proceedings of CoNLL-2002, Taipei, Taiwan.

[17] E. F. Tjong Kim Sang, W. Daelemans, and A. Höthker. Reduction of Dutch sentences for automatic subtitling. In Proceedings of CLIN-2003. University of Antwerp, Antwerp Papers in Linguistics, 111, 2004.
PoS-tagging Italian texts with CORISTagger
PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy [email protected] Abstract. This paper presents an evolution of CORISTagger [1], an high-performance
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University [email protected] Kapil Dalwani Computer Science Department
How To Write A Question Answering System For A Quiz At The University Of Amsterdam
The University of Amsterdam at CLEF@QA 2006 Valentin Jijkoun Joris van Rantwijk David Ahn Erik Tjong Kim Sang Maarten de Rijke ISLA, University of Amsterdam jijkoun,rantwijk,ahn,erikt,[email protected]
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
Statistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow
A Framework-based Online Question Answering System Oliver Scheuer, Dan Shen, Dietrich Klakow Outline General Structure for Online QA System Problems in General Structure Framework-based Online QA system
Shallow Parsing with Apache UIMA
Shallow Parsing with Apache UIMA Graham Wilcock University of Helsinki Finland [email protected] Abstract Apache UIMA (Unstructured Information Management Architecture) is a framework for linguistic
Word Completion and Prediction in Hebrew
Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,
Semantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
A chart generator for the Dutch Alpino grammar
June 10, 2009 Introduction Parsing: determining the grammatical structure of a sentence. Semantics: a parser can build a representation of meaning (semantics) as a side-effect of parsing a sentence. Generation:
FoLiA: Format for Linguistic Annotation
Maarten van Gompel Radboud University Nijmegen 20-01-2012 Introduction Introduction What is FoLiA? Generalised XML-based format for a wide variety of linguistic annotation Characteristics Generalised paradigm
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
The University of Lisbon at CLEF 2006 Ad-Hoc Task
The University of Lisbon at CLEF 2006 Ad-Hoc Task Nuno Cardoso, Mário J. Silva and Bruno Martins Faculty of Sciences, University of Lisbon {ncardoso,mjs,bmartins}@xldb.di.fc.ul.pt Abstract This paper reports
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1
Korpus-Abfrage: Werkzeuge und Sprachen Gastreferat zur Vorlesung Korpuslinguistik mit und für Computerlinguistik Charlotte Merz 3. Dezember 2002 Motivation Lizentiatsarbeit: A Corpus Query Tool for Automatically
Query term suggestion in academic search
Query term suggestion in academic search Suzan Verberne 1, Maya Sappelli 1,2, and Wessel Kraaij 2,1 1. Institute for Computing and Information Sciences, Radboud University Nijmegen 2. TNO, Delft Abstract.
Optimization of Internet Search based on Noun Phrases and Clustering Techniques
Optimization of Internet Search based on Noun Phrases and Clustering Techniques R. Subhashini Research Scholar, Sathyabama University, Chennai-119, India V. Jawahar Senthil Kumar Assistant Professor, Anna
Named Entity Recognition in Broadcast News Using Similar Written Texts
Named Entity Recognition in Broadcast News Using Similar Written Texts Niraj Shrestha Ivan Vulić KU Leuven, Belgium KU Leuven, Belgium [email protected] ivan.vulic@@cs.kuleuven.be Abstract
Data-Intensive Question Answering
Data-Intensive Question Answering Eric Brill, Jimmy Lin, Michele Banko, Susan Dumais and Andrew Ng Microsoft Research One Microsoft Way Redmond, WA 98052 {brill, mbanko, sdumais}@microsoft.com [email protected];
SVM Based Learning System For Information Extraction
SVM Based Learning System For Information Extraction Yaoyong Li, Kalina Bontcheva, and Hamish Cunningham Department of Computer Science, The University of Sheffield, Sheffield, S1 4DP, UK {yaoyong,kalina,hamish}@dcs.shef.ac.uk
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
Web-Based Question Answering: A Decision-Making Perspective
UAI2003 AZARI ET AL. 11 Web-Based Question Answering: A Decision-Making Perspective David Azari Eric Horvitz Susan Dumais Eric Brill University of Washington Microsoft Research Microsoft Research Microsoft
Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework
Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 [email protected] 2 [email protected] Abstract A vast amount of assorted
Activity Mining for Discovering Software Process Models
Activity Mining for Discovering Software Process Models Ekkart Kindler, Vladimir Rubin, Wilhelm Schäfer Software Engineering Group, University of Paderborn, Germany [kindler, vroubine, wilhelm]@uni-paderborn.de
Term extraction for user profiling: evaluation by the user
Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,
Brill s rule-based PoS tagger
Beáta Megyesi Department of Linguistics University of Stockholm Extract from D-level thesis (section 3) Brill s rule-based PoS tagger Beáta Megyesi Eric Brill introduced a PoS tagger in 1992 that was based
Natural Language Database Interface for the Community Based Monitoring System *
Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University
Towards a Visually Enhanced Medical Search Engine
Towards a Visually Enhanced Medical Search Engine Lavish Lalwani 1,2, Guido Zuccon 1, Mohamed Sharaf 2, Anthony Nguyen 1 1 The Australian e-health Research Centre, Brisbane, Queensland, Australia; 2 The
POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition
POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,
11-792 Software Engineering EMR Project Report
11-792 Software Engineering EMR Project Report Team Members Phani Gadde Anika Gupta Ting-Hao (Kenneth) Huang Chetan Thayur Suyoun Kim Vision Our aim is to build an intelligent system which is capable of
The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge
The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge White Paper October 2002 I. Translation and Localization New Challenges Businesses are beginning to encounter
How to make Ontologies self-building from Wiki-Texts
How to make Ontologies self-building from Wiki-Texts Bastian HAARMANN, Frederike GOTTSMANN, and Ulrich SCHADE Fraunhofer Institute for Communication, Information Processing & Ergonomics Neuenahrer Str.
Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
Example-based Translation without Parallel Corpora: First experiments on a prototype
-based Translation without Parallel Corpora: First experiments on a prototype Vincent Vandeghinste, Peter Dirix and Ineke Schuurman Centre for Computational Linguistics Katholieke Universiteit Leuven Maria
Semantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
Parsing Software Requirements with an Ontology-based Semantic Role Labeler
Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh [email protected] Ewan Klein University of Edinburgh [email protected] Abstract Software
Stanford s Distantly-Supervised Slot-Filling System
Stanford s Distantly-Supervised Slot-Filling System Mihai Surdeanu, Sonal Gupta, John Bauer, David McClosky, Angel X. Chang, Valentin I. Spitkovsky, Christopher D. Manning Computer Science Department,
Annotation and Evaluation of Swedish Multiword Named Entities
Annotation and Evaluation of Swedish Multiword Named Entities DIMITRIOS KOKKINAKIS Department of Swedish, the Swedish Language Bank University of Gothenburg Sweden [email protected] Introduction
Mining the Software Change Repository of a Legacy Telephony System
Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,
Text Generation for Abstractive Summarization
Text Generation for Abstractive Summarization Pierre-Etienne Genest, Guy Lapalme RALI-DIRO Université de Montréal P.O. Box 6128, Succ. Centre-Ville Montréal, Québec Canada, H3C 3J7 {genestpe,lapalme}@iro.umontreal.ca
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Jan Paralic, Peter Smatana Technical University of Kosice, Slovakia Center for
Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University
Comma checking in Danish Daniel Hardt Copenhagen Business School & Villanova University 1. Introduction This paper describes research in using the Brill tagger (Brill 94,95) to learn to identify incorrect
Natural Language Processing in the EHR Lifecycle
Insight Driven Health Natural Language Processing in the EHR Lifecycle Cecil O. Lynch, MD, MS [email protected] Health & Public Service Outline Medical Data Landscape Value Proposition of NLP
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
Automatic Text Analysis Using Drupal
Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing
GrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns
GrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns Stamatina Thomaidou 1,2, Konstantinos Leymonis 1,2, Michalis Vazirgiannis 1,2,3 Presented by: Fragkiskos Malliaros 2 1 : Athens
Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines
, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing
Intelligent Log Analyzer. André Restivo <[email protected]>
Intelligent Log Analyzer André Restivo 9th January 2003 Abstract Server Administrators often have to analyze server logs to find if something is wrong with their machines.
Kybots, knowledge yielding robots German Rigau IXA group, UPV/EHU http://ixa.si.ehu.es
KYOTO () Intelligent Content and Semantics Knowledge Yielding Ontologies for Transition-Based Organization http://www.kyoto-project.eu/ Kybots, knowledge yielding robots German Rigau IXA group, UPV/EHU
Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure
Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure Fuminori Kimura Faculty of Culture and Information Science, Doshisha University 1 3 Miyakodani Tatara, Kyoutanabe-shi,
The Prolog Interface to the Unstructured Information Management Architecture
The Prolog Interface to the Unstructured Information Management Architecture Paul Fodor 1, Adam Lally 2, David Ferrucci 2 1 Stony Brook University, Stony Brook, NY 11794, USA, [email protected] 2 IBM
How To Write A Summary Of A Review
PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,
Annotation Guidelines for Dutch-English Word Alignment
Annotation Guidelines for Dutch-English Word Alignment version 1.0 LT3 Technical Report LT3 10-01 Lieve Macken LT3 Language and Translation Technology Team Faculty of Translation Studies University College
ETL Ensembles for Chunking, NER and SRL
ETL Ensembles for Chunking, NER and SRL Cícero N. dos Santos 1, Ruy L. Milidiú 2, Carlos E. M. Crestana 2, and Eraldo R. Fernandes 2,3 1 Mestrado em Informática Aplicada MIA Universidade de Fortaleza UNIFOR
Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari [email protected]
Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari [email protected] Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content
An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials
ehealth Beyond the Horizon Get IT There S.K. Andersen et al. (Eds.) IOS Press, 2008 2008 Organizing Committee of MIE 2008. All rights reserved. 3 An Ontology Based Method to Solve Query Identifier Heterogeneity
ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS
ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS Divyanshu Chandola 1, Aditya Garg 2, Ankit Maurya 3, Amit Kushwaha 4 1 Student, Department of Information Technology, ABES Engineering College, Uttar Pradesh,
6.2.8 Neural networks for data mining
6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural
Domain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
» A Hardware & Software Overview. Eli M. Dow <[email protected]:>
» A Hardware & Software Overview Eli M. Dow Overview:» Hardware» Software» Questions 2011 IBM Corporation Early implementations of Watson ran on a single processor where it took 2 hours
Automated Extraction of Security Policies from Natural-Language Software Documents
Automated Extraction of Security Policies from Natural-Language Software Documents Xusheng Xiao 1 Amit Paradkar 2 Suresh Thummalapenta 3 Tao Xie 1 1 Dept. of Computer Science, North Carolina State University,
Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED
Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED 17 19 June 2013 Monday 17 June Salón de Actos, Facultad de Psicología, UNED 15.00-16.30: Invited talk Eneko Agirre (Euskal Herriko
Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu
Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
ASKWIKI : SHALLOW SEMANTIC PROCESSING TO QUERY WIKIPEDIA. Felix Burkhardt and Jianshen Zhou. Deutsche Telekom Laboratories, Berlin, Germany
20th European Signal Processing Conference (EUSIPCO 2012) Bucharest, Romania, August 27-31, 2012 ASKWIKI : SHALLOW SEMANTIC PROCESSING TO QUERY WIKIPEDIA Felix Burkhardt and Jianshen Zhou Deutsche Telekom
Task Description for English Slot Filling at TAC- KBP 2014
Task Description for English Slot Filling at TAC- KBP 2014 Version 1.1 of May 6 th, 2014 1. Changes 1.0 Initial release 1.1 Changed output format: added provenance for filler values (Column 6 in Table
Simple Language Models for Spam Detection
Simple Language Models for Spam Detection Egidio Terra Faculty of Informatics PUC/RS - Brazil Abstract For this year s Spam track we used classifiers based on language models. These models are used to
SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande
