THUTR: A Translation Retrieval System
|
|
|
- Anna Fox
- 10 years ago
- Views:
Transcription
1 THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for Information Science and Technology Tsinghua University, Beijing , China 4 2 q 2 s s 6 ABSTRACT We introduce a translation retrieval system THUTR, which casts translation as a retrieval problem. Translation retrieval aims at retrieving a list of target-language translation candidates that may be helpful to human translators in translating a given source-language input. While conventional translation retrieval methods mainly rely on parallel corpus that is difficult and expensive to collect, we propose to retrieve translation candidates directly from target-language documents. Given a source-language query, we first translate it into target-language queries and then retrieve translation candidates from target language documents. Experiments on Chinese-English data show that the proposed translation retrieval system achieves 95.32% and 92.00% in terms of P@10 at sentence level and phrase level tasks, respectively. Our system also outperforms a retrieval system that uses parallel corpus significantly. TITLE AND ABSTRACT IN CHINESE THUTRµ È u XÚ 0 È u XÚTHUTR"TXÚò ÈÀŠ u K"È u 3 Ñ\Šó ÿà È È<Jø Ï"DÚÈ u { Ì ÄuVŠŠ Ù Ì K ïvšš dp[" d JÑ ^üšš yè u " ½ŠóÎ ÄkòÙ È 8IŠóÎ, 2lüŠŠ u ÿàè "3Ç=êâþ L² JÑÈ u XÚ O3éf?ÚáŠ?95.32%Ú92.00%P@10Š" XÚ wí L ^²1 Š È u XÚ" KEYWORDS: Translation retrieval, monolingual corpus, statistical machine translation. KEYWORDS IN CHINESE: È u üšš ÚOÅì È. Chunyang Liu and Qi Liu have equal contribution to this work Proceedings of COLING 2012: Demonstration Papers, pages , COLING 2012, Mumbai, December
2 1 Introduction This demonstration introduces a translation retrieval system THUTR, which combines machine translation and information retrieval to provide useful information to translation users. Unlike machine translation, our system casts translation as a retrieval problem: given a source-language string, returns a list of ranked target-language strings that contain its (partial) translations from a large set of target-language documents. Figure 1: A screenshot of the translation retrieval system. For example, as shown in Figure 1, given a Chinese query, our system searches for its translations in a large set of English documents and returns a list of ranked sentences. 1 Although the retrieved documents might not be exact translations of the query, they often contain useful partial translations to help human translators produce high-quality translations. As the fluency of target-language documents are usually guaranteed, the primary goal of translation retrieval is to find documents that are relevant to the source-language query. By relevant, we mean that retrieved documents contain (partial) translations of queries. Our system is divided into two modules: 1. machine translation module: given a source-language query, translates it into a list of translation candidates; 2. information retrieval module: takes the translation candidates as target-language queries and retrieves relevant documents. Generally, the MT module ensures the fidelity of retrieved documents and the IR module ensures the fluency. Therefore, the relevance can be measured based on the coupling of translation and retrieval models. We evaluate our system on Chinese-English data. Experiments show that our translation retrieval system achieves 95.32% and 92.00% in terms of P@10 at sentence level and phrase level tasks, respectively. Our system also significantly outperforms a retrieval system that uses parallel corpus. 1 In our system, each document is a single sentence. 322
3 2 Related Work Translation retrieval is firstly introduced in translation memory (TM) systems (Baldwin and Tanaka, 2000; Baldwin, 2001). Translation equivalents of maximally similar source language strings are chosen as the translation candidates. This is similar with example-based machine translation (Nagao, 1984) that translates by analogy based on parallel corpus. Unfortunately, these systems suffer from a major drawback: the amount and domain of parallel corpus are relatively limited. Hong et al. (2010) propose a method for mining parallel data from the Web. They focus on parallel data mining and report significant improvements on MT experiments. Many researchers have explored the application of MT techniques to information retrieval tasks. Berger and Lafferty (1999) introduce a probabilistic approach to IR based on statistical machine translation models. Federico and Bertoldi (2002) divide CLIR into two translation models: a query-translation model and a query-document model. They show that offering more query translations improves retrieval performance. Murdock and Croft (2005) propose a method for sentence retrieval but in a monolingual scenario. They incorporate a machine translation in two steps: estimation and ranking. Sanchez-Martinez and Carrasco (2011) investigate how to retrieve documents that are plausible translations of a given source language document using statistical machine translation techniques. 3 System Description Figure 2: System architecture. Given a source language query f, conventional translation retrieval systems (Baldwin and Tanaka, 2000; Baldwin, 2001) search for a source string ˆf that has the maximal similarity with f in a parallel corpus D f,e = {( f 1, e 1 ),..., ( f N, e N )}: ˆf = arg max f D f,e sim( f, f ) where sim( f, f ) calculates the similarity between two source language strings f and f. Then, retrieval systems return the target language string ê corresponding to ˆf in D f,e. Therefore, conventional translation retrieval systems only rely on source language string matching and do not actually consider the translation probability between f and ê. Alternatively, given a source language query f, our translation retrieval system searches for a target string ê that has the maximal translation probability with f in a monolingual corpus D e = {e 1,..., e M } : ê = arg max e D e P(e f ) (2) (1) 323
4 where P(e f ) is the probability that e contains a translation of f. This problem definition is similar with cross-lingual information retrieval (CLIR) (Ballesteros and Croft, 1997; Nie et al., 1999) except for the relevance judgement criterion. While CLIR requires the retrieved documents to be relevant to users information need, translation retrieval expects to return documents containing the translations of queries. For example, given a Chinese query c$ (i.e., Olympic Games ), both Olympic Games and London are relevant in CLIR. However, in translation retrieval, only Olympic Games is relevant. Therefore, translation retrieval can be seen as a special case of CLIR. The translation probability P(e f ) can be further decomposed by introducing a target language query e as a hidden variable: P(e f ) = P(e, e f ) = P(e f ) P(e e ) (3) e e where P(e f ) is a translation model and P(e e ) is a retrieval model. 2 In this work, we use a phrase-based model as the translation model. Phrase-based models (Och and Ney, 2002; Marcu and Wong, 2002; Koehn et al., 2003) treat phrase as the basic translation unit. Based on a log-linear framework (Och and Ney, 2002), the phrase-based model used in our system is defined as P(e f ) = score(e, f ) e score(e, f ) score(e, f ) = P φ (e f ) λ 1 P φ ( f e ) λ 2 P lex (e f ) λ 3 P lex ( f e ) λ 4 ex p(1) λ 5 ex p( e ) λ 6 P lm (e ) λ 7 P d ( f, e ) λ 8 (5) The λ s are feature weights that can be optimized using the minimum error rate training algorithm (Och, 2003). We use the vector space model for calculating the cosine similarity of a target language query e and a target language document e. Therefore, the decision rule for translation retrieval is 3 ê = arg max e arg max e,e P(e f ) P(e e ) e score(e, f ) sim(e, e) The pipeline of our translation retrieval system is shown in Figure 2. Given a source language query f, the system first obtains a K-best list of target language query candidates: e 1, e 2,..., e K. For each target language query e k, the system returns a list of translation candidates E k. These translation candidates are merged and sorted according to Eq. (7) to produce the final ranked list E. 2 Such query translation framework has been widely used in CLIR (Nie et al., 1999; Federico and Bertoldi, 2002) In this work, e is a translation of f produced by MT systems and e is a target language document that probably contains a translation of f. While e is usually ungrammatical and erroneous, e is often written by native speakers. 3 In practice, we add a parameterλ 9 to Eq (12) to achieve a balance between translation model and retrieval model: score(e, f ) sim(e, e) λ 9. We setλ 9 = 2 in our experiments. (4) (6) (7) 324
5 4 Experiments We evaluated our translation retrieval system on Chinese-English data that contains 2.2M sentence pairs. They are divided into three parts: 1. Training set (220K): training phrase-based translation model and feature weights. 2. Query set (5K): source language queries paired with their translations. 3. Document set (1.99M): target language sentences paired with their corresponding source language sentences. The statistical machine translation system we used is Moses (Koehn et al., 2007) with its default setting except that we set the maximal phrase length to 4. For language model, we used SRI Language Modeling Toolkit (Stolcke, 2002) to train a 4-gram model with modified Kneser-Ney smoothing (Chen and Goodman, 1998). Our retrieval module is based on Apache Lucene, the state-of-the-art open-source search software Sentence Level We first evaluated our system at sentence level: treating a source language sentence as query. Given the bilingual query set {( f q 1, eq 1 ),..., ( f q Q, eq Q )} and the bilingual document set {( f d 1, ed 1 ),..., ( f d D, ed q D )}, we treat { f1,..., f q Q } as the query set and {eq 1,..., eq Q, ed 1,..., ed D } as the document set. Therefore, relevance judgement can be done automatically because the gold standard documents (i.e., {e q 1,..., eq Q }) are included in the document set. The evaluation metric is P@n, n = 1, 10, 20, 50. phrase length BLEU P@1 P@10 P@20 P@ Table 1: Effect of maximal phrase length. Table 1 shows the effect of maximal phrase length on retrieval performance. Beside retrieval precisions, Table 2 also lists the BLEU scores of query translation. Retrieval performance generally rises with the increase of translation accuracy. Surprisingly, our system achieves over 90% in terms of P@1 when the maximal phrase length is greater than 2. This happens because it is much easier for translation retrieval to judge relevance of retrieved documents than conventional IR systems. We find that the performance hardly increase when the maximal phrase length is greater than 3. This is because long phrases are less likely to be used to translate unseen text. As a result, extracting phrase pairs within 4 words is good enough to achieve reasonable retrieval performance. Table 2 shows the effect of the K-best list translations output by the MT system. It can be treated as query expansion for translation retrieval, which proves to be effective in other IR
6 1-best best Table 2: Effect of K-best list translations. systems. We find that providing more query translations to the retrieval module does improve translation quality significantly. system parallel monolingual Table 3: Comparison to translation retrieval with parallel corpus. We also compared our system with a retrieval system that relies on parallel corpus. Table 3 shows the results for retrieval systems using parallel and monolingual corpora, respectively. Surprisingly, our system significantly outperforms retrieval system using parallel corpus. We notice that Baldwin (2001) carefully chose data sets in which the language is controlled or highly constrained. In other words, a given word often has only one translation across all usages and syntactic constructions are limited. This is not the case for our datasets. Therefore, it might be problematic for conventional retrieval systems to calculate source language string similarities. This problem is probably alleviated in our system by providing multiple query translations. 4.2 Phrase Level P@1 P@10 1-best best Table 4: Results of phrase level retrieval. Finally, we evaluated our system at phrase level: a source language phrase as query. We selected 50 phrases from the source language query set. The average length of a phrasal query is 2.76 words. On average, a query occurs in the source language query set for 3 times. Given a source query, our system returns a list of ranked target language documents. Relevance judgement was performed manually. Table 5 shows the results of phrase level evaluation. We compared between using 1-best translations and 10-best translations produced by MT systems. The P@10 for using 10-best translations reaches 92%. Conclusion We have presented a system THUTR that retrieves translations directly from monolingual corpus. Experiments on Chinese-English data show that our retrieval system achieves 95.32% and 92.00% in terms of P@10 for sentence level and phrase level queries, respectively. In the future, we would like to extend our approach to the web that provides enormous monolingual data. In addition, we plan to investigate more accurate retrieval models for translation retrieval. Acknowledgments This work is supported by NSFC project No , 863 project No. 2012AA011102, and a Research Fund No from Microsoft Research Asia. 326
7 References Baldwin, T. (2001). Low-cost, high-performance translation retrieval: Dumber is better. In ACL Baldwin, T. and Tanaka, H. (2000). The effects of word order and segmentation on translation retrieval performance. In COLING Ballesteros, L. and Croft, B. (1997). Phrasal translation and query expansion techniques for cross-language information retrieval. In SIGIR Berger, A. and Lafferty, J. (1999). Information retrieval as statistical translation. In SIGIR Chen, S. F. and Goodman, J. (1998). An empirical study of smoothing techniques for language modeling. Technical report, Harvard University Center for Research in Computing Technology. Federico, M. and Bertoldi, N. (2002). Statistical cross-language information retrieval using n-best query translations. In SIGIR Hong, G., Li, C.-H., Zhou, M., and Rim, H.-C. (2010). An empirical study on web mining of parallel data. In COLING Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., and Herbst, E. (2007). Moses: Open source toolkit for statistical machine translation. In ACL 2007 (the Demo and Poster Sessions). Koehn, P., Och, F., and Marcu, D. (2003). Statistical phrase-based translation. In NAACL Marcu, D. and Wong, D. (2002). A phrase-based, joint probability model for statistical machine translation. In EMNLP Murdock, V. and Croft, B. (2005). A translation model for sentence retrieval. In EMNLP Nagao, M. (1984). A framework of a mechanical translation between japanese and english by analogy principle. In Elithorn, A. and Banerji, R., editors, Artificial and Human Intelligence. North-Holland. Nie, J.-Y., Simard, M., Isabelle, P., and Durand, R. (1999). Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the web. In SIGIR Och, F. (2003). Minimum error rate training in statistical machine translation. In ACL Och, F. and Ney, H. (2002). Discriminative training and maximum entropy models for statistical machine translation. In ACL Sanchez-Martinez, F. and Carrasco, R. (2011). Document translation retrieval based on statistical machine translation techniques. Applied Artificial Intelligence, 25(5). Stolcke, A. (2002). Srilm - an extensible language modeling toolkit. In ICSLP
8
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1
Hybrid Machine Translation Guided by a Rule Based System
Hybrid Machine Translation Guided by a Rule Based System Cristina España-Bonet, Gorka Labaka, Arantza Díaz de Ilarraza, Lluís Màrquez Kepa Sarasola Universitat Politècnica de Catalunya University of the
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines
, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing
Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation
Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation Mu Li Microsoft Research Asia [email protected] Dongdong Zhang Microsoft Research Asia [email protected]
7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan
7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan We explain field experiments conducted during the 2009 fiscal year in five areas of Japan. We also show the experiments of evaluation
UEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT
UEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT Eva Hasler ILCC, School of Informatics University of Edinburgh [email protected] Abstract We describe our systems for the SemEval
SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande
Machine translation techniques for presentation of summaries
Grant Agreement Number: 257528 KHRESMOI www.khresmoi.eu Machine translation techniques for presentation of summaries Deliverable number D4.6 Dissemination level Public Delivery date April 2014 Status Author(s)
Building a Web-based parallel corpus and filtering out machinetranslated
Building a Web-based parallel corpus and filtering out machinetranslated text Alexandra Antonova, Alexey Misyurev Yandex 16, Leo Tolstoy St., Moscow, Russia {antonova, misyurev}@yandex-team.ru Abstract
Factored Translation Models
Factored Translation s Philipp Koehn and Hieu Hoang [email protected], [email protected] School of Informatics University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW Scotland, United Kingdom
The Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish
The Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish Oscar Täckström Swedish Institute of Computer Science SE-16429, Kista, Sweden [email protected]
Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering
Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering Guangyou Zhou, Kang Liu and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese
Factored Markov Translation with Robust Modeling
Factored Markov Translation with Robust Modeling Yang Feng Trevor Cohn Xinkai Du Information Sciences Institue Computing and Information Systems Computer Science Department The University of Melbourne
The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58. Ncode: an Open Source Bilingual N-gram SMT Toolkit
The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58 Ncode: an Open Source Bilingual N-gram SMT Toolkit Josep M. Crego a, François Yvon ab, José B. Mariño c c a LIMSI-CNRS, BP 133,
The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 37 46. Training Phrase-Based Machine Translation Models on the Cloud
The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 37 46 Training Phrase-Based Machine Translation Models on the Cloud Open Source Machine Translation Toolkit Chaski Qin Gao, Stephan
Translating the Penn Treebank with an Interactive-Predictive MT System
IJCLA VOL. 2, NO. 1 2, JAN-DEC 2011, PP. 225 237 RECEIVED 31/10/10 ACCEPTED 26/11/10 FINAL 11/02/11 Translating the Penn Treebank with an Interactive-Predictive MT System MARTHA ALICIA ROCHA 1 AND JOAN
Statistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
Collaborative Machine Translation Service for Scientific texts
Collaborative Machine Translation Service for Scientific texts Patrik Lambert [email protected] Jean Senellart Systran SA [email protected] Laurent Romary Humboldt Universität Berlin
TRANSREAD LIVRABLE 3.1 QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS. Projet ANR 201 2 CORD 01 5
Projet ANR 201 2 CORD 01 5 TRANSREAD Lecture et interaction bilingues enrichies par les données d'alignement LIVRABLE 3.1 QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS Avril 201 4
UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation
UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation Liang Tian 1, Derek F. Wong 1, Lidia S. Chao 1, Paulo Quaresma 2,3, Francisco Oliveira 1, Yi Lu 1, Shuo Li 1, Yiming
Segmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System
Segmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System Eunah Cho, Jan Niehues and Alex Waibel International Center for Advanced Communication Technologies
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
Improving MT System Using Extracted Parallel Fragments of Text from Comparable Corpora
mproving MT System Using Extracted Parallel Fragments of Text from Comparable Corpora Rajdeep Gupta, Santanu Pal, Sivaji Bandyopadhyay Department of Computer Science & Engineering Jadavpur University Kolata
Applying Statistical Post-Editing to. English-to-Korean Rule-based Machine Translation System
Applying Statistical Post-Editing to English-to-Korean Rule-based Machine Translation System Ki-Young Lee and Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research
Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation
Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation Joern Wuebker M at thias Huck Stephan Peitz M al te Nuhn M arkus F reitag Jan-Thorsten Peter Saab M ansour Hermann N e
An Iterative Link-based Method for Parallel Web Page Mining
An Iterative Link-based Method for Parallel Web Page Mining Le Liu 1, Yu Hong 1, Jun Lu 2, Jun Lang 2, Heng Ji 3, Jianmin Yao 1 1 School of Computer Science & Technology, Soochow University, Suzhou, 215006,
Choosing the best machine translation system to translate a sentence by using only source-language information*
Choosing the best machine translation system to translate a sentence by using only source-language information* Felipe Sánchez-Martínez Dep. de Llenguatges i Sistemes Informàtics Universitat d Alacant
Cache-based Online Adaptation for Machine Translation Enhanced Computer Assisted Translation
Cache-based Online Adaptation for Machine Translation Enhanced Computer Assisted Translation Nicola Bertoldi Mauro Cettolo Marcello Federico FBK - Fondazione Bruno Kessler via Sommarive 18 38123 Povo,
The Prague Bulletin of Mathematical Linguistics NUMBER 95 APRIL 2011 5 18. A Guide to Jane, an Open Source Hierarchical Translation Toolkit
The Prague Bulletin of Mathematical Linguistics NUMBER 95 APRIL 2011 5 18 A Guide to Jane, an Open Source Hierarchical Translation Toolkit Daniel Stein, David Vilar, Stephan Peitz, Markus Freitag, Matthias
An Online Service for SUbtitling by MAchine Translation
SUMAT CIP-ICT-PSP-270919 An Online Service for SUbtitling by MAchine Translation Annual Public Report 2012 Editor(s): Contributor(s): Reviewer(s): Status-Version: Arantza del Pozo Mirjam Sepesy Maucec,
Convergence of Translation Memory and Statistical Machine Translation
Convergence of Translation Memory and Statistical Machine Translation Philipp Koehn and Jean Senellart 4 November 2010 Progress in Translation Automation 1 Translation Memory (TM) translators store past
Machine Translation. Agenda
Agenda Introduction to Machine Translation Data-driven statistical machine translation Translation models Parallel corpora Document-, sentence-, word-alignment Phrase-based translation MT decoding algorithm
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
A Study of Using an Out-Of-Box Commercial MT System for Query Translation in CLIR
A Study of Using an Out-Of-Box Commercial MT System for Query Translation in CLIR Dan Wu School of Information Management Wuhan University, Hubei, China [email protected] Daqing He School of Information
Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output
Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output Christian Federmann Language Technology Lab, German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, D-66123 Saarbrücken,
An End-to-End Discriminative Approach to Machine Translation
An End-to-End Discriminative Approach to Machine Translation Percy Liang Alexandre Bouchard-Côté Dan Klein Ben Taskar Computer Science Division, EECS Department University of California at Berkeley Berkeley,
Statistical Machine Translation Lecture 4. Beyond IBM Model 1 to Phrase-Based Models
p. Statistical Machine Translation Lecture 4 Beyond IBM Model 1 to Phrase-Based Models Stephen Clark based on slides by Philipp Koehn p. Model 2 p Introduces more realistic assumption for the alignment
Domain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation Robert C. Moore Chris Quirk Microsoft Research Redmond, WA 98052, USA {bobmoore,chrisq}@microsoft.com
Domain Adaptation for Medical Text Translation Using Web Resources
Domain Adaptation for Medical Text Translation Using Web Resources Yi Lu, Longyue Wang, Derek F. Wong, Lidia S. Chao, Yiming Wang, Francisco Oliveira Natural Language Processing & Portuguese-Chinese Machine
Factored bilingual n-gram language models for statistical machine translation
Mach Translat DOI 10.1007/s10590-010-9082-5 Factored bilingual n-gram language models for statistical machine translation Josep M. Crego François Yvon Received: 2 November 2009 / Accepted: 12 June 2010
SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY
SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY G.Evangelin Jenifer #1, Mrs.J.Jaya Sherin *2 # PG Scholar, Department of Electronics and Communication Engineering(Communication and Networking), CSI Institute
Human in the Loop Machine Translation of Medical Terminology
Human in the Loop Machine Translation of Medical Terminology by John J. Morgan ARL-MR-0743 April 2010 Approved for public release; distribution unlimited. NOTICES Disclaimers The findings in this report
Emotion Detection in Code-switching Texts via Bilingual and Sentimental Information
Emotion Detection in Code-switching Texts via Bilingual and Sentimental Information Zhongqing Wang, Sophia Yat Mei Lee, Shoushan Li, and Guodong Zhou Natural Language Processing Lab, Soochow University,
The United Nations Parallel Corpus v1.0
The United Nations Parallel Corpus v1.0 Michał Ziemski, Marcin Junczys-Dowmunt, Bruno Pouliquen United Nations, DGACM, New York, United States of America Adam Mickiewicz University, Poznań, Poland World
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure
Cross-Language Information Retrieval by Domain Restriction using Web Directory Structure Fuminori Kimura Faculty of Culture and Information Science, Doshisha University 1 3 Miyakodani Tatara, Kyoutanabe-shi,
Overview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
TRANSREAD: Designing a Bilingual Reading Experience with Machine Translation Technologies
TRANSREAD: Designing a Bilingual Reading Experience with Machine Translation Technologies François Yvon and Yong Xu and Marianna Apidianaki LIMSI, CNRS, Université Paris-Saclay 91 403 Orsay {yvon,yong,marianna}@limsi.fr
Statistical Machine Translation prototype using UN parallel documents
Proceedings of the 16th EAMT Conference, 28-30 May 2012, Trento, Italy Statistical Machine Translation prototype using UN parallel documents Bruno Pouliquen, Christophe Mazenc World Intellectual Property
Oersetter: Frisian-Dutch Statistical Machine Translation
Oersetter: Frisian-Dutch Statistical Machine Translation Maarten van Gompel and Antal van den Bosch, Centre for Language Studies, Radboud University Nijmegen and Anne Dykstra, Fryske Akademy > Abstract
Turker-Assisted Paraphrasing for English-Arabic Machine Translation
Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University
The Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems
The Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems Dr. Ananthi Sheshasaayee 1, Angela Deepa. V.R 2 1 Research Supervisior, Department of Computer Science & Application,
Improving Question Retrieval in Community Question Answering Using World Knowledge
Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Improving Question Retrieval in Community Question Answering Using World Knowledge Guangyou Zhou, Yang Liu, Fang
CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation.
CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. Miguel Ruiz, Anne Diekema, Páraic Sheridan MNIS-TextWise Labs Dey Centennial Plaza 401 South Salina Street Syracuse, NY 13202 Abstract:
The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 7 16
The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 21 7 16 A Productivity Test of Statistical Machine Translation Post-Editing in a Typical Localisation Context Mirko Plitt, François Masselot
The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection
Web based English-Chinese OOV term translation using Adaptive rules and Recursive feature selection Jian Qu, Nguyen Le Minh, Akira Shimazu School of Information Science, JAIST Ishikawa, Japan 923-1292
Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval
Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval Yih-Chen Chang and Hsin-Hsi Chen Department of Computer Science and Information
Identifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 [email protected] Christopher D. Manning Department of
Exploiting Comparable Corpora and Bilingual Dictionaries. the Cross Language Text Categorization
Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization Alfio Gliozzo and Carlo Strapparava ITC-Irst via Sommarive, I-38050, Trento, ITALY {gliozzo,strappa}@itc.it
Chinese-Japanese Machine Translation Exploiting Chinese Characters
Chinese-Japanese Machine Translation Exploiting Chinese Characters CHENHUI CHU, TOSHIAKI NAKAZAWA, DAISUKE KAWAHARA, and SADAO KUROHASHI, Kyoto University The Chinese and Japanese languages share Chinese
Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation
Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation Rabih Zbib, Gretchen Markiewicz, Spyros Matsoukas, Richard Schwartz, John Makhoul Raytheon BBN Technologies
A Joint Sequence Translation Model with Integrated Reordering
A Joint Sequence Translation Model with Integrated Reordering Nadir Durrani, Helmut Schmid and Alexander Fraser Institute for Natural Language Processing University of Stuttgart Introduction Generation
Subordinating to the Majority: Factoid Question Answering over CQA Sites
Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei
A New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015) A New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly
Chapter 5. Phrase-based models. Statistical Machine Translation
Chapter 5 Phrase-based models Statistical Machine Translation Motivation Word-Based Models translate words as atomic units Phrase-Based Models translate phrases as atomic units Advantages: many-to-many
Phrase-Based MT. Machine Translation Lecture 7. Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu. Website: mt-class.
Phrase-Based MT Machine Translation Lecture 7 Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu Website: mt-class.org/penn Translational Equivalence Er hat die Prüfung bestanden, jedoch
Computer Aided Translation
Computer Aided Translation Philipp Koehn 30 April 2015 Why Machine Translation? 1 Assimilation reader initiates translation, wants to know content user is tolerant of inferior quality focus of majority
