N-best Reranking Using Optimal Phrase Alignment for Statistical Machine Translation
Vol. 51 (Aug. 2010)

N-best Reranking Using Optimal Phrase Alignment for Statistical Machine Translation

Mitsuru Koshikawa (1), Masao Utiyama (2), Shunji Umetani (3), Tomomi Matsui (4) and Mikio Yamamoto (1)

(1) Graduate School of Systems and Information Engineering, University of Tsukuba
(2) MASTAR Project, National Institute of Information and Communications Technology
(3) Graduate School of Information Science and Technology, Osaka University
(4) Department of Information and System Engineering, Faculty of Science and Engineering, Chuo University

A phrase-based statistical machine translation system outputs the candidate with the highest probability under its probabilistic phrase translation rules. However, the number of translation candidates is huge, and the phrase segmentation and alignment between source and target sentences are ambiguous. Current statistical translation systems therefore use various heuristics to reduce the number of translation candidates and to approximate phrase-alignment probabilities, in order to narrow the search space. This paper proposes a formulation that strictly maximizes the phrase-alignment probability computed from all the features that most phrase-based statistical machine translation systems use. We also propose a reranking method based on the proposed phrase alignment optimization. In evaluation experiments, our system significantly improved translation quality. The experimental results also suggest that a variety of translation candidates is more important for increasing accuracy than exact phrase alignments.

© 2010 Information Processing Society of Japan
Given a source sentence f, the translation ê is selected over target sentences e and phrase alignments c 1):

$$\hat{e} = \arg\max_{e} P(e) \sum_{c} P(f, c \mid e) \approx \arg\max_{e} P(e) \max_{c} P(f, c \mid e) = \arg\max_{e, c} P(e)\, P(f, c \mid e) \tag{1}$$

Fig. 1 An example of phrase-based translation.

Log-linear form of Eq. (1), with feature functions h_k(f, c, e) and weights λ_k 2), 3):

$$\hat{e} = \arg\max_{e, c} \sum_{k} \lambda_k h_k(f, c, e) \tag{2}$$

Phrase-based translation model 2):

$$P(f, c \mid e) = P(c \mid e)\, P(f \mid c, e) \approx P(c_1^I \mid e) \prod_{i=1}^{I} P(\bar{f}_{c_i} \mid \bar{e}_i) \tag{3}$$

Here c_1^I = c_1, c_2, ..., c_I, ē_i is the i-th target phrase, and f̄_{c_i} is the source phrase aligned to it.

2.3 Lexicalized Block Orientation 4)
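The log-linear selection rule of Eq. (2) can be sketched in a few lines. This is only an illustration: the feature names (`lm`, `tm`), the weights and the two candidates are hypothetical values chosen for the example, not numbers from the paper.

```python
def loglinear_score(features, weights):
    """Weighted sum of feature values: sum_k lambda_k * h_k(f, c, e)."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Return the candidate (e, c) with the highest log-linear score."""
    return max(candidates, key=lambda cand: loglinear_score(cand["features"], weights))


# Hypothetical weights and candidates (log-probabilities as feature values).
weights = {"lm": 0.5, "tm": 1.0}
candidates = [
    {"id": "A", "features": {"lm": -4.0, "tm": -2.0}},  # score -4.0
    {"id": "B", "features": {"lm": -2.0, "tm": -2.5}},  # score -3.5
]
winner = best_candidate(candidates, weights)
```

In a real decoder the maximization runs over a search space rather than an explicit list, which is why heuristics such as beam search are needed.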
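Lexicalized Block Orientation (LBO) models the orientation between the i-th and (i+1)-th target phrases ē_i, ē_{i+1} and their aligned source phrases f̄_{c_i}, f̄_{c_{i+1}} with three classes 4):

$$\mathrm{class}(c_i, c_{i+1}) = \begin{cases} \text{monotone} & (c_{i+1} = c_i + 1), \\ \text{swap} & (c_{i+1} = c_i - 1), \\ \text{discontinuous} & (\text{otherwise}). \end{cases} \tag{4}$$

$$P(c_1^I \mid e) \approx \prod_{i=1}^{I} P(\mathrm{class}(c_i, c_{i+1}) \mid \bar{e}_i, \bar{e}_{i+1}) \tag{5}$$

The orientation probability may also be conditioned on the source phrases:

$$P(\mathrm{class}(c_i, c_{i+1}) \mid \bar{f}_{c_i}, \bar{f}_{c_{i+1}}, \bar{e}_i, \bar{e}_{i+1})$$

Distortion distance 2), where end_i is the last source position of f̄_{c_i} and start_{i+1} is the first source position of f̄_{c_{i+1}}; it is 0 for a monotone step:

$$d(end_i, start_{i+1}) = \lvert end_i - start_{i+1} + 1 \rvert \tag{6}$$

Fig. 2 Examples of correct phrase alignment (left) and incorrect phrase alignment (right).

Optimal phrase alignment for a given sentence pair (f, e) 7):

$$(\hat{\bar{f}}_1^I, \hat{\bar{e}}_1^I, \hat{c}_1^I) = \arg\max_{\bar{e}_1^I,\, c_1^I,\, \bar{f}_1^I \,:\, \bar{f}_1^I = f,\ \bar{e}_1^I = e} P(\bar{f}_1^I \mid \bar{e}_1^I, c_1^I)\, P(c_1^I \mid e) \tag{7}$$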
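The orientation classes of Eq. (4) and the distortion distance of Eq. (6) are simple enough to check by hand. The sketch below assumes integer phrase indices for c_i and integer source word positions for end_i and start_{i+1}; the function names are made up for the example.

```python
def orientation(c_i, c_next):
    """Orientation class of two consecutive target phrases, as in Eq. (4)."""
    if c_next == c_i + 1:
        return "monotone"
    if c_next == c_i - 1:
        return "swap"
    return "discontinuous"


def distortion(end_i, start_next):
    """Distortion distance of Eq. (6): |end_i - start_{i+1} + 1|."""
    return abs(end_i - start_next + 1)


# A monotone step (the next source phrase starts right after the previous
# one ends) has distance 0; jumping back to the start of the sentence costs more.
examples = [
    orientation(2, 3),  # monotone
    orientation(3, 2),  # swap
    orientation(1, 3),  # discontinuous
    distortion(3, 4),   # 0
    distortion(3, 1),   # 3
]
```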
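Following 7), each candidate phrase pair (f̄_{c_k}, ē_k) is assigned a binary variable x_k ∈ {0, 1}. The matrices F and E record which source words and which target words each candidate covers. Example: f = f_1, f_2, f_3, f_4 and e = e_1, e_2, ... with the candidates of Fig. 3 give the matrices F and E of Eq. (8).

Fig. 3 Candidates of phrase pairs for input parallel sentences.

$$\begin{aligned} \text{maximize} \quad & \sum_{k \in K} x_k t_k \\ \text{subject to} \quad & F x = \mathbf{1}, \\ & E x = \mathbf{1}, \\ & x_k \in \{0, 1\} \quad (\forall k \in K). \end{aligned} \tag{9}$$

Here t_k = λ_tm log P(f̄_{c_k} | ē_k), λ_tm is the weight of the translation model, 1 = (1, ..., 1), and x = (x_1, ..., x_K). The constraints F x = 1 and E x = 1 require every source word and every target word to be covered by exactly one selected phrase pair.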
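To make the 0-1 program of Eq. (9) concrete, the following sketch enumerates subsets of candidate phrase pairs and keeps the best one that covers every source and target word exactly once. It is a brute-force stand-in for an integer programming solver and only works for toy inputs; the candidate list and its scores t_k are hypothetical.

```python
from itertools import combinations


def solve_alignment(candidates, n_src, n_tgt):
    """Maximize sum of t_k subject to Fx = 1 and Ex = 1 by enumeration.

    candidates: list of (source_positions, target_positions, t_k).
    Returns (best_score, indices_of_selected_candidates), or (None, None)
    when no exact cover exists.
    """
    best_score, best_subset = None, None
    for size in range(1, len(candidates) + 1):
        for subset in combinations(range(len(candidates)), size):
            src_cover = [0] * n_src
            tgt_cover = [0] * n_tgt
            for k in subset:
                src, tgt, _ = candidates[k]
                for j in src:
                    src_cover[j] += 1
                for j in tgt:
                    tgt_cover[j] += 1
            # Fx = 1 and Ex = 1: each word is covered exactly once.
            if all(v == 1 for v in src_cover) and all(v == 1 for v in tgt_cover):
                score = sum(candidates[k][2] for k in subset)
                if best_score is None or score > best_score:
                    best_score, best_subset = score, subset
    return best_score, best_subset


# Hypothetical candidates for 4 source words and 3 target words.
candidates = [
    ((0,), (0,), -1.0),
    ((1, 2), (1,), -2.0),
    ((3,), (2,), -0.5),
    ((0, 1), (0, 1), -1.5),
    ((2, 3), (2,), -1.25),
    ((1,), (1,), -3.0),
]
best_score, best_subset = solve_alignment(candidates, n_src=4, n_tgt=3)
```

Enumeration is exponential in the number of candidates, so a practical system would hand the same objective and constraints to a solver instead.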
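Fig. 4 Flowchart of N-best reranking method using phrase aligner.

Fig. 5 Directed graph built from the target side phrases in Fig. 3.

To include the LBO model, the formulation of Eq. (9) is extended with a directed graph that has a start node s, a goal node g, and an arc set A. Each arc a ∈ A has a binary variable y_a and a score r_a:

$$\begin{aligned} \text{maximize} \quad & \sum_{k \in K} x_k t_k + \sum_{a \in A} y_a r_a \\ \text{subject to} \quad & F x = \mathbf{1}, \\ & M y = b, \\ & N y = x, \\ & x_k \in \{0, 1\} \quad (\forall k \in K), \\ & y_a \in \{0, 1\} \quad (\forall a \in A). \end{aligned} \tag{10}$$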
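The constraint M y = b forces the selected arcs to form a path from s to g. The vector b has b_s = −1 for the start node s, b_g = +1 for the goal node g, and b_others = 0 for all other nodes. The constraint N y = x links the arc variables y to the phrase-pair variables x, so that a phrase pair is selected exactly when the path passes through its node.

For the graph of Fig. 5, the row of M y = b for node 4 in Eq. (11) is

$$y_4 + y_5 - y_6 = 0$$

and the corresponding row of N y = x in Eq. (12) is

$$y_4 + y_5 = x_4$$

The experiments use the NTCIR-7 patent translation task 8). Feature weights are tuned on the development data with Minimum Error-Rate Training 9), and translation quality is measured with BLEU 10).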
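The path constraint M y = b can be checked on a small graph. The sketch below assumes the sign convention implied by b_s = −1 and b_g = +1 (an arc contributes −1 at its tail node and +1 at its head node); the four-node graph is a hypothetical example, not the graph of Fig. 5.

```python
def incidence_matrix(nodes, arcs):
    """Node-arc incidence matrix M: -1 at the tail of each arc, +1 at its head."""
    index = {node: i for i, node in enumerate(nodes)}
    M = [[0] * len(arcs) for _ in nodes]
    for a, (tail, head) in enumerate(arcs):
        M[index[tail]][a] = -1
        M[index[head]][a] = 1
    return M


def matvec(M, y):
    """Plain matrix-vector product for small integer matrices."""
    return [sum(m * v for m, v in zip(row, y)) for row in M]


def satisfies_flow(nodes, arcs, y, source="s", goal="g"):
    """True if y satisfies My = b with b_s = -1, b_g = +1, b_others = 0."""
    b = [-1 if n == source else 1 if n == goal else 0 for n in nodes]
    return matvec(incidence_matrix(nodes, arcs), y) == b


# Hypothetical graph: s -> 1, s -> 2, 1 -> 2, 1 -> g, 2 -> g.
nodes = ["s", 1, 2, "g"]
arcs = [("s", 1), ("s", 2), (1, 2), (1, "g"), (2, "g")]

path_y = [1, 0, 1, 0, 1]    # s -> 1 -> 2 -> g, a valid path
broken_y = [1, 0, 0, 0, 1]  # s -> 1 and 2 -> g, not connected
```

Because every interior node must have equal inflow and outflow, a 0-1 vector y that satisfies the constraint on an acyclic graph selects exactly one s-to-g path.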
Table 1 Description of the data set on NTCIR-7 patent translation task.
1,798,571 59,974, ,435 1,798,571 64,184, ,652 dev ,028 3,986 dev ,427 3,653 1,381 45,334 4,116 1,381 48,737 3,882

Table 2 Experimental condition.
- Moses: 08/02/20 release
- 10, 20, 50, 100, 200, 500, 1,000
- ttable-limit: 20
- msd-bidirectional-fe
- Language model: 5-gram, interpolated modified Kneser-Ney
- Solver: ILOG CPLEX 11.0

The baseline decoder is Moses 5). The language model is built with the SRI Language Modeling Toolkit 11).

Fig. 6 Effect of the beam width on translation quality (BLEU).

Fig. 7 Effect of the beam width on the average score improvement.
Fig. 8 Effect of the search time on translation quality (BLEU).

References
1) Brown, P.F., Pietra, V.J.D., Pietra, S.A.D. and Mercer, R.L.: The Mathematics of Statistical Machine Translation: Parameter Estimation, Computational Linguistics, Vol.19, No.2 (1993).
2) Koehn, P., Och, F.J. and Marcu, D.: Statistical Phrase-Based Translation, Proc. Human Language Technology and North American Association for Computational Linguistics Conference (2003).
3) Och, F.J. and Ney, H.: The Alignment Template Approach to Statistical Machine Translation, Computational Linguistics, Vol.30, No.4 (2004).
4) Tillmann, C. and Zhang, T.: A Localized Prediction Model for Statistical Machine Translation, Proc. 43rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (2005).
5) Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A. and Herbst, E.: Moses: Open Source Toolkit for Statistical Machine Translation, Proc. 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, Association for Computational Linguistics (2007).
6) Hasan, S., Zens, R. and Ney, H.: Are Very Large N-Best Lists Useful for SMT?, Proc. Human Language Technologies: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, Association for Computational Linguistics (2007).
7) DeNero, J. and Klein, D.: The Complexity of Phrase Alignment Problems, Proc. 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technology, Short Papers, Association for Computational Linguistics (2008).
8) Fujii, A., Utiyama, M., Yamamoto, M. and Utsuro, T.: Overview of the Patent Translation Task at the NTCIR-7 Workshop, Proc. 7th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-lingual Information Access, NTCIR (2008).
9) Och, F.J.: Minimum Error Rate Training in Statistical Machine Translation, Proc. 41st Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics (2003).
10) Papineni, K., Roukos, S., Ward, T. and Zhu, W.J.: BLEU: A Method for Automatic Evaluation of Machine Translation, Proc. 40th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (2002).
11) Stolcke, A.: SRILM - An Extensible Language Modeling Toolkit, Proc. International Conference on Spoken Language Processing (2002).
12) ILOG: ILOG CPLEX 11.0 User's Manual, ILOG (2007).
THUTR: A Translation Retrieval System
THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for
More information7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan
7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan We explain field experiments conducted during the 2009 fiscal year in five areas of Japan. We also show the experiments of evaluation
More informationThe TCH Machine Translation System for IWSLT 2008
The TCH Machine Translation System for IWSLT 2008 Haifeng Wang, Hua Wu, Xiaoguang Hu, Zhanyi Liu, Jianfeng Li, Dengjun Ren, Zhengyu Niu Toshiba (China) Research and Development Center 5/F., Tower W2, Oriental
More informationAdaptation to Hungarian, Swedish, and Spanish
www.kconnect.eu Adaptation to Hungarian, Swedish, and Spanish Deliverable number D1.4 Dissemination level Public Delivery date 31 January 2016 Status Author(s) Final Jindřich Libovický, Aleš Tamchyna,
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 37 46. Training Phrase-Based Machine Translation Models on the Cloud
The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 37 46 Training Phrase-Based Machine Translation Models on the Cloud Open Source Machine Translation Toolkit Chaski Qin Gao, Stephan
More informationHybrid Machine Translation Guided by a Rule Based System
Hybrid Machine Translation Guided by a Rule Based System Cristina España-Bonet, Gorka Labaka, Arantza Díaz de Ilarraza, Lluís Màrquez Kepa Sarasola Universitat Politècnica de Catalunya University of the
More informationThe XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1
More informationFactored Translation Models
Factored Translation s Philipp Koehn and Hieu Hoang pkoehn@inf.ed.ac.uk, H.Hoang@sms.ed.ac.uk School of Informatics University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW Scotland, United Kingdom
More informationMachine translation techniques for presentation of summaries
Grant Agreement Number: 257528 KHRESMOI www.khresmoi.eu Machine translation techniques for presentation of summaries Deliverable number D4.6 Dissemination level Public Delivery date April 2014 Status Author(s)
More informationAn Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation Robert C. Moore Chris Quirk Microsoft Research Redmond, WA 98052, USA {bobmoore,chrisq}@microsoft.com
More informationBuilding a Web-based parallel corpus and filtering out machinetranslated
Building a Web-based parallel corpus and filtering out machinetranslated text Alexandra Antonova, Alexey Misyurev Yandex 16, Leo Tolstoy St., Moscow, Russia {antonova, misyurev}@yandex-team.ru Abstract
More informationThe KIT Translation system for IWSLT 2010
The KIT Translation system for IWSLT 2010 Jan Niehues 1, Mohammed Mediani 1, Teresa Herrmann 1, Michael Heck 2, Christian Herff 2, Alex Waibel 1 Institute of Anthropomatics KIT - Karlsruhe Institute of
More informationThe Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish
The Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish Oscar Täckström Swedish Institute of Computer Science SE-16429, Kista, Sweden oscar@sics.se
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58. Ncode: an Open Source Bilingual N-gram SMT Toolkit
The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58 Ncode: an Open Source Bilingual N-gram SMT Toolkit Josep M. Crego a, François Yvon ab, José B. Mariño c c a LIMSI-CNRS, BP 133,
More informationCollaborative Machine Translation Service for Scientific texts
Collaborative Machine Translation Service for Scientific texts Patrik Lambert patrik.lambert@lium.univ-lemans.fr Jean Senellart Systran SA senellart@systran.fr Laurent Romary Humboldt Universität Berlin
More informationAn End-to-End Discriminative Approach to Machine Translation
An End-to-End Discriminative Approach to Machine Translation Percy Liang Alexandre Bouchard-Côté Dan Klein Ben Taskar Computer Science Division, EECS Department University of California at Berkeley Berkeley,
More informationUM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation
UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation Liang Tian 1, Derek F. Wong 1, Lidia S. Chao 1, Paulo Quaresma 2,3, Francisco Oliveira 1, Yi Lu 1, Shuo Li 1, Yiming
More informationUEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT
UEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT Eva Hasler ILCC, School of Informatics University of Edinburgh e.hasler@ed.ac.uk Abstract We describe our systems for the SemEval
More informationLIUM s Statistical Machine Translation System for IWSLT 2010
LIUM s Statistical Machine Translation System for IWSLT 2010 Anthony Rousseau, Loïc Barrault, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans,
More informationSegmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System
Segmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System Eunah Cho, Jan Niehues and Alex Waibel International Center for Advanced Communication Technologies
More informationAppraise: an Open-Source Toolkit for Manual Evaluation of MT Output
Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output Christian Federmann Language Technology Lab, German Research Center for Artificial Intelligence, Stuhlsatzenhausweg 3, D-66123 Saarbrücken,
More informationParallel FDA5 for Fast Deployment of Accurate Statistical Machine Translation Systems
Parallel FDA5 for Fast Deployment of Accurate Statistical Machine Translation Systems Ergun Biçici Qun Liu Centre for Next Generation Localisation Centre for Next Generation Localisation School of Computing
More informationTranslating the Penn Treebank with an Interactive-Predictive MT System
IJCLA VOL. 2, NO. 1 2, JAN-DEC 2011, PP. 225 237 RECEIVED 31/10/10 ACCEPTED 26/11/10 FINAL 11/02/11 Translating the Penn Treebank with an Interactive-Predictive MT System MARTHA ALICIA ROCHA 1 AND JOAN
More informationAdapting General Models to Novel Project Ideas
The KIT Translation Systems for IWSLT 2013 Thanh-Le Ha, Teresa Herrmann, Jan Niehues, Mohammed Mediani, Eunah Cho, Yuqi Zhang, Isabel Slawik and Alex Waibel Institute for Anthropomatics KIT - Karlsruhe
More informationApplying Statistical Post-Editing to. English-to-Korean Rule-based Machine Translation System
Applying Statistical Post-Editing to English-to-Korean Rule-based Machine Translation System Ki-Young Lee and Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research
More informationEnriching Morphologically Poor Languages for Statistical Machine Translation
Enriching Morphologically Poor Languages for Statistical Machine Translation Eleftherios Avramidis e.avramidis@sms.ed.ac.uk Philipp Koehn pkoehn@inf.ed.ac.uk School of Informatics University of Edinburgh
More informationHuman in the Loop Machine Translation of Medical Terminology
Human in the Loop Machine Translation of Medical Terminology by John J. Morgan ARL-MR-0743 April 2010 Approved for public release; distribution unlimited. NOTICES Disclaimers The findings in this report
More informationPolish - English Statistical Machine Translation of Medical Texts.
Polish - English Statistical Machine Translation of Medical Texts. Krzysztof Wołk, Krzysztof Marasek Department of Multimedia Polish Japanese Institute of Information Technology kwolk@pjwstk.edu.pl Abstract.
More informationOn-line and Off-line Chinese-Portuguese Translation Service for Mobile Applications
On-line and Off-line Chinese-Portuguese Translation Service for Mobile Applications 12 12 2 3 Jordi Centelles ', Marta R. Costa-jussa ', Rafael E. Banchs, and Alexander Gelbukh Universitat Politecnica
More informationFactored Markov Translation with Robust Modeling
Factored Markov Translation with Robust Modeling Yang Feng Trevor Cohn Xinkai Du Information Sciences Institue Computing and Information Systems Computer Science Department The University of Melbourne
More informationPortuguese-English Statistical Machine Translation using Tree Transducers
Portuguese-English tatistical Machine Translation using Tree Transducers Daniel Beck 1, Helena Caseli 1 1 Computer cience Department Federal University of ão Carlos (UFCar) {daniel beck,helenacaseli}@dc.ufscar.br
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 7 16
The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 21 7 16 A Productivity Test of Statistical Machine Translation Post-Editing in a Typical Localisation Context Mirko Plitt, François Masselot
More informationTowards a General and Extensible Phrase-Extraction Algorithm
Towards a General and Extensible Phrase-Extraction Algorithm Wang Ling, Tiago Luís, João Graça, Luísa Coheur and Isabel Trancoso L 2 F Spoken Systems Lab INESC-ID Lisboa {wang.ling,tiago.luis,joao.graca,luisa.coheur,imt}@l2f.inesc-id.pt
More informationCache-based Online Adaptation for Machine Translation Enhanced Computer Assisted Translation
Cache-based Online Adaptation for Machine Translation Enhanced Computer Assisted Translation Nicola Bertoldi Mauro Cettolo Marcello Federico FBK - Fondazione Bruno Kessler via Sommarive 18 38123 Povo,
More informationThe noisier channel : translation from morphologically complex languages
The noisier channel : translation from morphologically complex languages Christopher J. Dyer Department of Linguistics University of Maryland College Park, MD 20742 redpony@umd.edu Abstract This paper
More informationChoosing the best machine translation system to translate a sentence by using only source-language information*
Choosing the best machine translation system to translate a sentence by using only source-language information* Felipe Sánchez-Martínez Dep. de Llenguatges i Sistemes Informàtics Universitat d Alacant
More informationJane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation
Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation Joern Wuebker M at thias Huck Stephan Peitz M al te Nuhn M arkus F reitag Jan-Thorsten Peter Saab M ansour Hermann N e
More informationThe University of Maryland Statistical Machine Translation System for the Fifth Workshop on Machine Translation
The University of Maryland Statistical Machine Translation System for the Fifth Workshop on Machine Translation Vladimir Eidelman, Chris Dyer, and Philip Resnik UMIACS Laboratory for Computational Linguistics
More informationTRANSREAD LIVRABLE 3.1 QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS. Projet ANR 201 2 CORD 01 5
Projet ANR 201 2 CORD 01 5 TRANSREAD Lecture et interaction bilingues enrichies par les données d'alignement LIVRABLE 3.1 QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS Avril 201 4
More informationAn Online Service for SUbtitling by MAchine Translation
SUMAT CIP-ICT-PSP-270919 An Online Service for SUbtitling by MAchine Translation Annual Public Report 2012 Editor(s): Contributor(s): Reviewer(s): Status-Version: Arantza del Pozo Mirjam Sepesy Maucec,
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 95 APRIL 2011 5 18. A Guide to Jane, an Open Source Hierarchical Translation Toolkit
The Prague Bulletin of Mathematical Linguistics NUMBER 95 APRIL 2011 5 18 A Guide to Jane, an Open Source Hierarchical Translation Toolkit Daniel Stein, David Vilar, Stephan Peitz, Markus Freitag, Matthias
More informationChapter 5. Phrase-based models. Statistical Machine Translation
Chapter 5 Phrase-based models Statistical Machine Translation Motivation Word-Based Models translate words as atomic units Phrase-Based Models translate phrases as atomic units Advantages: many-to-many
More informationExperiments in morphosyntactic processing for translating to and from German
Experiments in morphosyntactic processing for translating to and from German Alexander Fraser Institute for Natural Language Processing University of Stuttgart fraser@ims.uni-stuttgart.de Abstract We describe
More informationTRANSREAD: Designing a Bilingual Reading Experience with Machine Translation Technologies
TRANSREAD: Designing a Bilingual Reading Experience with Machine Translation Technologies François Yvon and Yong Xu and Marianna Apidianaki LIMSI, CNRS, Université Paris-Saclay 91 403 Orsay {yvon,yong,marianna}@limsi.fr
More informationConvergence of Translation Memory and Statistical Machine Translation
Convergence of Translation Memory and Statistical Machine Translation Philipp Koehn and Jean Senellart 4 November 2010 Progress in Translation Automation 1 Translation Memory (TM) translators store past
More informationTurker-Assisted Paraphrasing for English-Arabic Machine Translation
Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University
More informationOn the practice of error analysis for machine translation evaluation
On the practice of error analysis for machine translation evaluation Sara Stymne, Lars Ahrenberg Linköping University Linköping, Sweden {sara.stymne,lars.ahrenberg}@liu.se Abstract Error analysis is a
More informationFactored bilingual n-gram language models for statistical machine translation
Mach Translat DOI 10.1007/s10590-010-9082-5 Factored bilingual n-gram language models for statistical machine translation Josep M. Crego François Yvon Received: 2 November 2009 / Accepted: 12 June 2010
More informationImproving MT System Using Extracted Parallel Fragments of Text from Comparable Corpora
mproving MT System Using Extracted Parallel Fragments of Text from Comparable Corpora Rajdeep Gupta, Santanu Pal, Sivaji Bandyopadhyay Department of Computer Science & Engineering Jadavpur University Kolata
More informationA Hybrid System for Patent Translation
Proceedings of the 16th EAMT Conference, 28-30 May 2012, Trento, Italy A Hybrid System for Patent Translation Ramona Enache Cristina España-Bonet Aarne Ranta Lluís Màrquez Dept. of Computer Science and
More informationAn Empirical Study on Web Mining of Parallel Data
An Empirical Study on Web Mining of Parallel Data Gumwon Hong 1, Chi-Ho Li 2, Ming Zhou 2 and Hae-Chang Rim 1 1 Department of Computer Science & Engineering, Korea University {gwhong,rim}@nlp.korea.ac.kr
More informationBuilding task-oriented machine translation systems
Building task-oriented machine translation systems Germán Sanchis-Trilles Advisor: Francisco Casacuberta Pattern Recognition and Human Language Technologies Group Departamento de Sistemas Informáticos
More informationSystematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation
Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation Rabih Zbib, Gretchen Markiewicz, Spyros Matsoukas, Richard Schwartz, John Makhoul Raytheon BBN Technologies
More informationSYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
More informationMoses on Windows 7. Amittai Axelrod v1.0, 2011.07.15
Moses on Windows 7 Amittai Axelrod v1.0, 2011.07.15 This is a guide to installing and compiling the Moses machine translation framework (stable release dated 2010-08-13) on a Windows 7 machine running
More informationPBML. logo. The Prague Bulletin of Mathematical Linguistics NUMBER??? JANUARY 2009 1 15. Grammar based statistical MT on Hadoop
PBML logo The Prague Bulletin of Mathematical Linguistics NUMBER??? JANUARY 2009 1 15 Grammar based statistical MT on Hadoop An end-to-end toolkit for large scale PSCFG based MT Ashish Venugopal, Andreas
More informationJOINING HANDS: DEVELOPING A SIGN LANGUAGE MACHINE TRANSLATION SYSTEM WITH AND FOR THE DEAF COMMUNITY
Conference & Workshop on Assistive Technologies for People with Vision & Hearing Impairments Assistive Technology for All Ages CVHI 2007, M.A. Hersh (ed.) JOINING HANDS: DEVELOPING A SIGN LANGUAGE MACHINE
More informationThe United Nations Parallel Corpus v1.0
The United Nations Parallel Corpus v1.0 Michał Ziemski, Marcin Junczys-Dowmunt, Bruno Pouliquen United Nations, DGACM, New York, United States of America Adam Mickiewicz University, Poznań, Poland World
More informationConvergence of Translation Memory and Statistical Machine Translation
Convergence of Translation Memory and Statistical Machine Translation Philipp Koehn University of Edinburgh 10 Crichton Street Edinburgh, EH8 9AB Scotland, United Kingdom pkoehn@inf.ed.ac.uk Jean Senellart
More informationAdaptive Development Data Selection for Log-linear Model in Statistical Machine Translation
Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation Mu Li Microsoft Research Asia muli@microsoft.com Dongdong Zhang Microsoft Research Asia dozhang@microsoft.com
More informationGenerating Chinese Classical Poems with Statistical Machine Translation Models
Proceedings of the Twenty-Sixth AAA Conference on Artificial ntelligence Generating Chinese Classical Poems with Statistical Machine Translation Models Jing He* Ming Zhou Long Jiang Tsinghua University
More informationEvaluating a Machine Translation System in a Technical Support Scenario
Evaluating a Machine Translation System in a Technical Support Scenario Rosa Del Gaudio, Aljoscha Burchardt and Arle Lommel Higher Functions Sistemas Inteligentes Lisbon, Portugal rosa.gaudio@pcmedic.pt
More informationA Flexible Online Server for Machine Translation Evaluation
A Flexible Online Server for Machine Translation Evaluation Matthias Eck, Stephan Vogel, and Alex Waibel InterACT Research Carnegie Mellon University Pittsburgh, PA, 15213, USA {matteck, vogel, waibel}@cs.cmu.edu
More informationOersetter: Frisian-Dutch Statistical Machine Translation
Oersetter: Frisian-Dutch Statistical Machine Translation Maarten van Gompel and Antal van den Bosch, Centre for Language Studies, Radboud University Nijmegen and Anne Dykstra, Fryske Akademy > Abstract
More informationStatistical Machine Translation prototype using UN parallel documents
Proceedings of the 16th EAMT Conference, 28-30 May 2012, Trento, Italy Statistical Machine Translation prototype using UN parallel documents Bruno Pouliquen, Christophe Mazenc World Intellectual Property
More informationGenetic Algorithm-based Multi-Word Automatic Language Translation
Recent Advances in Intelligent Information Systems ISBN 978-83-60434-59-8, pages 751 760 Genetic Algorithm-based Multi-Word Automatic Language Translation Ali Zogheib IT-Universitetet i Goteborg - Department
More informationEffective Self-Training for Parsing
Effective Self-Training for Parsing David McClosky dmcc@cs.brown.edu Brown Laboratory for Linguistic Information Processing (BLLIP) Joint work with Eugene Charniak and Mark Johnson David McClosky - dmcc@cs.brown.edu
More informationDomain Adaptation for Medical Text Translation Using Web Resources
Domain Adaptation for Medical Text Translation Using Web Resources Yi Lu, Longyue Wang, Derek F. Wong, Lidia S. Chao, Yiming Wang, Francisco Oliveira Natural Language Processing & Portuguese-Chinese Machine
More informationStatistical Machine Translation Lecture 4. Beyond IBM Model 1 to Phrase-Based Models
p. Statistical Machine Translation Lecture 4 Beyond IBM Model 1 to Phrase-Based Models Stephen Clark based on slides by Philipp Koehn p. Model 2 p Introduces more realistic assumption for the alignment
More informationModelling Pronominal Anaphora in Statistical Machine Translation
Modelling Pronominal Anaphora in Statistical Machine Translation Christian Hardmeier and Marcello Federico Fondazione Bruno Kessler Human Language Technologies Via Sommarive, 18 38123 Trento, Italy {hardmeier,federico}@fbk.eu
More informationA Joint Sequence Translation Model with Integrated Reordering
A Joint Sequence Translation Model with Integrated Reordering Nadir Durrani, Helmut Schmid and Alexander Fraser Institute for Natural Language Processing University of Stuttgart Introduction Generation
More informationDomain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
More informationStatistical Machine Translation: IBM Models 1 and 2
Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation
More information2-3 Automatic Construction Technology for Parallel Corpora
2-3 Automatic Construction Technology for Parallel Corpora We have aligned Japanese and English news articles and sentences, extracted from the Yomiuri and the Daily Yomiuri newspapers, to make a large
More informationMachine Translation. Agenda
Agenda Introduction to Machine Translation Data-driven statistical machine translation Translation models Parallel corpora Document-, sentence-, word-alignment Phrase-based translation MT decoding algorithm
More informationDomain Adaptation of Statistical Machine Translation using Web-Crawled Resources: A Case Study
Proceedings of the 16th EAMT Conference, 28-30 May 2012, Trento, Italy Domain Adaptation of Statistical Machine Translation using Web-Crawled Resources: A Case Study Pavel Pecina 1, Antonio Toral 2, Vassilis
More informationSYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande
More informationStructural and Semantic Indexing for Supporting Creation of Multilingual Web Pages
Structural and Semantic Indexing for Supporting Creation of Multilingual Web Pages Hiroshi URAE, Taro TEZUKA, Fuminori KIMURA, and Akira MAEDA Abstract Translating webpages by machine translation is the
More informationHow To Build A Machine Translation Engine On A Web Service (97 106)
The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 97 106 ScaleMT: a Free/Open-Source Framework for Building Scalable Machine Translation Web Services Víctor M. Sánchez-Cartagena, Juan
More informationPhrase-Based MT. Machine Translation Lecture 7. Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu. Website: mt-class.
Phrase-Based MT Machine Translation Lecture 7 Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu Website: mt-class.org/penn Translational Equivalence Er hat die Prüfung bestanden, jedoch
More informationLeveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training
Leveraging Large Amounts of Loosely Transcribed Corporate Videos for Acoustic Model Training Matthias Paulik and Panchi Panchapagesan Cisco Speech and Language Technology (C-SALT), Cisco Systems, Inc.
The CMU Syntax-Augmented Machine Translation System: SAMT on Hadoop with N-best alignments
The CMU Syntax-Augmented Machine Translation System: SAMT on Hadoop with N-best alignments Andreas Zollmann, Ashish Venugopal, Stephan Vogel interact, Language Technology Institute School of Computer Science
Application of Machine Translation in Localization into Low-Resourced Languages
Application of Machine Translation in Localization into Low-Resourced Languages Raivis Skadiņš 1, Mārcis Pinnis 1, Andrejs Vasiļjevs 1, Inguna Skadiņa 1, Tomas Hudik 2 Tilde 1, Moravia 2 {raivis.skadins;marcis.pinnis;andrejs;inguna.skadina}@tilde.lv,
LetsMT!: A Cloud-Based Platform for Do-It-Yourself Machine Translation
LetsMT!: A Cloud-Based Platform for Do-It-Yourself Machine Translation Andrejs Vasiļjevs Raivis Skadiņš Jörg Tiedemann TILDE TILDE Uppsala University Vienbas gatve 75a, Riga Vienbas gatve 75a, Riga Box
Chapter 6. Decoding. Statistical Machine Translation
Chapter 6 Decoding Statistical Machine Translation Decoding We have a mathematical model for translation p(e | f) Task of decoding: find the translation e_best with highest probability Two types of error
Using Wikipedia to Translate OOV Terms on MLIR
Using Wikipedia to Translate OOV Terms on MLIR Chen-Yu Su, Tien-Chien Lin and Shih-Hung Wu* Department of Computer Science and Information Engineering Chaoyang University of Technology Taichung County 41349, TAIWAN
A New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015) A New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly
Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines
, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew's Collection
Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew's Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek, Andy Way School of Computing,
Dependency Graph-to-String Translation
Dependency Graph-to-String Translation Liangyou Li Andy Way Qun Liu ADAPT Centre, School of Computing Dublin City University {liangyouli,away,qliu}@computing.dcu.ie Abstract Compared to tree grammars,
Towards Using Web-Crawled Data for Domain Adaptation in Statistical Machine Translation
Towards Using Web-Crawled Data for Domain Adaptation in Statistical Machine Translation Pavel Pecina, Antonio Toral, Andy Way School of Computing Dublin City University Dublin 9, Ireland {ppecina,atoral,away}@computing.dcu.ie
Statistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
Sentence Level Dialect Identification for Machine Translation System Selection
Sentence Level Dialect Identification for Machine Translation System Selection Wael Salloum, Heba Elfardy, Linda Alamir-Salloum, Nizar Habash and Mona Diab Center for Computational Learning Systems, Columbia
Visualizing Data Structures in Parsing-based Machine Translation. Jonathan Weese, Chris Callison-Burch
The Prague Bulletin of Mathematical Linguistics, Number 93, January 2010, 127-136. Visualizing Data Structures in Parsing-based Machine Translation Jonathan Weese, Chris Callison-Burch Abstract As machine
Automatic slide assignation for language model adaptation
Automatic slide assignation for language model adaptation Applications of Computational Linguistics Adrià Agustí Martínez Villaronga May 23, 2013 1 Introduction Online multimedia repositories are rapidly
Using Cross-Lingual Explicit Semantic Analysis for Improving Ontology Translation
Using Cross-Lingual Explicit Semantic Analysis for Improving Ontology Translation Kartik Asooja 1 Jorge Gracia 1 Nitish Aggarwal 2 Asunción Gómez Pérez 1 (1) Ontology Engineering Group, UPM, Madrid,
SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY
SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY G. Evangelin Jenifer #1, Mrs. J. Jaya Sherin *2 # PG Scholar, Department of Electronics and Communication Engineering (Communication and Networking), CSI Institute
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1 Faculty of Science and Technology, Queensland University
Phrase-Based Statistical Translation of Programming Languages
Phrase-Based Statistical Translation of Programming Languages Svetoslav Karaivanov ETH Zurich svskaraivanov@gmail.com Veselin Raychev ETH Zurich veselin.raychev@inf.ethz.ch Martin Vechev ETH Zurich martin.vechev@inf.ethz.ch