Convergence of Translation Memory and Statistical Machine Translation
|
|
- Bathsheba Jordan
- 8 years ago
- Views:
Transcription
1 Convergence of Translation Memory and Statistical Machine Translation Philipp Koehn and Jean Senellart 4 November 2010
2 Progress in Translation Automation 1 Translation Memory (TM) translators store past translation in database when translating new text, consult database for similar segments fuzzy match score defines similarity widely used by translation agencies
3 Progress in Translation Automation 1 Translation Memory (TM) translators store past translation in database when translating new text, consult database for similar segments fuzzy match score defines similarity widely used by translation agencies Statistical Machine Translation (SMT) collect large quantities of translated text extract automatically probabilistic translation rules when translating new text, find most probable translation given rules wide use of free web-based services not yet used by many translation agencies
4 TM vs. SMT 2 used by human translator restricted domain (e.g. product manual) very repetitive content used by target language information seeker open domain translation (e.g. news) huge diversity (esp. web) corpus size: corpus size: 1 million words million words commercial developers (e.g., SDL Trados) academic/commercial research (e.g., Google)
5 Our Goal 3 Better TM using SMT methods
6 Main Idea 4 Input The second paragraph of Article 21 is deleted. Fuzzy match in translation memory The second paragraph of Article 5 is deleted. Part of the translation from TM fuzzy match Part of the translation with SMT The second paragraph of Article 21 is deleted.
7 Related Work 5 Work inspired by EBMT [Smith and Clark, 2009] [Zhechev and van Genabith, 2010] uses syntactic information in alignment lower performance than reported here Encode fuzzy match as rule with gaps [Biçici and Dymetman, 2008] similar to our second method impressive improvements, but weak baseline SMT
8 Two Solutions 6 XML frames Very large hierarchical rules
9 Example 7 Input sentence: The second paragraph of Article 21 is deleted.
10 Example 8 Input sentence: The second paragraph of Article 21 is deleted. Fuzzy match in translation memory: The second paragraph of Article 5 is deleted. = À l article 5, le texte du deuxiéme alinéa est supprimé.
11 Example 9 Input sentence: The second paragraph of Article 21 is deleted. Fuzzy match in translation memory: The second paragraph of Article 5 is deleted. = À l article 5, le texte du deuxiéme alinéa est supprimé. Detect mismatch (string edit distance)
12 Example 10 Input sentence: The second paragraph of Article 21 is deleted. Fuzzy match in translation memory: The second paragraph of Article 5 is deleted. = À l article 5, le texte du deuxiéme alinéa est supprimé. Detect mismatch (string edit distance) Align mismatch (using word alignment from giza++)
13 Example 11 Input sentence: The second paragraph of Article 21 is deleted. Fuzzy match in translation memory: The second paragraph of Article 5 is deleted. = À l article 5, le texte du deuxiéme alinéa est supprimé. Output word(s) taken from the target TM
14 Example 12 Input sentence: The second paragraph of Article 21 is deleted. Fuzzy match in translation memory: The second paragraph of Article 5 is deleted. = À l article 5, le texte du deuxiéme alinéa est supprimé. Output word(s) taken from the target TM Input word(s) that still need to be translated by SMT
15 Example 13 Input sentence: The second paragraph of Article 21 is deleted. Fuzzy match in translation memory: The second paragraph of Article 5 is deleted. = À l article 5, le texte du deuxiéme alinéa est supprimé. XML frame (input to Moses) <xml translation=" À l article "/> 21 <xml translation=", le texte du deuxiéme alinéa est supprimé. "/>
16 Example 14 Input sentence: The second paragraph of Article 21 is deleted. Fuzzy match in translation memory: The second paragraph of Article 5 is deleted. = À l article 5, le texte du deuxiéme alinéa est supprimé. More compact formalism for the purposes of this presentation: < À l article > 21 <, le texte du deuxiéme alinéa est supprimé. >
17 Steps 15 Fuzzy matching
18 Steps 15 Fuzzy matching based on string edit distance on words FMS = 1 edit-distance(source, tm-source) max( source, tm-source ) string edit distance on letters as tie breaker details see [Koehn and Senellart, AMTA 2010]
19 Steps 15 Fuzzy matching based on string edit distance on words FMS = 1 edit-distance(source, tm-source) max( source, tm-source ) string edit distance on letters as tie breaker details see [Koehn and Senellart, AMTA 2010] straight-forward
20 Steps 15 Fuzzy matching based on string edit distance on words FMS = 1 edit-distance(source, tm-source) max( source, tm-source ) string edit distance on letters as tie breaker details see [Koehn and Senellart, AMTA 2010] straight-forward Word alignment of TM source and target
21 Steps 15 Fuzzy matching based on string edit distance on words FMS = 1 edit-distance(source, tm-source) max( source, tm-source ) string edit distance on letters as tie breaker details see [Koehn and Senellart, AMTA 2010] straight-forward Word alignment of TM source and target standard method
22 Steps 15 Fuzzy matching based on string edit distance on words FMS = 1 edit-distance(source, tm-source) max( source, tm-source ) string edit distance on letters as tie breaker details see [Koehn and Senellart, AMTA 2010] straight-forward Word alignment of TM source and target standard method Construction of XML frame = linking mismatch( input, TM source ) to TM target
23 Steps 15 Fuzzy matching based on string edit distance on words FMS = 1 edit-distance(source, tm-source) max( source, tm-source ) string edit distance on letters as tie breaker details see [Koehn and Senellart, AMTA 2010] straight-forward Word alignment of TM source and target standard method Construction of XML frame = linking mismatch( input, TM source ) to TM target can be tricky
24 Construction of XML Frame 16 Basic principles start with fully specified XML frame all mismatched source words have to be translated by SMT all TM target words aligned to mismatched TM source words are removed if the alignment to the TM target words fails go to the previous TM source word and follow its alignment See paper for algorithm
25 Example 17 Source String Edit TM Source The second paragraph of Article 21 is deleted. The second paragraph of Article 5 is deleted. Word Alignment TM Target XML Frame À l' article 5, le texte du deuxième alinéa est supprimé. <À l' article> 21 <, le texte du deuxième alinéa est supprimé.>
26 Special Case: Insertion 18 Source String Edit TM Source the big fish the fish Word Alignment TM Target XML Frame les poissons <les> big <poissons>
27 Special Case: Deletion 19 the fish the big fish les gros poissons <les> <poissons>
28 Special Case: Unaligned Mismatch 20 the green fish the big fish les poissons <les> green <poissons>
29 Special Case: Disconnected Alignments 21 joe will eat joe does not eat joe ne mange pas <joe> will <mange>
30 Experiments 22 Baseline 1: Unmodified TM matches Baseline 2: SMT system trained on TM data Our XML frame method
31 Corpora: Size 23 Acquis Corpus Test segments 1,165,867 4,107 English words 24,069, ,261 French words 25,533, ,224 Product Corpus Test segments 83,461 2,000 English words 1,038,762 24,643 French words 1,110,284 26,248
32 Corpora: Fuzzy Matches Acquis Sentences Words W/S 100% , % , % 154 5, % 250 6, Product Sentences Words W/S 95-99% 230 3, % 225 2, % 177 2, % 185 1, % 152 1, %
33 Results: Acquis 25
34 Results: Product 26
35 Recap 27 TM provides fuzzy matches SMT translates mismatched words
36 Recap 27 TM provides fuzzy matches SMT translates mismatched words TM match encoded in XML frame
37 Recap 27 TM provides fuzzy matches SMT translates mismatched words TM match encoded in XML frame... but is that not just a very large translation rule?
38 Background: Hierarchical Phrase Rules 28 Given: sentence pair with monotone 1-to-1 alignment the big fish = les gros poissons
39 Background: Hierarchical Phrase Rules 28 Given: sentence pair with monotone 1-to-1 alignment the big fish = les gros poissons Phrase translation rules ( the ; les ) ( the big ; les gros ) ( the big fish ; les gros poissons ) ( big ; gros ) ( big fish ; gros poissons ) ( fish ; poissons )
40 Background: Hierarchical Phrase Rules 28 Given: sentence pair with monotone 1-to-1 alignment the big fish = les gros poissons Phrase translation rules ( the ; les ) ( the big ; les gros ) ( the big fish ; les gros poissons ) ( big ; gros ) ( big fish ; gros poissons ) ( fish ; poissons ) Hierarchical phrase-based rule are constructed by subtraction
41 Background: Hierarchical Phrase Rules 28 Given: sentence pair with monotone 1-to-1 alignment the big fish = les gros poissons Phrase translation rules ( the ; les ) ( the big ; les gros ) ( the big fish ; les gros poissons ) ( big ; gros ) ( big fish ; gros poissons ) ( fish ; poissons ) Hierarchical phrase-based rule are constructed by subtraction large rule: ( the big fish ; les gros poissons ) small rule: ( big ; gros ) (contained in large rule)
42 Background: Hierarchical Phrase Rules 28 Given: sentence pair with monotone 1-to-1 alignment the big fish = les gros poissons Phrase translation rules ( the ; les ) ( the big ; les gros ) ( the big fish ; les gros poissons ) ( big ; gros ) ( big fish ; gros poissons ) ( fish ; poissons ) Hierarchical phrase-based rule are constructed by subtraction large rule: ( the big fish ; les gros poissons ) small rule: ( big ; gros ) (contained in large rule) hierarchical rule: ( the x fish ; les x poissons )
43 XML Frame as Very Large Rule 29 XML frame <À for input l article> 21 <, le texte du deuxiéme alinéa est supprimé.> The second paragraph of Article 21 is deleted. Very large rule ( The second paragraph of Article x is deleted. ; À l article x, le texte du deuxiéme alinéa est supprimé. )
44 Very Large Rules in SMT 30 Rule size limited in SMT maximum number of words, e.g. 5 maximum number of non-terminals (x), e.g but only due to storage limitations for large rule rule tables Rules may be generated on the fly [Lopez, 2007]
45 Advantage over XML Method 31 Choices 1. multiple fuzzy matches in TM with same score 2. same TM source with multiple translations 3. multiple SMT translations
46 Advantage over XML Method 31 Choices 1. multiple fuzzy matches in TM with same score 2. same TM source with multiple translations 3. multiple SMT translations Decisions in XML frame method 1. randomly chosen 2. most frequent 3. highest model score (tried others, see paper)
47 Advantage over XML Method 31 Choices 1. multiple fuzzy matches in TM with same score 2. same TM source with multiple translations 3. multiple SMT translations Decisions in XML frame method 1. randomly chosen 2. most frequent 3. highest model score (tried others, see paper) Decisions for very large rules 1. all 2. all 3. integrated scoring of VLR rules and basic translation rules (tunable)
48 Result: Acquis 32
49 Future Work: User Studies 33 Significant increases in bleu To do: validation in user studies Additional benefit: possible to highlight mismatch in translation input The second paragraph of Article 21 is deleted. suggested translation À l article 21, le texte du deuxiéme alinéa est supprimé.
50 Thank You 34 questions?
Convergence of Translation Memory and Statistical Machine Translation
Convergence of Translation Memory and Statistical Machine Translation Philipp Koehn University of Edinburgh 10 Crichton Street Edinburgh, EH8 9AB Scotland, United Kingdom pkoehn@inf.ed.ac.uk Jean Senellart
More informationTranslation and Localization Services
Translation and Localization Services Company Overview InterSol, Inc., a California corporation founded in 1996, provides clients with international language solutions. InterSol delivers multilingual solutions
More informationStatistical Machine Translation
Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language
More informationStatistical Machine Translation Lecture 4. Beyond IBM Model 1 to Phrase-Based Models
p. Statistical Machine Translation Lecture 4 Beyond IBM Model 1 to Phrase-Based Models Stephen Clark based on slides by Philipp Koehn p. Model 2 p Introduces more realistic assumption for the alignment
More informationChapter 5. Phrase-based models. Statistical Machine Translation
Chapter 5 Phrase-based models Statistical Machine Translation Motivation Word-Based Models translate words as atomic units Phrase-Based Models translate phrases as atomic units Advantages: many-to-many
More informationACCURAT Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation www.accurat-project.eu Project no.
ACCURAT Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation www.accurat-project.eu Project no. 248347 Deliverable D5.4 Report on requirements, implementation
More informationA Joint Sequence Translation Model with Integrated Reordering
A Joint Sequence Translation Model with Integrated Reordering Nadir Durrani, Helmut Schmid and Alexander Fraser Institute for Natural Language Processing University of Stuttgart Introduction Generation
More informationComputer Aided Translation
Computer Aided Translation Philipp Koehn 30 April 2015 Why Machine Translation? 1 Assimilation reader initiates translation, wants to know content user is tolerant of inferior quality focus of majority
More informationSYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
More informationSYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems Jin Yang, Satoshi Enoue Jean Senellart, Tristan Croiset SYSTRAN Software, Inc. SYSTRAN SA 9333 Genesee Ave. Suite PL1 La Grande
More informationQuestion template for interviews
Question template for interviews This interview template creates a framework for the interviews. The template should not be considered too restrictive. If an interview reveals information not covered by
More informationTS3: an Improved Version of the Bilingual Concordancer TransSearch
TS3: an Improved Version of the Bilingual Concordancer TransSearch Stéphane HUET, Julien BOURDAILLET and Philippe LANGLAIS EAMT 2009 - Barcelona June 14, 2009 Computer assisted translation Preferred by
More informationHIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN
HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN Yu Chen, Andreas Eisele DFKI GmbH, Saarbrücken, Germany May 28, 2010 OUTLINE INTRODUCTION ARCHITECTURE EXPERIMENTS CONCLUSION SMT VS. RBMT [K.
More informationCollecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
More informationA New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015) A New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly
More informationMachine Translation. Agenda
Agenda Introduction to Machine Translation Data-driven statistical machine translation Translation models Parallel corpora Document-, sentence-, word-alignment Phrase-based translation MT decoding algorithm
More informationHybrid Machine Translation Guided by a Rule Based System
Hybrid Machine Translation Guided by a Rule Based System Cristina España-Bonet, Gorka Labaka, Arantza Díaz de Ilarraza, Lluís Màrquez Kepa Sarasola Universitat Politècnica de Catalunya University of the
More informationThe XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006
The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006 Yidong Chen, Xiaodong Shi Institute of Artificial Intelligence Xiamen University P. R. China November 28, 2006 - Kyoto 13:46 1
More informationLIUM s Statistical Machine Translation System for IWSLT 2010
LIUM s Statistical Machine Translation System for IWSLT 2010 Anthony Rousseau, Loïc Barrault, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans,
More informationThe TCH Machine Translation System for IWSLT 2008
The TCH Machine Translation System for IWSLT 2008 Haifeng Wang, Hua Wu, Xiaoguang Hu, Zhanyi Liu, Jianfeng Li, Dengjun Ren, Zhengyu Niu Toshiba (China) Research and Development Center 5/F., Tower W2, Oriental
More informationMachine Translation at the European Commission
Directorate-General for Translation Machine Translation at the European Commission Konferenz 10 Jahre Verbmobil Saarbrücken, 16. November 2010 Andreas Eisele Project Manager Machine Translation, ICT Unit
More informationGlossary of translation tool types
Glossary of translation tool types Tool type Description French equivalent Active terminology recognition tools Bilingual concordancers Active terminology recognition (ATR) tools automatically analyze
More informationWhy Evaluation? Machine Translation. Evaluation. Evaluation Metrics. Ten Translations of a Chinese Sentence. How good is a given system?
Why Evaluation? How good is a given system? Machine Translation Evaluation Which one is the best system for our purpose? How much did we improve our system? How can we tune our system to become better?
More informationSystematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation
Systematic Comparison of Professional and Crowdsourced Reference Translations for Machine Translation Rabih Zbib, Gretchen Markiewicz, Spyros Matsoukas, Richard Schwartz, John Makhoul Raytheon BBN Technologies
More informationMachine Translation. Why Evaluation? Evaluation. Ten Translations of a Chinese Sentence. Evaluation Metrics. But MT evaluation is a di cult problem!
Why Evaluation? How good is a given system? Which one is the best system for our purpose? How much did we improve our system? How can we tune our system to become better? But MT evaluation is a di cult
More informationInvestigating Repetition and Reusability of Translations in Subtitle Corpora for Use with Example-Based Machine Translation 1
Investigating Repetition and Reusability of Translations in Subtitle Corpora for Use with Example-Based Machine Translation 1 Marian Flanagan 2 and Dorothy Kenny 2 1. Introduction Repetition and reusability
More informationThe KIT Translation system for IWSLT 2010
The KIT Translation system for IWSLT 2010 Jan Niehues 1, Mohammed Mediani 1, Teresa Herrmann 1, Michael Heck 2, Christian Herff 2, Alex Waibel 1 Institute of Anthropomatics KIT - Karlsruhe Institute of
More informationStatistical Machine Translation: IBM Models 1 and 2
Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation
More informationTHUTR: A Translation Retrieval System
THUTR: A Translation Retrieval System Chunyang Liu, Qi Liu, Yang Liu, and Maosong Sun Department of Computer Science and Technology State Key Lab on Intelligent Technology and Systems National Lab for
More informationAdaptive Development Data Selection for Log-linear Model in Statistical Machine Translation
Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation Mu Li Microsoft Research Asia muli@microsoft.com Dongdong Zhang Microsoft Research Asia dozhang@microsoft.com
More informationTRANSREAD LIVRABLE 3.1 QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS. Projet ANR 201 2 CORD 01 5
Projet ANR 201 2 CORD 01 5 TRANSREAD Lecture et interaction bilingues enrichies par les données d'alignement LIVRABLE 3.1 QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS Avril 201 4
More informationTranslation Solution for
Translation Solution for Case Study Contents PROMT Translation Solution for PayPal Case Study 1 Contents 1 Summary 1 Background for Using MT at PayPal 1 PayPal s Initial Requirements for MT Vendor 2 Business
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58. Ncode: an Open Source Bilingual N-gram SMT Toolkit
The Prague Bulletin of Mathematical Linguistics NUMBER 96 OCTOBER 2011 49 58 Ncode: an Open Source Bilingual N-gram SMT Toolkit Josep M. Crego a, François Yvon ab, José B. Mariño c c a LIMSI-CNRS, BP 133,
More informationAutomatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast
Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990
More informationThe University of Maryland Statistical Machine Translation System for the Fifth Workshop on Machine Translation
The University of Maryland Statistical Machine Translation System for the Fifth Workshop on Machine Translation Vladimir Eidelman, Chris Dyer, and Philip Resnik UMIACS Laboratory for Computational Linguistics
More informationJOB BANK TRANSLATION AUTOMATED TRANSLATION SYSTEM. Table of Contents
JOB BANK TRANSLATION AUTOMATED TRANSLATION SYSTEM Job Bank for Employers Creating a Job Offer Table of Contents Building the Automated Translation System Integration Steps Automated Translation System
More informationStatistical Pattern-Based Machine Translation with Statistical French-English Machine Translation
Statistical Pattern-Based Machine Translation with Statistical French-English Machine Translation Jin'ichi Murakami, Takuya Nishimura, Masato Tokuhisa Tottori University, Japan Problems of Phrase-Based
More informationAn Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation Robert C. Moore Chris Quirk Microsoft Research Redmond, WA 98052, USA {bobmoore,chrisq}@microsoft.com
More informationCollaborative Machine Translation Service for Scientific texts
Collaborative Machine Translation Service for Scientific texts Patrik Lambert patrik.lambert@lium.univ-lemans.fr Jean Senellart Systran SA senellart@systran.fr Laurent Romary Humboldt Universität Berlin
More informationRecent developments in machine translation policy at the European Patent Office
Recent developments in machine translation policy at the European Patent Office Dr Georg Artelsmair Director European Co-operation European Patent Office Brussels, 17 November 2010 The European Patent
More informationSpotting false translation segments in translation memories
Spotting false translation segments in translation memories Abstract Eduard Barbu Translated.net eduard@translated.net The problem of spotting false translations in the bi-segments of translation memories
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 102 OCTOBER 2014 17 26. A Fast and Simple Online Synchronous Context Free Grammar Extractor
The Prague Bulletin of Mathematical Linguistics NUMBER 102 OCTOBER 2014 17 26 A Fast and Simple Online Synchronous Context Free Grammar Extractor Paul Baltescu, Phil Blunsom University of Oxford, Department
More informationLeveraging ASEAN Economic Community through Language Translation Services
Leveraging ASEAN Economic Community through Language Translation Services Hammam Riza Center for Information and Communication Technology Agency for the Assessment and Application of Technology (BPPT)
More informationT U R K A L A T O R 1
T U R K A L A T O R 1 A Suite of Tools for Augmenting English-to-Turkish Statistical Machine Translation by Gorkem Ozbek [gorkem@stanford.edu] Siddharth Jonathan [jonsid@stanford.edu] CS224N: Natural Language
More informationMachine Translation and the Translator
Machine Translation and the Translator Philipp Koehn 8 April 2015 About me 1 Professor at Johns Hopkins University (US), University of Edinburgh (Scotland) Author of textbook on statistical machine translation
More informationPhrase-Based MT. Machine Translation Lecture 7. Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu. Website: mt-class.
Phrase-Based MT Machine Translation Lecture 7 Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu Website: mt-class.org/penn Translational Equivalence Er hat die Prüfung bestanden, jedoch
More informationApplication of Machine Translation in Localization into Low-Resourced Languages
Application of Machine Translation in Localization into Low-Resourced Languages Raivis Skadiņš 1, Mārcis Pinnis 1, Andrejs Vasiļjevs 1, Inguna Skadiņa 1, Tomas Hudik 2 Tilde 1, Moravia 2 {raivis.skadins;marcis.pinnis;andrejs;inguna.skadina}@tilde.lv,
More informationStatistical NLP Spring 2008. Machine Translation: Examples
Statistical NLP Spring 2008 Lecture 11: Word Alignment Dan Klein UC Berkeley Machine Translation: Examples 1 Machine Translation Madame la présidente, votre présidence de cette institution a été marquante.
More informationBuilding a Web-based parallel corpus and filtering out machinetranslated
Building a Web-based parallel corpus and filtering out machinetranslated text Alexandra Antonova, Alexey Misyurev Yandex 16, Leo Tolstoy St., Moscow, Russia {antonova, misyurev}@yandex-team.ru Abstract
More informationMulti-Engine Machine Translation by Recursive Sentence Decomposition
Multi-Engine Machine Translation by Recursive Sentence Decomposition Bart Mellebeek Karolina Owczarzak Josef van Genabith Andy Way National Centre for Language Technology School of Computing Dublin City
More informationIntegra(on of human and machine transla(on. Marcello Federico Fondazione Bruno Kessler MT Marathon, Prague, Sept 2013
Integra(on of human and machine transla(on Marcello Federico Fondazione Bruno Kessler MT Marathon, Prague, Sept 2013 Motivation Human translation (HT) worldwide demand for translation services has accelerated,
More informationSDL International Localization Services Overview for GREE Game Developers
SDL International Localization Services Overview for GREE Game Developers Executive Summary SDL International (www.sdl.com) delivers world-class global gaming content, ensuring the nuance and cultural
More informationAn Empirical Study on Web Mining of Parallel Data
An Empirical Study on Web Mining of Parallel Data Gumwon Hong 1, Chi-Ho Li 2, Ming Zhou 2 and Hae-Chang Rim 1 1 Department of Computer Science & Engineering, Korea University {gwhong,rim}@nlp.korea.ac.kr
More informationMachine Translation Computer Aided Translation Machine Language Processing
Machine Translation Computer Aided Translation Machine Language Processing Martin Kappus (kapm@zhaw.ch) 1 Machine Translation Computer-Aided Translation Agenda Machine Translation Introduction History
More informationImproving MT System Using Extracted Parallel Fragments of Text from Comparable Corpora
mproving MT System Using Extracted Parallel Fragments of Text from Comparable Corpora Rajdeep Gupta, Santanu Pal, Sivaji Bandyopadhyay Department of Computer Science & Engineering Jadavpur University Kolata
More informationDetecting semantic overlap: Announcing a Parallel Monolingual Treebank for Dutch
Detecting semantic overlap: Announcing a Parallel Monolingual Treebank for Dutch Erwin Marsi & Emiel Krahmer CLIN 2007, Nijmegen What is semantic overlap? Zangeres Christina Aguilera heeft eindelijk verteld
More informationA Joint Sequence Translation Model with Integrated Reordering
A Joint Sequence Translation Model with Integrated Reordering Nadir Durrani Advisors: Alexander Fraser and Helmut Schmid Institute for Natural Language Processing University of Stuttgart Machine Translation
More informationPSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
More informationAdaptation to Hungarian, Swedish, and Spanish
www.kconnect.eu Adaptation to Hungarian, Swedish, and Spanish Deliverable number D1.4 Dissemination level Public Delivery date 31 January 2016 Status Author(s) Final Jindřich Libovický, Aleš Tamchyna,
More informationUEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT
UEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT Eva Hasler ILCC, School of Informatics University of Edinburgh e.hasler@ed.ac.uk Abstract We describe our systems for the SemEval
More informationDevelopment of Translation Memory Database System for Law Translation
DEVELOPMENT OF TRANSLATION MEMORY 1 Development of Translation Memory Database System for Law Translation Yasuhiro Sekine ad, Yasuhiro Ogawa bd, Katsuhiko Toyama bd, Yoshiharu Matsuura cd a CRESTEC Inc.,
More informationIntroduction. Philipp Koehn. 28 January 2016
Introduction Philipp Koehn 28 January 2016 Administrativa 1 Class web site: http://www.mt-class.org/jhu/ Tuesdays and Thursdays, 1:30-2:45, Hodson 313 Instructor: Philipp Koehn (with help from Matt Post)
More informationMoses from the point of view of an LSP: The Trusted Translations Experience
Moses from the point of view of an LSP: The Trusted Translations Experience Sunday 25 March 13:30-17:30 Gustavo Lucardi COO Trusted Translations, Inc. @glucardi An On-going Story Not a Success Story (Yet)
More informationtance alignment and time information to create confusion networks 1 from the output of different ASR systems for the same
1222 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 7, SEPTEMBER 2008 System Combination for Machine Translation of Spoken and Written Language Evgeny Matusov, Student Member,
More informationAnubis - speeding up Computer-Aided Translation
Anubis - speeding up Computer-Aided Translation Rafał Jaworski Adam Mickiewicz University Poznań, Poland rjawor@amu.edu.pl Abstract. In this paper, the idea of Computer-Aided Translation is first introduced
More informationProject Management. From industrial perspective. A. Helle M. Herranz. EXPERT Summer School, 2014. Pangeanic - BI-Europe
Project Management From industrial perspective A. Helle M. Herranz Pangeanic - BI-Europe EXPERT Summer School, 2014 Outline 1 Introduction 2 3 Translation project management without MT Translation project
More informationDomain-specific terminology extraction for Machine Translation. Mihael Arcan
Domain-specific terminology extraction for Machine Translation Mihael Arcan Outline Phd topic Introduction Resources Tools Multi Word Extraction (MWE) extraction Projection of MWE Evaluation Future Work
More informationCompleting an Accounts Payable Audit With ACL (Aired on Feb 15)
AuditSoftwareVideos.com Video Training Titles (ACL Software Sessions Only) Contents Completing an Accounts Payable Audit With ACL (Aired on Feb 15)... 1 Statistical Analysis in ACL The Analyze Menu (Aired
More informationAdapting General Models to Novel Project Ideas
The KIT Translation Systems for IWSLT 2013 Thanh-Le Ha, Teresa Herrmann, Jan Niehues, Mohammed Mediani, Eunah Cho, Yuqi Zhang, Isabel Slawik and Alex Waibel Institute for Anthropomatics KIT - Karlsruhe
More informationKEZBER CONTENT MANAGEMENT SYSTEM MANUAL
KEZBER CONTENT MANAGEMENT SYSTEM MANUAL Page 1 Kezber Table Content Table Content 1. Introduction/Login... 3 2. Editing General Content... 4 to 8 2.1 Navigation General Content Pages... Error! Bookmark
More informationTraining and evaluation of POS taggers on the French MULTITAG corpus
Training and evaluation of POS taggers on the French MULTITAG corpus A. Allauzen, H. Bonneau-Maynard LIMSI/CNRS; Univ Paris-Sud, Orsay, F-91405 {allauzen,maynard}@limsi.fr Abstract The explicit introduction
More informationTowards a General and Extensible Phrase-Extraction Algorithm
Towards a General and Extensible Phrase-Extraction Algorithm Wang Ling, Tiago Luís, João Graça, Luísa Coheur and Isabel Trancoso L 2 F Spoken Systems Lab INESC-ID Lisboa {wang.ling,tiago.luis,joao.graca,luisa.coheur,imt}@l2f.inesc-id.pt
More informationXTM for Language Service Providers Explained
XTM for Language Service Providers Explained 1. Introduction There is a new generation of Computer Assisted Translation (CAT) tools available based on the latest Web 2.0 technology. These systems are more
More informationThe Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 37 46. Training Phrase-Based Machine Translation Models on the Cloud
The Prague Bulletin of Mathematical Linguistics NUMBER 93 JANUARY 2010 37 46 Training Phrase-Based Machine Translation Models on the Cloud Open Source Machine Translation Toolkit Chaski Qin Gao, Stephan
More informationProvalis Research Text Analytics and the Victory Index
point Provalis Research Text Analytics and the Victory Index Fern Halper, Ph.D. Fellow Daniel Kirsch Senior Analyst Provalis Research Text Analytics and the Victory Index Unstructured data is everywhere
More informationAn End-to-End Discriminative Approach to Machine Translation
An End-to-End Discriminative Approach to Machine Translation Percy Liang Alexandre Bouchard-Côté Dan Klein Ben Taskar Computer Science Division, EECS Department University of California at Berkeley Berkeley,
More informationLanguage and Computation
Language and Computation week 13, Thursday, April 24 Tamás Biró Yale University tamas.biro@yale.edu http://www.birot.hu/courses/2014-lc/ Tamás Biró, Yale U., Language and Computation p. 1 Practical matters
More informationUsing Author-it Localization Manager
Using Author-it Localization Manager Prepared 2 September, 2011 Copyright 1996-2011 Author-it Software Corporation Ltd. All rights reserved Due to continued product development, this information may change
More informationThe SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge
The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge White Paper October 2002 I. Translation and Localization New Challenges Businesses are beginning to encounter
More information2-3 Automatic Construction Technology for Parallel Corpora
2-3 Automatic Construction Technology for Parallel Corpora We have aligned Japanese and English news articles and sentences, extracted from the Yomiuri and the Daily Yomiuri newspapers, to make a large
More information7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan
7-2 Speech-to-Speech Translation System Field Experiments in All Over Japan We explain field experiments conducted during the 2009 fiscal year in five areas of Japan. We also show the experiments of evaluation
More informationContent Management & Translation Management
Content Management & Translation Management Michael Hoch Business Consulting SDL TRADOS Technologies @ 1. European RedDot User Conference London/Stansted AGENDA SDL TRADOS Technologies Some Terminology:
More informationStatistical Machine Translation prototype using UN parallel documents
Proceedings of the 16th EAMT Conference, 28-30 May 2012, Trento, Italy Statistical Machine Translation prototype using UN parallel documents Bruno Pouliquen, Christophe Mazenc World Intellectual Property
More informationKantanMT.com. www.kantanmt.com. The world s #1 MT Platform. No Hardware. No Software. No Hassle MT.
KantanMT.com No Hardware. No Software. No Hassle MT. The world s #1 MT Platform Communicate globally, easily! Create customized language solutions in the cloud. www.kantanmt.com What is KantanMT.com? KantanMT
More informationPhrase-Based Statistical Translation of Programming Languages
Phrase-Based Statistical Translation of Programming Languages Svetoslav Karaivanov ETH Zurich svskaraivanov@gmail.com Veselin Raychev ETH Zurich veselin.raychev@inf.ethz.ch Martin Vechev ETH Zurich martin.vechev@inf.ethz.ch
More informationPairwise Sequence Alignment
Pairwise Sequence Alignment carolin.kosiol@vetmeduni.ac.at SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What
More informationEvaluation of speech technologies
CLARA Training course on evaluation of Human Language Technologies Evaluations and Language resources Distribution Agency November 27, 2012 Evaluation of speaker identification Speech technologies Outline
More informationFactored Translation Models
Factored Translation s Philipp Koehn and Hieu Hoang pkoehn@inf.ed.ac.uk, H.Hoang@sms.ed.ac.uk School of Informatics University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW Scotland, United Kingdom
More informationOverview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
More informationQuantifying the Influence of MT Output in the Translators Performance: A Case Study in Technical Translation
Quantifying the Influence of MT Output in the Translators Performance: A Case Study in Technical Translation Marcos Zampieri Saarland University Saarbrücken, Germany mzampier@uni-koeln.de Mihaela Vela
More informationCorpus Design for a Unit Selection Database
Corpus Design for a Unit Selection Database Norbert Braunschweiler Institute for Natural Language Processing (IMS) Stuttgart 8 th 9 th October 2002 BITS Workshop, München Norbert Braunschweiler Corpus
More informationThe Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems
The Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems Dr. Ananthi Sheshasaayee 1, Angela Deepa. V.R 2 1 Research Supervisior, Department of Computer Science & Application,
More informationWord Completion and Prediction in Hebrew
Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology
More informationFactored bilingual n-gram language models for statistical machine translation
Mach Translat DOI 10.1007/s10590-010-9082-5 Factored bilingual n-gram language models for statistical machine translation Josep M. Crego François Yvon Received: 2 November 2009 / Accepted: 12 June 2010
More informationAutomatic Network Protocol Analysis
Gilbert Wondracek, Paolo M ilani C omparetti, C hristopher Kruegel and E ngin Kirda {gilbert,pmilani}@ seclab.tuwien.ac.at chris@ cs.ucsb.edu engin.kirda@ eurecom.fr Reverse Engineering Network Protocols
More informationSCHOOL OF ENGINEERING AND INFORMATION TECHNOLOGIES GRADUATE PROGRAMS
INSTITUTO TECNOLÓGICO Y DE ESTUDIOS SUPERIORES DE MONTERREY CAMPUS MONTERREY SCHOOL OF ENGINEERING AND INFORMATION TECHNOLOGIES GRADUATE PROGRAMS DOCTOR OF PHILOSOPHY in INFORMATION TECHNOLOGIES AND COMMUNICATIONS
More informationwww.sdl.com SDL Trados Studio 2015 Project Management Quick Start Guide
www.sdl.com SDL Trados Studio 2015 Project Management Quick Start Guide SDL Trados Studio 2015 Project Management Quick Start Guide Copyright Information Copyright 2011-2015 SDL Group. Nothing contained
More informationSearch Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc
Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,
More informationTranslation Management Systems Explained
ebook Translation Management Systems Explained Reduce the cost and administration of your language translation process. 2 Table of Contents 03 Introduction 04 Language Translation Opens New Markets 10
More informationAutomated Online English -Arabic Translator
Faculty of Engineering & Architecture Electrical & Computer Engineering Department Final Year Project - Spring 2006 Automated Online English -Arabic Translator Supervisor: Prof. Ali El Hajj Report Grader:
More information