A Joint Sequence Translation Model with Integrated Reordering




A Joint Sequence Translation Model with Integrated Reordering. Nadir Durrani, Helmut Schmid and Alexander Fraser. Institute for Natural Language Processing, University of Stuttgart.

Introduction. A bilingual sentence pair is generated through a sequence of operations, where each operation either translates or reorders. P(E, F, A) is the probability of the operation sequence required to generate the bilingual sentence pair. The model is an extension of N-gram based SMT: a sequence of operations rather than tuples, and integrated reordering rather than source linearization plus rule extraction.

Example. Source: Er hat eine Pizza gegessen. Target: He has eaten a pizza.
The source and target are generated simultaneously. Generation proceeds in the order of the target sentence, and reordering operations are used when the source words are not in the same order.

Example, step by step. Source: Er hat eine Pizza gegessen. Target: He has eaten a pizza.
Generate (Er, He): Er / He
Generate (hat, has): Er hat / He has
Insert Gap: a gap is opened after hat
Generate (gegessen, eaten): Er hat _ gegessen / He has eaten
Jump Back (1): return to the open gap
Generate (eine, a): Er hat eine _ gegessen / He has eaten a
Generate (Pizza, pizza): Er hat eine Pizza gegessen / He has eaten a pizza
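To make the generative story concrete, here is a minimal Python sketch (not the authors' code; the operation encoding and the gap handling are simplifying assumptions) that replays an operation sequence and reconstructs the source and target sides of the example above:

GAP = "<GAP>"  # marker for an open gap in the partially generated source

def replay(operations):
    """Replay (op, ...) tuples and return the reconstructed (source, target) strings."""
    source, target, pos = [], [], 0
    for op, *args in operations:
        if op == "Generate":                      # emit a source word (or words) and its translation
            src_words, tgt_words = args
            for w in src_words.split():
                source.insert(pos, w)
                pos += 1
            target.extend(tgt_words.split())
        elif op == "InsertGap":                   # leave a hole in the source to be filled later
            source.insert(pos, GAP)
            pos += 1
        elif op == "JumpBack":                    # move to the n-th closest open gap and start filling it
            n = args[0]
            gap_positions = [i for i, w in enumerate(source) if w == GAP]
            pos = gap_positions[-n]
            source.pop(pos)
        elif op == "JumpForward":                 # resume after the last generated source word
            pos = len(source)
    return " ".join(source), " ".join(target)

ops = [("Generate", "Er", "He"), ("Generate", "hat", "has"),
       ("InsertGap",), ("Generate", "gegessen", "eaten"),
       ("JumpBack", 1), ("Generate", "eine", "a"), ("Generate", "Pizza", "pizza")]
print(replay(ops))  # ('Er hat eine Pizza gegessen', 'He has eaten a pizza')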

Lexical Trigger. Er hat ... gegessen / He has eaten. Operations: Generate (Er, He), Generate (hat, has), Insert Gap, Generate (gegessen, eaten), Jump Back. Because the gap is a single operation, hat and gegessen remain adjacent in the operation sequence and can act as a lexical trigger across the gap.

Generalizing to Unseen Context. Er hat einen Erdbeerkuchen gegessen / He has eaten a strawberry cake. Operations: Generate (Er, He), Generate (hat, has), Insert Gap, Generate (gegessen, eaten), Jump Back (1), Generate (einen, a), Generate (Erdbeerkuchen, strawberry cake).

Generalizing to Unseen Context. Er hat einen Erdbeerkuchen und eine Menge Butterkekse gegessen / He has eaten a strawberry cake and a lot of butter cookies. Operations: Generate (Er, He), Generate (hat, has), Insert Gap, Generate (gegessen, eaten), Jump Back (1), Generate (einen, a), Generate (Erdbeerkuchen, strawberry cake), Generate (und, and), Generate (eine, a), Generate (Menge, lot of), Generate (Butterkekse, butter cookies).

Key Ideas and Contributions. Reordering is integrated into the translation model, so translation and reordering decisions influence each other, and local and long-distance reorderings are handled in a unified manner. The operation model accounts for translation, reordering, source-side gaps and source word deletion. It is a joint model with bilingual information (like N-gram SMT), with no spurious phrasal segmentation (like N-gram SMT) and no distortion limit.

List of Operations. Four translation operations: Generate (X, Y), Continue Source Cept, Generate Identical, Generate Source Only (X). Three reordering operations: Insert Gap, Jump Back (N), Jump Forward.

Examples of the translation operations:
Generate (X, Y): e.g. Generate (gegessen, eaten), or Generate (Inflationsraten, inflation rate), where a single source word produces a multi-word target.
Continue Source Cept: the discontinuous cept kehrten ... zurück / returned is produced by Generate (kehrten zurück, returned), Insert Gap, Continue Source Cept.
Generate Identical: used instead of Generate (Portland, Portland) if count(Portland) = 1 in the training data.
Generate Source Only (X): kommen Sie mit / come with me is produced with Generate Source Only (Sie), deleting the source word Sie.

Examples of the reordering operations, using über konkrete Zahlen nicht verhandeln wollen / do not want to negotiate on specific figures:
Insert Gap: Gap #1 is opened over über konkrete Zahlen and (nicht, do not) is generated; Gap #2 is then opened over verhandeln and (wollen, want to) is generated.
Jump Back (N): Jump Back (1) returns to the closest open gap, Gap #2, where (verhandeln, negotiate) is generated; a second Jump Back (1) returns to Gap #1, where (über konkrete Zahlen, on specific figures) is generated.
Jump Forward: moves back to the position after the rightmost generated source word, where the sentence-final punctuation of über konkrete Zahlen nicht verhandeln wollen . / do not want to negotiate on specific figures . is generated.

Learning Phrases through Operation Sequences. über konkrete Zahlen nicht verhandeln wollen / do not want to negotiate on specific figures. The phrase pair nicht verhandeln wollen ~ do not want to negotiate corresponds to the operation subsequence: Generate (nicht, do not), Insert Gap, Generate (wollen, want to), Jump Back (1), Generate (verhandeln, negotiate).

Model. A joint-probability model over operation sequences.
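The formula on this slide did not survive transcription. Consistent with the description of the model as an n-gram model over operation sequences, a hedged reconstruction, with o_1 ... o_J the operations that generate the pair (F, E) under alignment A, is:

p(F, E, A) \approx \prod_{j=1}^{J} p\bigl(o_j \mid o_{j-n+1}, \ldots, o_{j-1}\bigr)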

Search. Search is defined as sketched below, incorporating a language model: a 5-gram language model (p_LM) and a 9-gram model for the operation model and the prior probability (p_pr). Decoding uses a stack-based beam decoder which operates on operation sequences.
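The search criterion itself is missing from the transcript. A hedged reconstruction, consistent with the components listed on this slide and the feature functions introduced later (the log-linear form and the weights lambda_k are an assumption), is:

\hat{E} = \operatorname*{arg\,max}_{E} \; \max_{A} \; \sum_{k} \lambda_k \log h_k(F, E, A)

where the feature functions h_k include the 9-gram operation model, the 5-gram language model p_LM, the prior probability p_pr, and the penalty features listed under Other Features.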

Other Features:
Length Penalty: counts the number of target words produced.
Deletion Penalty: counts the number of source words deleted.
Gap Penalty: counts the number of gaps inserted.
Open Gap Penalty: the number of open gaps, paid once per translation operation.
Reordering Distance: distance from the last translated tuple.
Gap Width: distance from the first open gap.
Lexical Probabilities: source-to-target and target-to-source lexical translation probabilities.
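Most of these features are simple counts over the operation sequence. A small illustrative Python sketch (not the system's implementation; it assumes a gap counts as closed once a Jump Back returns to it, and it omits the position-dependent features Reordering Distance and Gap Width):

def penalty_features(operations):
    """Count the simple penalty features of an operation sequence."""
    feats = {"length": 0, "deletion": 0, "gaps": 0, "open_gaps": 0}
    open_gaps = 0
    for op, *args in operations:
        if op == "Generate":                   # (src, tgt): count the target words produced
            feats["length"] += len(args[1].split())
            feats["open_gaps"] += open_gaps    # open gap penalty, paid once per translation operation
        elif op == "GenerateSourceOnly":       # a deleted source word
            feats["deletion"] += 1
            feats["open_gaps"] += open_gaps
        elif op == "InsertGap":
            feats["gaps"] += 1
            open_gaps += 1
        elif op == "JumpBack":                 # simplification: treat the gap as closed from here on
            open_gaps -= 1
    return feats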

Experimental Setup. Language pairs: German, Spanish and French to English. Data: 4th version of the Europarl corpus. Bilingual data: 200K parallel sentences (a reduced version of WMT 09), ~74K News Commentary + ~126K Europarl. Monolingual data: 500K sentences = 300K from the monolingual News Commentary corpus + 200K from the English side of the bilingual corpus. Standard WMT 2009 sets were used for tuning and testing.

Training and Tuning. GIZA++ for word alignment. Heuristic modification of the alignments to remove target-side gaps and unaligned target words (see the paper for details). The word-aligned bilingual corpus is converted into an operation corpus (see the paper for details). The SRILM toolkit is used to train n-gram language models with Kneser-Ney smoothing. Parameter tuning is done with Z-MERT.
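The corpus conversion algorithm itself is only referenced here. As an illustration of the idea, the following simplified sketch (my assumption of how such a conversion can look, not the authors' algorithm) turns a 1-to-1 word-aligned sentence pair into an operation sequence; multi-word cepts, unaligned words and partially filled gaps are not handled:

def alignment_to_operations(src, tgt, align):
    """align[t] is the source index aligned to target word t (every word aligned exactly once)."""
    ops, covered, gaps, i = [], [False] * len(src), [], 0
    for t, j in enumerate(align):
        if j > i and any(not covered[k] for k in range(i, j)):
            ops.append(("InsertGap",))               # skip not-yet-covered source words with a gap
            gaps.append(i)                           # remember where the gap was opened
        elif j > i:
            ops.append(("JumpForward",))             # everything in between is already covered
        elif j < i:
            g = max(x for x, start in enumerate(gaps) if start <= j)
            ops.append(("JumpBack", len(gaps) - g))  # return to the open gap that holds position j
            gaps.pop(g)
        ops.append(("Generate", src[j], tgt[t]))
        covered[j] = True
        i = j + 1
    return ops

src = "Er hat eine Pizza gegessen".split()
tgt = "He has eaten a pizza".split()
print(alignment_to_operations(src, tgt, align=[0, 1, 4, 2, 3]))
# reproduces the operation sequence of the pizza example shown earlier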

Results. Baseline: Moses (with lexicalized reordering) with default settings and a 5-gram language model (the same as ours). Two baselines are run: one with no distortion limit and one with a reordering limit of 6. Two variations of our system are run: one with no reordering limit and one using a gap width of 6 as a reordering limit.

Using Non-Gappy Source Cepts (BLEU):
System     German   Spanish   French
Bl no-rl    17.41     19.85    19.39
Bl rl-6     18.57     21.67    20.84
Tw no-rl    18.97     22.17    20.92
Tw rl-6     19.03     21.88    20.72
The Moses score without a reordering limit drops by more than a BLEU point. Our best system, Tw no-rl, is statistically significantly better than Bl rl-6 for German and Spanish, with comparable results for French.

Gappy + Non-Gappy Source Cepts (BLEU):
System         German   Spanish   French
Tw no-rl        18.97     22.17    20.92
Tw rl-6         19.03     21.88    20.72
Tw asg-no-rl    18.61     21.60    20.59
Tw asg-rl-6     18.65     21.40    20.47

Why didn't gappy cepts improve performance? Using all source gaps explodes the search space. Number of tuples using 10-best translations:
            German      Spanish     French
Gaps        965,515   1,705,156  1,473,798
No Gaps     256,992     313,690    343,220
In addition, the future cost is incorrectly estimated for gappy cepts: the dynamic programming algorithm for computing the cost of bigger spans no longer applies. A modification helps, but the estimate is still problematic when gappy cepts interleave.

Heuristic: use only the gappy cepts whose score is better than the sum of their parts, e.g. log p(habe ... gemacht, made) > log p(habe, have) + log p(gemacht, made). Number of tuples using 10-best translations:
            German      Spanish     French
Gaps        965,515   1,705,156  1,473,798
No Gaps     256,992     313,690    343,220
Heuristic   281,618     346,993    385,869
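The heuristic itself is a one-line comparison in log space; a small sketch with purely hypothetical probability values:

import math

def keep_gappy_cept(p_joint, p_parts):
    """Keep a gappy cept only if its log probability beats the sum of its parts."""
    return math.log(p_joint) > sum(math.log(p) for p in p_parts)

# hypothetical numbers: p(habe ... gemacht, made) vs. p(habe, have) and p(gemacht, made)
print(keep_gappy_cept(0.4, [0.6, 0.5]))   # True, so the gappy cept would be kept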

With Gappy Source Cepts + Heuristic (BLEU):
System         German   Spanish   French
Tw asg-no-rl    18.61     21.60    20.59
Tw asg-rl-6     18.65     21.40    20.47
Tw hsg-no-rl    18.91     21.93    20.87
Tw hsg-rl-6     19.23     21.79    20.75

Summary. Translation and reordering are combined into a single generative story that handles long- and short-distance reordering identically and can learn phrases through operation sequences. The model allows all possible reorderings (in contrast with N-gram SMT), uses bilingual context (like N-gram SMT), has no spurious phrasal segmentation (like N-gram SMT) and no distortion limit. Compared with the state-of-the-art Moses system, it achieves comparable results for French-to-English and significantly better results for German-to-English and Spanish-to-English.

Thank you - Questions? Decoder and Corpus Conversion Algorithm available at: http://www.ims.uni-stuttgart.de/~durrani/resources.html

Future Work. Improving the future cost estimate: using phrases instead of tuples for future cost estimation; an N-gram model with phrase-based decoding. Source-side discontinuities: future cost estimation with gappy units, gappy phrases, and improving the model to better handle source gaps. Target-side discontinuities: target unaligned words (a Generate Target Only (Y) operation). Generalizing the operation model using a combination of POS tags and lexical items.

Search and Future Cost Estimation. The search problem is much harder than in PBSMT: a larger beam is needed to produce translations similar to PBSMT, e.g. zum Beispiel / for example vs. zum / for, Beispiel / example. Future cost estimation is also problematic. For the language model probability, phrase-based SMT can use p(for) * p(example | for), while our model only has p(for) * p(example). Future cost must also account for the reordering operations and for features such as the gap penalty, gap width and reordering distance.

Future Cost Estimation with Source-Side Gaps. Future cost estimation with source-side gaps is problematic. The future cost of bigger spans is normally computed with the recursion cost(I,K) = min over J in I..K-1 of ( cost(I,J) + cost(J+1,K) ), e.g. cost(1,8) = min( cost(1,1) + cost(2,8), cost(1,2) + cost(3,8), ..., cost(1,7) + cost(8,8) ), as in the sketch below.
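The span-based dynamic program referred to above can be sketched as follows (a generic illustration of the recursion, not the system's code; unit_cost is an assumed callback returning the best cost of covering a contiguous span with a single tuple, or infinity if none exists):

def future_costs(n, unit_cost):
    """cost[i][k] = estimated best cost of translating source span i..k (0-based, inclusive)."""
    INF = float("inf")
    cost = [[INF] * n for _ in range(n)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            k = i + length - 1
            best = unit_cost(i, k)                        # cover the whole span with one tuple
            for j in range(i, k):                         # or split it into two smaller spans
                best = min(best, cost[i][j] + cost[j + 1][k])
            cost[i][k] = best
    return cost

Gappy cepts break this recursion, which is exactly the problem discussed on the following slides.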

Future Cost Estimation with Source-Side Gaps. This recursion does not work for cepts with gaps. Suppose the best way to cover words 1, 4 and 8 is through the gappy cept 1..4..8: what is cost(1,8)? After computing cost(1,8) with the standard recursion, a second pass is made to find min( cost(1,8), cost(2,3) + cost(5,7) + cost_of_cept(1..4..8), cost(3,7) + cost_of_cept(1..2..8) ).

Future Cost Estimation with Source-Side Gaps. The estimate is still problematic when gappy cepts interleave. Example: suppose the best way to cover words 1 and 5 is through the cept 1..5; the modification cannot capture that the best cost is cost_of_cept(1..5) + cost_of_cept(2..4..8) + cost(3,3) + cost(6,7).

Future Cost Estimation with Source-Side Gaps. The estimate is also incorrect if the coverage vector already covers a word inside a gappy cept. Example: the decoder has already covered word 3. The future cost estimate cost(1,2) + cost(4,8) is then wrong; the correct estimate is cost_of_cept(1..4..8) + cost(2,2) + cost(5,7). There is no efficient way to cover all possible permutations.

Target-Side Gaps and Unaligned Words. The model does not allow target-side gaps or unaligned target words, so the alignments are post-edited in a three-step process. Step I: remove all target-side gaps. For a gappy alignment, the link to the least frequent target word is identified, and the group of links containing this word is retained. Example (figure): source words A B C D aligned to target words U V W X Y Z; the original alignment has a target-side discontinuity, and after Step I the target-side gap is gone but some target words are left unaligned.

Continued. After Step I (example A B C D / U V W X Y Z), some target words are unaligned. Step II: counts are collected over the training corpus to find the attachment preference of each word, e.g. Count(U,V) = 1, Count(W,X) = 1, Count(X,Y) = 0.5, Count(Y,Z) = 0.5.

Continued. Step III: each unaligned target word is attached to its right or left neighbour based on the collected counts, yielding the final gap-free, fully aligned example A B C D / U V W X Y Z.
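As a rough illustration of this post-editing, here is a simplified Python sketch of Step I and Step III (the data structures, tie-breaking and boundary handling are my assumptions, not the authors' procedure):

def remove_target_gaps(alignment, tgt_freq):
    """Step I: for each source position aligned to a discontinuous set of target positions,
    keep only the contiguous group of links that contains its least frequent target word."""
    fixed = {}
    for s, tgts in alignment.items():                  # alignment: src_pos -> sorted target positions
        groups, cur = [], [tgts[0]]
        for t in tgts[1:]:                             # split the target positions into contiguous groups
            if t == cur[-1] + 1:
                cur.append(t)
            else:
                groups.append(cur)
                cur = [t]
        groups.append(cur)
        rare = min(tgts, key=lambda t: tgt_freq[t])    # least frequent target word
        fixed[s] = next(g for g in groups if rare in g)
    return fixed

def attach_unaligned(n_tgt, aligned_positions, prefer_left):
    """Step III: attach each unaligned target word to its left or right neighbour,
    using the attachment preferences counted over the corpus in Step II."""
    attach = {}
    for t in range(n_tgt):
        if t not in aligned_positions:
            attach[t] = max(t - 1, 0) if prefer_left.get(t, True) else min(t + 1, n_tgt - 1)
    return attach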