Chapter 6. Decoding. Statistical Machine Translation




Decoding

We have a mathematical model for translation: p(e|f). The task of decoding is to find the translation e_best with the highest probability:

    e_best = argmax_e p(e|f)

Two types of error:
- the most probable translation is bad → fix the model
- the search does not find the most probable translation → fix the search

Decoding is evaluated by search error, not by the quality of translations (although these are often correlated).

Translation Process

Task: translate this sentence from German into English:

    er geht ja nicht nach hause

Pick a phrase in the input and translate it, building up the output step by step:

    er → he
    ja nicht → does not
    er geht ja nicht → he does not go
    er geht ja nicht nach hause → he does not go home

- it is allowed to pick words out of sequence (reordering)
- phrases may have multiple words: many-to-many translation

Computing Translation Probability

Probabilistic model for phrase-based translation:

    e_best = argmax_e ∏_{i=1}^{I} φ(f̄_i|ē_i) d(start_i - end_{i-1} - 1) p_LM(e)

The score is computed incrementally for each partial hypothesis.

Components:
- Phrase translation: picking phrase f̄_i to be translated as phrase ē_i → look up score φ(f̄_i|ē_i) from the phrase translation table
- Reordering: the previous phrase ended in end_{i-1}, the current phrase starts at start_i → compute d(start_i - end_{i-1} - 1)
- Language model: for an n-gram model, keep track of the last n-1 words → compute score p_LM(w_i|w_{i-(n-1)}, ..., w_{i-1}) for each added word w_i
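The incremental scoring can be sketched as follows. The phrase table entries, the language-model stand-in, and the distortion weight are toy assumptions for illustration, not trained models:

```python
import math

# Toy log-probabilities standing in for trained models (values hypothetical).
phrase_table = {
    ("er", "he"): math.log(0.7),
    ("geht ja nicht", "does not go"): math.log(0.4),
    ("nach hause", "home"): math.log(0.8),
}

def lm_score(words):
    # Stand-in for p_LM: a real n-gram model scores each word given its history.
    return -0.5 * len(words)

def distortion(start_i, end_prev, alpha=0.9):
    # d(start_i - end_{i-1} - 1): distance-based reordering penalty
    return abs(start_i - end_prev - 1) * math.log(alpha)

def hypothesis_score(steps):
    """steps: list of (src_phrase, tgt_phrase, start, end), 1-based positions."""
    total, end_prev, output = 0.0, 0, []
    for src, tgt, start, end in steps:
        total += phrase_table[(src, tgt)]     # phrase translation
        total += distortion(start, end_prev)  # reordering
        total += lm_score(tgt.split())        # language model (added words)
        end_prev = end
        output.extend(tgt.split())
    return total, " ".join(output)

score, output = hypothesis_score([
    ("er", "he", 1, 1),
    ("geht ja nicht", "does not go", 2, 4),
    ("nach hause", "home", 5, 6),
])
print(output)  # he does not go home
```

Because the score is a sum of log-probabilities, each expansion only adds the three component scores for the newly translated phrase.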

Translation Options

    er geht ja nicht nach hause

[Table of translation options: each input phrase is paired with its candidate translations, e.g. er → he / it, geht → is / are / goes / go, ja nicht → is not / does not / do not, nach hause → home / at home / return home.]

Many translation options to choose from:
- in the Europarl phrase table: 2727 matching phrase pairs for this sentence
- by pruning to the top 20 per phrase, 202 translation options remain

Translation Options (continued)

[The same table of translation options for "er geht ja nicht nach hause".]

The machine translation decoder does not know the right answer:
- picking the right translation options
- arranging them in the right order

The search problem is solved by heuristic beam search.

Decoding: Precompute Translation Options

    er geht ja nicht nach hause

Consult the phrase translation table for all input phrases.

Decoding: Start with Initial Hypothesis

    er geht ja nicht nach hause

Initial hypothesis: no input words covered, no output produced.

Decoding: Hypothesis Expansion

[Search graph: the initial hypothesis is expanded with the option "are".]

Pick any translation option, create a new hypothesis.

Decoding: Hypothesis Expansion (continued)

[Search graph: further expansions of the initial hypothesis with "he" and "it".]

Create hypotheses for all other translation options.

Decoding: Hypothesis Expansion (continued)

[Search graph in which the partial hypotheses are expanded further with options such as "yes", "goes", "does not", "go", "home", "to".]

Also create hypotheses from already created partial hypotheses.

Decoding: Find Best Path

[Search graph with the best path highlighted.]

Backtrack from the highest-scoring complete hypothesis.

Computational Complexity

The suggested process creates an exponential number of hypotheses. Machine translation decoding is NP-complete.

Reduction of the search space:
- recombination (risk-free)
- pruning (risky)

Recombination

Two hypothesis paths lead to two matching hypotheses:
- same number of foreign words translated
- same English words in the output
- different scores

Example: two paths both produce the output "it is". The worse hypothesis is dropped.

Recombination (continued)

Two hypothesis paths lead to hypotheses that are indistinguishable in subsequent search:
- same number of foreign words translated
- same last two English words in the output (assuming a trigram language model)
- same last foreign word translated
- different scores

Example: "he does not" and "it does not". The worse hypothesis is dropped.
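A sketch of this recombination, assuming a trigram language model: the hypothesis state relevant to future scoring is reduced to (coverage, last two output words, end position of the last translated phrase), and only the best hypothesis per state survives. The hypothesis tuples here are illustrative:

```python
# hypothesis = (coverage set, output words, end of last phrase, log score)
def recombination_key(hyp):
    coverage, output, last_end, _score = hyp
    # With a trigram LM only the last two output words matter for the future.
    return (coverage, output[-2:], last_end)

def recombine(hypotheses):
    best = {}
    for hyp in hypotheses:
        key = recombination_key(hyp)
        if key not in best or hyp[3] > best[key][3]:
            best[key] = hyp  # keep the higher-scoring hypothesis per state
    return list(best.values())

h1 = (frozenset({0, 1, 2}), ("he", "does", "not"), 2, -2.3)
h2 = (frozenset({0, 1, 2}), ("it", "does", "not"), 2, -2.9)
print(len(recombine([h1, h2])))  # 1: the worse hypothesis is dropped
```

Dropping is risk-free because any completion of the worse hypothesis would also be a completion of the better one, with the same future score contributions.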

Restrictions on Recombination

- Translation model: phrase translations are independent from each other → no restriction on hypothesis recombination
- Language model: the last n-1 words are used as history in an n-gram language model → recombined hypotheses must match in their last n-1 words
- Reordering model: distance-based reordering is based on the distance to the end position of the previous input phrase → recombined hypotheses must have that same end position
- Other feature functions may introduce additional restrictions

Pruning

Recombination reduces the search space, but not enough (we still have an NP-complete problem on our hands).

Pruning removes bad hypotheses early:
- put comparable hypotheses into stacks (hypotheses that have translated the same number of input words)
- limit the number of hypotheses in each stack

Stacks

[Diagram: stacks of hypotheses ("he", "it", "yes", "are", "goes", "does not", ...) grouped by coverage: no word translated, one word translated, two words translated, three words translated.]

Hypothesis expansion in a stack decoder:
- a translation option is applied to a hypothesis
- the new hypothesis is dropped into a stack further down

Stack Decoding Algorithm

    place empty hypothesis into stack 0
    for all stacks 0...n-1 do
      for all hypotheses in stack do
        for all translation options do
          if applicable then
            create new hypothesis
            place in stack
            recombine with existing hypothesis if possible
            prune stack if too big
          end if
        end for
      end for
    end for
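The pseudocode above can be sketched in Python roughly as follows. The option format, the toy scores, and the simple linear distortion penalty are assumptions for illustration, and recombination is omitted for brevity:

```python
import math
from collections import namedtuple

# Minimal stack decoder sketch; scores and phrase options are hypothetical.
Hyp = namedtuple("Hyp", "coverage last_end output score")

def decode(n_words, options, max_stack=10):
    """options: list of (start, end, target_phrase, log_score); end exclusive."""
    stacks = [[] for _ in range(n_words + 1)]
    stacks[0].append(Hyp(frozenset(), 0, (), 0.0))     # empty hypothesis
    for length in range(n_words):                      # for all stacks 0..n-1
        for hyp in stacks[length]:                     # for all hypotheses
            for start, end, phrase, score in options:  # for all options
                span = frozenset(range(start, end))
                if span & hyp.coverage:                # option not applicable
                    continue
                jump = -0.1 * abs(start - hyp.last_end)  # toy reordering cost
                new = Hyp(hyp.coverage | span, end,
                          hyp.output + (phrase,),
                          hyp.score + score + jump)
                target = stacks[len(new.coverage)]     # place in stack
                target.append(new)
                if len(target) > max_stack:            # histogram pruning
                    target.sort(key=lambda h: h.score, reverse=True)
                    del target[max_stack:]
    return max(stacks[n_words], key=lambda h: h.score)

options = [                       # er geht ja nicht nach hause (toy options)
    (0, 1, "he", math.log(0.7)),
    (0, 1, "it", math.log(0.2)),
    (1, 4, "does not go", math.log(0.4)),
    (1, 4, "is not going", math.log(0.2)),
    (4, 6, "home", math.log(0.8)),
]
best = decode(6, options)
print(" ".join(best.output))  # he does not go home
```

Because every expansion covers at least one new input word, a hypothesis expanded from stack i always lands in a stack greater than i, so each stack is fully built before it is expanded.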

Pruning

Pruning strategies:
- histogram pruning: keep at most k hypotheses in each stack
- threshold pruning: keep hypotheses with score ≥ α × best score (α < 1)

Computational time complexity of decoding with histogram pruning:

    O(max stack size × translation options × sentence length)

The number of translation options is linear in the sentence length, hence quadratic complexity:

    O(max stack size × sentence length²)
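Both strategies can be sketched on a plain list of scored hypotheses. Since scores are log-probabilities, multiplying the best score by α corresponds to adding log α:

```python
import math

def histogram_prune(stack, k):
    # Keep at most the k best hypotheses.
    return sorted(stack, key=lambda h: h[1], reverse=True)[:k]

def threshold_prune(stack, alpha):
    # Keep hypotheses within a factor alpha of the best score
    # (in log space: score >= best + log(alpha), with alpha < 1).
    best = max(score for _, score in stack)
    return [(h, s) for h, s in stack if s >= best + math.log(alpha)]

stack = [("A", -1.0), ("B", -1.5), ("C", -4.0)]
print(histogram_prune(stack, 2))   # [('A', -1.0), ('B', -1.5)]
print(threshold_prune(stack, 0.1)) # C is dropped: -4.0 < -1.0 + log(0.1)
```

Histogram pruning bounds the stack size directly; threshold pruning adapts to how peaked the score distribution is.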

Reordering Limits

- Limit reordering to a maximum reordering distance
- Typical reordering distance: 5-8 words
  - depending on the language pair
  - a larger reordering limit hurts translation quality
- Reduces complexity to linear: O(max stack size × sentence length)
- Speed / quality trade-off by setting the maximum stack size

Translating the Easy Part First?

Input: the tourism initiative addresses this for the first time

Hypothesis 1 (translates "the tourism initiative"):
- the → die: tm -0.19, lm -0.4, d 0, all -0.65
- tourism → touristische: tm -1.16, lm -2.93, d 0, all -4.09
- initiative → initiative: tm -1.21, lm -4.67, d 0, all -5.88

Hypothesis 2 (translates "the first time"):
- the first time → das erste mal: tm -0.56, lm -2.81, d -0.74, all -4.11

Both hypotheses translate 3 words, but the worse hypothesis has the better score.

Estimating Future Cost

Future cost estimate: how expensive is the translation of the rest of the sentence?

Optimistic: choose the cheapest translation options.

Cost for each translation option:
- translation model: cost known
- language model: output words known, but not their context → estimate without context
- reordering model: unknown → ignored for future cost estimation

Cost Estimates from Translation Options

Input: the tourism initiative addresses this for the first time

[Chart: cost of the cheapest translation option for each input span (log-probabilities). Single words: the -1.0, tourism -2.0, initiative -1.5, addresses -2.4, this -1.4, for -1.0, the -1.0, first -1.9, time -1.6; multi-word spans covered by phrase options have entries as well (-4.0, -2.5, -2.2, -1.3, -2.4, -2.7, -2.3, -2.3, -2.3).]

Cost Estimates for all Spans

Compute the cost estimate for all contiguous spans by combining the cheapest options:

                  future cost estimate for n words (from first word)
    word           1     2     3     4     5     6     7     8     9
    the          -1.0  -3.0  -4.5  -6.9  -8.3  -9.3  -9.6 -10.6 -10.6
    tourism      -2.0  -3.5  -5.9  -7.3  -8.3  -8.6  -9.6  -9.6
    initiative   -1.5  -3.9  -5.3  -6.3  -6.6  -7.6  -7.6
    addresses    -2.4  -3.8  -4.8  -5.1  -6.1  -6.1
    this         -1.4  -2.4  -2.7  -3.7  -3.7
    for          -1.0  -1.3  -2.3  -2.3
    the          -1.0  -2.2  -2.3
    first        -1.9  -2.4
    time         -1.6

- Function words are cheaper (the: -1.0) than content words (tourism: -2.0)
- Common phrases are cheaper (for the first time: -2.3) than unusual ones (tourism initiative addresses: -5.9)
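A table like the one above can be filled in with a dynamic program over span lengths: seed each span with its cheapest translation option, then take the best (highest log-probability) combination of cheaper sub-spans. A sketch using hypothetical per-word costs that match the first three words of the table:

```python
def future_costs(option_cost, n):
    """option_cost: dict (i, j) -> cheapest option log-cost for span [i, j).
    Returns the future cost estimate for every contiguous span."""
    cost = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = option_cost.get((i, j), float("-inf"))
            for k in range(i + 1, j):  # best split into two cheaper sub-spans
                best = max(best, cost[(i, k)] + cost[(k, j)])
            cost[(i, j)] = best
    return cost

# Per-word costs for "the tourism initiative" (values as in the table above).
per_word = {(0, 1): -1.0, (1, 2): -2.0, (2, 3): -1.5}
cost = future_costs(per_word, 3)
print(cost[(0, 3)])  # -4.5, matching the table row for "the", n = 3
```

Because shorter spans are computed first, every split a longer span needs is already available, so the whole table is filled in O(n³) time.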

Combining Score and Future Cost

Hypothesis score and future cost estimate are combined for pruning:
- left hypothesis starts with the hard part: "the tourism initiative" → die touristische initiative (tm -1.21, lm -4.67, d 0)
  score -5.88, future cost -6.1 → total -11.98
- middle hypothesis starts with the easiest part: "the first time" → das erste mal (tm -0.56, lm -2.81, d -0.74)
  score -4.11, future cost -9.3 → total -13.41
- right hypothesis picks easy parts: "this for ... time" → für diese zeit (tm -0.82, lm -2.98, d -1.06)
  score -4.86, future cost -9.1 → total -13.96

Other Decoding Algorithms

- A* search
- Greedy hill-climbing
- Using finite state transducers (standard toolkits)

A* Search

[Diagram: probability + heuristic estimate plotted against the number of words covered, showing the cheapest score and a depth-first expansion to a completed path.]

- Uses an admissible future cost heuristic: never overestimates cost
- Translation agenda: create the hypothesis with the lowest score + heuristic cost
- Done when a complete hypothesis is created

Greedy Hill-Climbing

Create one complete hypothesis with depth-first search (or other means), then search for better hypotheses by applying change operators:
- change the translation of a word or phrase
- combine the translation of two words into a phrase
- split up the translation of a phrase into two smaller phrase translations
- move parts of the output into a different position
- swap parts of the output with the output at a different part of the sentence

Terminate when no operator application produces a better translation.
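A minimal sketch of this loop with a single change operator (swapping two output positions). The scoring function is a toy stand-in; a real decoder would use the model score and the full operator set above:

```python
import itertools

def hill_climb(output, score_fn):
    """Greedily apply swap operators until no change improves the score."""
    current, best = list(output), score_fn(output)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(current)), 2):
            cand = list(current)
            cand[i], cand[j] = cand[j], cand[i]   # swap operator
            s = score_fn(cand)
            if s > best:
                current, best, improved = cand, s, True
                break  # restart the operator sweep from the improved hypothesis
    return current, best

# Toy score: reward outputs close to a hypothetical target word order.
target = ["he", "does", "not", "go", "home"]
score = lambda out: -sum(a != b for a, b in zip(out, target))
result, s = hill_climb(["home", "does", "not", "go", "he"], score)
print(result)  # ['he', 'does', 'not', 'go', 'home']
```

Hill-climbing is fast but can get stuck in local optima, which is why the slide lists several different operators: a richer neighborhood makes it harder for the search to stall.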

Summary

- Translation process: produce output left to right
- Translation options
- Decoding by hypothesis expansion
- Reducing the search space:
  - recombination
  - pruning (requires a future cost estimate)
- Other decoding algorithms