Phrase-Based MT. Machine Translation Lecture 7. Instructor: Chris Callison-Burch. TAs: Mitchell Stern, Justin Chiu. Website: mt-class.org/penn



Translational Equivalence
Er hat die Prüfung bestanden, jedoch nur knapp
  He insisted on the test, but just barely. (word-by-word lexical translation)
  He passed the test, but just barely. (correct translation)
How do lexical translation models deal with contextual information?

Lexical translation table for "bestanden":
  f          e         prob
  bestanden  insisted  0.06
             were      0.06
             existed   0.04
             was       0.04
             been      0.04
             passed    0.03
             consist   0.01

What is wrong with this? How can we improve this?

Translation model
What are the atomic units?
  Lexical translation: words
  Phrase-based translation: phrases (the standard model used by Google, Microsoft, ...)
Benefits:
  many-to-many translation
  use of local context in translation
Downsides:
  Where do phrases come from?

Translation model
With a latent variable, we introduce a decomposition into phrases which translate independently:

  p(f, a \mid e) = p(a) \prod_{\langle \bar{e}, \bar{f} \rangle \in a} p(\bar{f} \mid \bar{e})

We can then marginalize over a to get p(f \mid e):

  p(f \mid e) = \sum_{a \in A} p(a) \prod_{\langle \bar{e}, \bar{f} \rangle \in a} p(\bar{f} \mid \bar{e})

Example:
  f = Morgen fliege ich nach Baltimore zur Konferenz
  e = Tomorrow I will fly to the conference in Baltimore
  p(Morgen | Tomorrow) × p(fliege | will fly) × p(ich | I) × ...
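For one fixed segmentation, the product above is just a multiplication over phrase pairs. A minimal sketch, assuming made-up phrase probabilities (not values from a real phrase table) and a uniform p(a) = 1:

```python
# Sketch of p(f, a | e) for one fixed segmentation a.
# The phrase probabilities are illustrative made-up numbers.
phrase_probs = {
    ("Morgen", "Tomorrow"): 0.7,
    ("fliege", "will fly"): 0.4,
    ("ich", "I"): 0.9,
    ("nach Baltimore", "in Baltimore"): 0.5,
    ("zur Konferenz", "to the conference"): 0.6,
}

def p_f_a_given_e(alignment, p_a=1.0):
    """p(f, a | e) = p(a) * product over phrase pairs of p(f_bar | e_bar)."""
    score = p_a
    for f_bar, e_bar in alignment:
        score *= phrase_probs[(f_bar, e_bar)]
    return score

a = [("Morgen", "Tomorrow"), ("fliege", "will fly"), ("ich", "I"),
     ("nach Baltimore", "in Baltimore"), ("zur Konferenz", "to the conference")]
print(p_f_a_given_e(a))  # product of the five phrase probabilities
```

The marginalization over A would repeat this for every segmentation and alignment, which is what makes exact inference expensive.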

Phrases
  Contiguous strings of words
  Phrases are not necessarily syntactic constituents
  Usually have maximum length limits
  Phrases subsume words (individual words are phrases of length 1)

Linguistic Phrases
The model is not limited to linguistic phrases (NPs, VPs, PPs, CPs, ...)
Non-constituent phrases are useful: es gibt ↔ there is / there are
Is a good phrase more likely to be [P NP] or [governor P]? Why? How would you figure this out?

Phrase Tables
  f           e            p(f̄ | ē)
  das Thema   the issue    0.41
              the point    0.72
              the subject  0.47
              the thema    0.99
  es gibt     there is     0.96
              there are    0.72
  morgen      tomorrow     0.9
  fliege ich  will I fly   0.63
              will fly     0.17
              I will fly   0.13
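In memory, such a table is commonly held as a map from source phrase to scored translation options. A minimal sketch using the illustrative scores from this slide:

```python
from collections import defaultdict

# Toy in-memory phrase table with the illustrative scores from the slide.
entries = [
    ("das Thema", "the issue", 0.41), ("das Thema", "the point", 0.72),
    ("das Thema", "the subject", 0.47), ("das Thema", "the thema", 0.99),
    ("es gibt", "there is", 0.96), ("es gibt", "there are", 0.72),
    ("morgen", "tomorrow", 0.9),
    ("fliege ich", "will I fly", 0.63), ("fliege ich", "will fly", 0.17),
    ("fliege ich", "I will fly", 0.13),
]
phrase_table = defaultdict(list)
for f, e, p in entries:
    phrase_table[f].append((e, p))

# Best-scoring translation option for a source phrase
best = max(phrase_table["fliege ich"], key=lambda ep: ep[1])
print(best)  # ('will I fly', 0.63)
```

During decoding, every source span is looked up in this map, so options for all spans can be precomputed once per sentence.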

p(a)
Two responsibilities:
  1. Divide the source sentence into phrases
     Standard approach: uniform distribution over all possible segmentations
     How many segmentations are there?
  2. Reorder the phrases
     Standard approach: Markov model on phrases (parameterized with a log-linear model)
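To answer the counting question: each of the n-1 gaps between adjacent words either is or is not a phrase boundary, so an n-word sentence has 2^(n-1) segmentations. A quick enumeration sketch:

```python
def segmentations(words):
    """Yield every split of a word sequence into contiguous phrases."""
    if not words:
        yield []
        return
    for i in range(1, len(words) + 1):
        first = tuple(words[:i])
        for rest in segmentations(words[i:]):
            yield [first] + rest

sent = "er geht ja nicht nach hause".split()
print(len(list(segmentations(sent))))  # 2 ** (6 - 1) = 32
```

The exponential count is one reason decoders never enumerate segmentations explicitly.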

Reordering Model

Learning Phrases
Given parallel data, a latent segmentation variable, and a latent phrasal inventory, can we use EM?
  Computational problem: summing over all segmentations and alignments is #P-complete
  Modeling problem: MLE has a degenerate solution

Learning Phrases
Three stages:
  1. word alignment
  2. extraction of phrases
  3. estimation of phrase probabilities

Consistent Phrases

Phrase Extraction
Word-aligned sentence pair: watashi wa hako wo akemasu ↔ I open the box
[alignment grid figure]
Phrase pairs extracted from the alignment:
  akemasu / open
  watashi wa / I
  watashi / I
  hako wo / box
  hako wo / the box
  hako wo / open the box
  hako wo akemasu / open the box
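The consistency check behind extraction can be sketched as follows. This is a simplified extractor: it takes only the minimal target span for each source span (the standard algorithm additionally grows pairs with unaligned boundary words on the target side, which is how pairs like "hako wo / the box" arise), and the alignment links for the example are assumed, not given on the slide:

```python
def extract_phrases(f_words, e_words, links, max_len=4):
    """Extract phrase pairs consistent with a word alignment.

    A pair (f[i..j], e[k..l]) is consistent if it contains at least one
    alignment link and no link crosses its boundary.
    """
    pairs = set()
    for i in range(len(f_words)):
        for j in range(i, min(i + max_len, len(f_words))):
            # target positions linked to the source span f[i..j]
            e_pos = [e for (f, e) in links if i <= f <= j]
            if not e_pos:
                continue
            k, l = min(e_pos), max(e_pos)
            # consistency: no target word in e[k..l] is linked outside f[i..j]
            if any(k <= e <= l and not (i <= f <= j) for (f, e) in links):
                continue
            if l - k + 1 <= max_len:
                pairs.add((" ".join(f_words[i:j + 1]),
                           " ".join(e_words[k:l + 1])))
    return pairs

f = "watashi wa hako wo akemasu".split()
e = "I open the box".split()
# Assumed alignment links (source index, target index):
# watashi-I, hako-box, akemasu-open
links = [(0, 0), (2, 3), (4, 1)]
for pair in sorted(extract_phrases(f, e, links)):
    print(pair)
```

Note that the spans "watashi wa hako" / "I open the box" are rejected: the minimal target span for that source span contains "open", which links back to akemasu outside the span.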

Translation Process
Task: translate this sentence from German into English
  er geht ja nicht nach hause
(Chapter 6: Decoding)

Translation Process
  er geht ja nicht nach hause
  er → he
Pick phrase in input, translate

Translation Process
  er geht ja nicht nach hause
  er → he, ja nicht → does not
Pick phrase in input, translate
  it is allowed to pick words out of sequence (reordering)
  phrases may have multiple words: many-to-many translation

Translation Process
  er geht ja nicht nach hause
  er geht ja nicht → he does not go
Pick phrase in input, translate

Translation Process
  er geht ja nicht nach hause
  er geht ja nicht nach hause → he does not go home
Pick phrase in input, translate

Computing Translation Probability
Probabilistic model for phrase-based translation:

  e_{best} = \arg\max_e \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i) \, d(start_i - end_{i-1} - 1) \, p_{LM}(e)

The score is computed incrementally for each partial hypothesis. Components:
  Phrase translation: picking phrase \bar{f}_i to be translated as phrase \bar{e}_i: look up score \phi(\bar{f}_i \mid \bar{e}_i) in the phrase translation table
  Reordering: the previous phrase ended at end_{i-1}, the current phrase starts at start_i: compute d(start_i - end_{i-1} - 1)
  Language model: for an n-gram model, keep track of the last n-1 words: compute score p_{LM}(w_i \mid w_{i-(n-1)}, ..., w_{i-1}) for each added word w_i
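The incremental score can be sketched in log space, where the product becomes a sum. All numbers are toy values, and the exponential distortion form d(x) = alpha^|x| is an illustrative assumption (the model only requires some function of the jump distance):

```python
import math

def distortion(start_i, end_prev, alpha=0.5):
    """log d(start_i - end_{i-1} - 1), assuming d(x) = alpha^|x|."""
    return abs(start_i - end_prev - 1) * math.log(alpha)

def score_hypothesis(steps, phi, lm_logprob):
    """Sum the three component log-scores over decoding steps.

    steps: [((start, end), e_phrase), ...] with 1-based source positions,
           in the order the target phrases are produced
    phi:   (source span, e_phrase) -> log phrase-translation score
    """
    total, end_prev = 0.0, 0          # end_0 = 0 before the first phrase
    for (start, end), e_phrase in steps:
        total += phi[((start, end), e_phrase)]   # phrase translation
        total += distortion(start, end_prev)     # reordering
        end_prev = end
    return total + lm_logprob                    # language model

# Toy decoding of "er geht ja nicht nach hause" (positions 1-6):
phi = {((1, 1), "he"): math.log(0.6),
       ((3, 4), "does not"): math.log(0.3),
       ((2, 2), "go"): math.log(0.4),
       ((5, 6), "home"): math.log(0.5)}
steps = [((1, 1), "he"), ((3, 4), "does not"),
         ((2, 2), "go"), ((5, 6), "home")]
print(score_hypothesis(steps, phi, lm_logprob=-8.0))
```

Note how the out-of-order step ((2, 2), "go") pays a distortion cost of three positions, exactly the |start_i - end_{i-1} - 1| term in the formula.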

Translation Options
  er geht ja nicht nach hause
[table of candidate translations per source phrase, e.g. er → he / it; geht → goes / go / is; ja nicht → does not / is not; nach hause → home / at home / return home; ...]
Many translation options to choose from in the Europarl phrase table: 2727 matching phrase pairs for this sentence; by pruning to the top 20 per phrase, 202 translation options remain.

Translation Options
  er geht ja nicht nach hause
[same table of candidate translations per source phrase]
The machine translation decoder does not know the right answer:
  picking the right translation options
  arranging them in the right order
Search problem solved by heuristic beam search.

Decoding: Precompute Translation Options
  er geht ja nicht nach hause
Consult the phrase translation table for all input phrases.

Decoding: Start with Initial Hypothesis
  er geht ja nicht nach hause
Initial hypothesis: no input words covered, no output produced.

Decoding: Hypothesis Expansion
  er geht ja nicht nach hause
Pick any translation option, create a new hypothesis (here: "are").

Decoding: Hypothesis Expansion
  er geht ja nicht nach hause
Create hypotheses for all other translation options (here: "he", "are", "it").

Decoding: Hypothesis Expansion
  er geht ja nicht nach hause
[search graph of partial hypotheses: yes, he, goes, home, are, does not, go home, it, to]
Also create hypotheses from the created partial hypotheses.

Decoding: Find Best Path
  er geht ja nicht nach hause
[same search graph of partial hypotheses]
Backtrack from the highest-scoring complete hypothesis.
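The expansion-and-backtrack loop can be sketched as a tiny stack decoder. Simplifying assumptions: monotone decoding only (no reordering), scoring with the phrase-translation feature alone, back-pointers replaced by storing the output directly, and a made-up phrase table; a real decoder also tracks coverage bitmaps, distortion, and language-model state:

```python
import heapq
import math
from collections import namedtuple

Hyp = namedtuple("Hyp", "end output score")  # end = # source words covered

def beam_decode(f_words, phrase_table, beam_size=10, max_phrase=3):
    """Monotone stack decoding: stack k holds hypotheses covering k words."""
    n = len(f_words)
    stacks = [[] for _ in range(n + 1)]
    stacks[0].append(Hyp(0, (), 0.0))
    for k in range(n):
        # histogram pruning: keep only the best beam_size hypotheses
        stacks[k] = heapq.nlargest(beam_size, stacks[k], key=lambda h: h.score)
        for hyp in stacks[k]:
            for j in range(k + 1, min(k + max_phrase, n) + 1):
                src = " ".join(f_words[k:j])
                for tgt, logp in phrase_table.get(src, []):
                    stacks[j].append(Hyp(j, hyp.output + (tgt,),
                                         hyp.score + logp))
    best = max(stacks[n], key=lambda h: h.score)  # complete hypotheses
    return " ".join(best.output), best.score

table = {  # made-up log-scores, not real phrase-table values
    "er": [("he", math.log(0.6)), ("it", math.log(0.2))],
    "geht": [("goes", math.log(0.5))],
    "geht ja nicht": [("does not go", math.log(0.3))],
    "ja nicht": [("not", math.log(0.2))],
    "nach hause": [("home", math.log(0.5))],
}
out, score = beam_decode("er geht ja nicht nach hause".split(), table)
print(out)  # he does not go home
```

Organizing hypotheses by number of covered source words means each stack only ever contains comparable hypotheses, which is what makes pruning to a fixed beam size reasonable.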

Reading
Read Chapters 5 and 6 from the textbook.

Announcements
HW2 will be released soon.
HW2 is due Thursday, Feb 19th at 11:59pm.