Machine Translation




Agenda

Introduction to machine translation
Data-driven statistical machine translation
Translation models: parallel corpora; document-, sentence-, and word-alignment; phrase-based translation
MT decoding algorithm
Language models
MT evaluation
Further topics for exploration

Machine Translation

Mapping from a source-language string to a target-language string, e.g.,
Spanish source: Perros pequeños tienen miedo de mi
English target: Small dogs are afraid of me
The right way to do this:
Map the source language to some semantic interlingua, e.g., fear(dog([plural],[small]), sister([my,singular],[young,clumsy]))
Generate the target string from the interlingual representation
This is not feasible in the current state of the technology

Current best approaches to MT

Statistical models are the current best practice; Google's translation system, for example, is data driven
The basic approach is taken from statistical speech recognition
Let the source string be f and the target-language string be e. We want the target string that maximizes P(e | f):

  argmax_e P(e | f) = argmax_e P(f | e) P(e) / P(f) = argmax_e P(f | e) P(e)

P(f | e) is the translation model (akin to the acoustic model in statistical speech recognition)
P(e) is the language model
(A toy worked example of this decision rule appears at the end of this section.)

Translation model

Given a pair of strings <f, e>, the translation model assigns P(f | e)
If f looks like a good translation of e, then P(f | e) will be high
If f does not look like a good translation of e, then P(f | e) will be low
Where do these pairs of strings <f, e> come from?
Paying people to translate between languages is expensive; we would rather use freely available resources, even if the data are imperfect ("noisy")
Such data are produced independently anyway: parallel corpora

Parallel corpora

Examples: the Hansards corpus of Canadian Parliament transcripts, by law published in both French and English; similar resources for official EU proceedings and documents; software manuals, web pages, and other available data
These corpora are document-aligned; they must be sentence- and word-aligned before translation models can be derived

Learning alignment models

If we only have document-aligned parallel corpora, how do we get to the sentence alignment? Simple heuristics based on sentence lengths (a sketch appears at the end of this section)
Once we have sentence-aligned parallel corpora, how do we get to the word alignment? One answer: align words that often appear together
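To make the decision rule above concrete, here is a minimal sketch in Python. The candidate translations and every probability in it are hand-set assumptions for illustration only, not the output of any real model.

```python
# Toy noisy-channel decoder: choose e maximizing P(f|e) * P(e).
# Candidates and probabilities are invented for illustration only.

f = "perros pequeños tienen miedo de mi"

# Hypothetical translation-model scores P(f | e) for a few candidate targets e.
translation_model = {
    "small dogs fear me": 0.20,
    "dogs small are afraid of me": 0.25,  # word-by-word, badly ordered English
    "small dogs are afraid of me": 0.18,
}

# Hypothetical language-model scores P(e): how plausible is e as English?
language_model = {
    "small dogs fear me": 0.010,
    "dogs small are afraid of me": 0.0001,
    "small dogs are afraid of me": 0.008,
}

# argmax_e P(f|e) * P(e); the shared denominator P(f) can be ignored.
best_e = max(translation_model, key=lambda e: translation_model[e] * language_model[e])
print(best_e)  # -> "small dogs fear me" under these toy numbers
```

Note how the language model earns its keep: the literal, badly ordered candidate has the highest translation-model score but is penalized so heavily by P(e) that it loses the argmax.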
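The length-based sentence alignment heuristic mentioned under "Learning alignment models" can be sketched as follows. This is a deliberate simplification that assumes sentences align one-to-one in document order and merely filters out pairs whose character lengths differ too much; real aligners such as Gale-Church use a statistical length model and dynamic programming over 1-1, 1-2, and 2-1 groupings.

```python
# Simplified length-based sentence alignment: pair sentences in order and keep
# only pairs whose character lengths are reasonably similar.

def length_ratio(e_sent: str, f_sent: str) -> float:
    """Ratio of shorter to longer sentence length (1.0 = identical lengths)."""
    shorter, longer = sorted([len(e_sent), len(f_sent)])
    return shorter / longer

def align_sentences(e_doc, f_doc, min_ratio=0.5):
    """Assume a 1-to-1, in-order alignment; drop pairs with implausible lengths."""
    return [(e_sent, f_sent)
            for e_sent, f_sent in zip(e_doc, f_doc)
            if length_ratio(e_sent, f_sent) >= min_ratio]

english = [
    "Big dogs do not fear her, just the small ones.",
    "They do not fear my little sister because she fears them.",
]
spanish = [
    "Perros grandes no tienen miedo de ella, solo los pequeños.",
    "No tienen miedo de mi porque ella tiene miedo de ellos.",
]
print(align_sentences(english, spanish))  # both pairs pass the length check
```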

Example parallel corpus

English: Small dogs are afraid of me. Because she is so clumsy, the dogs think she will fall on them. Big dogs do not fear her, just the small ones. They do not fear my little sister because she fears them.

Spanish: Perros pequeños tienen miedo de mi. Porque es tan torpe, los perros creen que ella se caerá sobre ellos. Perros grandes no tienen miedo de ella, solo los pequeños. No tienen miedo de mi porque ella tiene miedo de ellos.

Example sentence alignment

Small dogs are afraid of me  <->  Perros pequeños tienen miedo de mi
Because she is so clumsy, the dogs think she will fall on them  <->  Porque es tan torpe, los perros creen que ella se caerá sobre ellos
Big dogs do not fear her, just the small ones  <->  Perros grandes no tienen miedo de ella, solo los pequeños
They do not fear my little sister because she fears them  <->  No tienen miedo de mi porque ella tiene miedo de ellos

Notation

Source string: f = f_1 ... f_|f|
Target string: e = e_1 ... e_|e|
Alignment, under the assumption of at most one target word per source word: a = a_1 ... a_|f|, where 0 <= a_i <= |e|
a_i = j if f_i aligns with e_j
a_i = 0 if f_i is not aligned with anything in e

Thus, for our example:
f = Perros pequeños tienen miedo de mi
e = Small dogs fear my clumsy little sister
a = 2 1 3 3 0 4 7 5

Probabilistic modeling

Given a target string, assign joint probabilities to source strings and alignments: P(f, a | e)
The probability of the source string is the sum over all alignments: P(f | e) = Σ_a P(f, a | e)
The best alignment is the one that maximizes the joint probability: â = argmax_a P(f, a | e)
Decompose the full joint into a product of conditionals:

  P(f, a | e) = P(F | e) Π_{i=1..F} P(f_i, a_i | e, f_1, a_1, ..., f_{i-1}, a_{i-1}),  where F = |f|
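The decomposition above is fully general. IBM Model 1 simplifies it by dropping the dependence on the preceding words and alignments, so that P(f, a | e) is proportional to a product of word-translation probabilities t(f_i | e_{a_i}); under that simplification the best alignment can be picked word by word. The sketch below assumes that simplification and uses a hand-set toy table t, so the numbers are illustrative only.

```python
# Best alignment under an IBM Model 1-style simplification:
# P(f, a | e) is proportional to the product of t(f_i | e_{a_i}),
# so each a_i can be chosen independently of the others.
# The translation table t below is a toy, hand-set example.

NULL = "<null>"  # position 0: source words aligned to nothing in e

t = {  # t[f_word][e_word] = t(f | e); toy values only
    "perros":   {"dogs": 0.7, "small": 0.1},
    "pequeños": {"small": 0.6, "dogs": 0.1},
    "tienen":   {"fear": 0.3, NULL: 0.1},
    "miedo":    {"fear": 0.5, NULL: 0.1},
    "de":       {NULL: 0.4, "fear": 0.05},
    "mi":       {"my": 0.8, "me": 0.6},
}

def best_alignment(f_words, e_words):
    """Return a = (a_1, ..., a_|f|), with a_i = 0 meaning 'unaligned'."""
    positions = [NULL] + list(e_words)  # index 0 is the NULL word
    alignment = []
    for f_word in f_words:
        scores = [t.get(f_word, {}).get(e_word, 1e-6) for e_word in positions]
        alignment.append(max(range(len(positions)), key=lambda j: scores[j]))
    return alignment

f = "perros pequeños tienen miedo de mi".split()
e = "small dogs fear my clumsy little sister".split()
print(best_alignment(f, e))  # -> [2, 1, 3, 3, 0, 4] with these toy numbers
```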

Heuristic alignments

Calculate word similarity in some way, e.g., with the Dice coefficient:

  dice(i, j) = 2 c(e_i, f_j) / (c(e_i) + c(f_j))

where c(e_i, f_j) is the number of aligned sentence pairs containing e_i on the e side and f_j on the f side, and c(e_i), c(f_j) are the counts of sentences containing each word on its own side
Build a matrix of similarities and align highly similar words
Various strategies for choosing the links:
Choose a_j = argmax_i dice(i, j)
Greedily choose the best link globally, then remove its row and column from the matrix (the competitive linking algorithm)
(A small sketch of both steps appears at the end of this section.)

Alignment algorithms

Heuristic: Dice coefficient, competitive linking
Statistical: IBM Models 1-5 [Brown et al. 93], trained with the Expectation-Maximization algorithm
Another pipeline: the HMM alignment model [Deng & Byrne 05]
GIZA++ software [code.google.com/p/giza-pp/]

Limitations of word-based translation

One-to-many and many-to-many alignments: some approaches make simplifying assumptions about word fertility, i.e., the number of aligned words
Crossing alignments:
relatively small permutations, e.g., post-nominal modifiers (perros pequeños -> small dogs)
relatively large permutations, e.g., argument ordering ("in pain young Skywalker is")

Phrase-based translation

Translate sequences of source-language words into (possibly multi-word) sequences of target-language words
Advantages of phrase-based translation: many-to-many translation; allows more context to be used in translation
Phrase table: extracted by growing word alignments (a sketch follows below); limited by phrase length; ambiguity in translation look-up
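A small sketch of the heuristic alignment recipe above: compute Dice coefficients from co-occurrence counts over sentence pairs, then link words by competitive linking. The two-sentence corpus is far too small to be meaningful and is there only to make the code self-contained.

```python
from collections import Counter
from itertools import product

# Toy sentence-aligned corpus of (e_sentence, f_sentence) pairs; illustrative only.
corpus = [
    ("big dogs do not fear her".split(),
     "perros grandes no tienen miedo de ella".split()),
    ("small dogs fear me".split(),
     "perros pequeños tienen miedo de mi".split()),
]

# Count, over sentence pairs, how often each word and each word pair co-occur.
c_e, c_f, c_ef = Counter(), Counter(), Counter()
for e_sent, f_sent in corpus:
    c_e.update(set(e_sent))
    c_f.update(set(f_sent))
    c_ef.update(product(set(e_sent), set(f_sent)))

def dice(e_word, f_word):
    """Dice coefficient: 2 c(e, f) / (c(e) + c(f))."""
    return 2 * c_ef[e_word, f_word] / (c_e[e_word] + c_f[f_word])

def competitive_linking(e_sent, f_sent):
    """Greedily take the highest-scoring remaining link, then retire both words."""
    candidates = sorted(product(e_sent, f_sent), key=lambda pair: dice(*pair), reverse=True)
    links, used_e, used_f = [], set(), set()
    for e_word, f_word in candidates:
        if e_word not in used_e and f_word not in used_f:
            links.append((e_word, f_word))
            used_e.add(e_word)
            used_f.add(f_word)
    return links

print(competitive_linking("small dogs fear me".split(),
                          "perros pequeños tienen miedo de mi".split()))
# -> [('small', 'pequeños'), ('dogs', 'perros'), ('fear', 'tienen'), ('me', 'mi')]
```

With such a tiny corpus many pairs tie at the maximum score and the greedy pass mostly relies on tie-breaking; with realistic amounts of data the Dice scores separate much more clearly.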
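The phrase table itself is typically obtained by extracting every phrase pair that is consistent with the word alignment, meaning no alignment link leaves the box spanned by the pair. The sketch below is a simplified extractor (it does not expand phrases over unaligned boundary words, which full implementations do); the sentence pair and the alignment are hypothetical.

```python
# Extract phrase pairs consistent with a word alignment: a source span and a
# target span form a pair if no alignment link leaves the box they define.
# The sentence pair and alignment below are illustrative assumptions.

def extract_phrases(f_words, e_words, alignment, max_len=4):
    """alignment: set of (i, j) pairs meaning f_words[i] aligns to e_words[j]."""
    pairs = []
    for f_start in range(len(f_words)):
        for f_end in range(f_start, min(f_start + max_len, len(f_words))):
            # Target positions linked to this source span.
            linked_e = {j for (i, j) in alignment if f_start <= i <= f_end}
            if not linked_e:
                continue
            e_start, e_end = min(linked_e), max(linked_e)
            if e_end - e_start >= max_len:
                continue
            # Consistency: nothing in the target span may align outside the source span.
            if any(not (f_start <= i <= f_end)
                   for (i, j) in alignment if e_start <= j <= e_end):
                continue
            pairs.append((" ".join(f_words[f_start:f_end + 1]),
                          " ".join(e_words[e_start:e_end + 1])))
    return pairs

f = "perros pequeños tienen miedo".split()
e = "small dogs fear".split()
alignment = {(0, 1), (1, 0), (2, 2), (3, 2)}  # perros-dogs, pequeños-small, tienen/miedo-fear
print(extract_phrases(f, e, alignment))
# includes e.g. ('perros pequeños', 'small dogs') and ('tienen miedo', 'fear')
```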

Language model

Goal: to detect "good English"
Standard technique: the n-gram model
Calculate the probability of seeing a sequence of n words; the probability of a sentence is the product of its n-gram probabilities
Bigram model example:

  P(Small dogs fear my clumsy little sister) = P(Small) P(dogs | Small) P(fear | dogs) P(my | fear) P(clumsy | my) P(little | clumsy) P(sister | little)

Arbitrary values of n are possible; language modeling, v0.0: n = 2
(A minimal bigram sketch appears at the end of this section.)

MT evaluation

Ideal: human evaluation
Adequacy: does the translation correctly capture the information of the source sentence?
Fluency: is the translation a good sentence of the target language?
But human evaluation is slow and expensive
Automatic evaluation
Intuition: when comparing two candidate translations T_1 and T_2, to the extent that T_1 overlaps more with a reference (human-produced) translation R, it is better than T_2
Open questions: How do we measure overlap? How do we handle differences in translation length? Multiple reference translations?

BLEU

Measure overlap by counting n-grams in the candidate that match the reference translation: more matches, better translation
A precision metric, combined with a brevity penalty:

  log BLEU = min(1 - r/c, 0) + Σ_{n=1}^{N} w_n log p_n

where r is the reference length, c is the candidate length, p_n is the modified n-gram precision for n-grams of length n, and w_n are the weights (typically uniform, w_n = 1/N)
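A minimal bigram language model matching the decomposition above: maximum-likelihood estimates from a tiny, made-up training text, with no smoothing, so any unseen bigram (or unseen history word) sends the probability to zero. Real language models are trained on large corpora and smoothed.

```python
from collections import Counter

# Maximum-likelihood bigram model: P(w_i | w_{i-1}) = c(w_{i-1} w_i) / c(w_{i-1}).
# The training text is a toy stand-in for real monolingual English data.
training_text = """small dogs fear my clumsy little sister
big dogs do not fear my little sister
my little sister fears small dogs"""

unigrams, bigrams = Counter(), Counter()
for line in training_text.splitlines():
    words = ["<s>"] + line.split()          # <s> marks the start of a sentence
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def sentence_prob(sentence):
    """Product of P(w_i | w_{i-1}); returns 0.0 if any bigram or history is unseen."""
    words = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        if unigrams[prev] == 0:
            return 0.0
        prob *= bigrams[prev, word] / unigrams[prev]
    return prob

print(sentence_prob("small dogs fear my clumsy little sister"))  # about 0.037 here
print(sentence_prob("dogs small my fear sister"))                # 0.0: unseen bigrams
```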
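A compact sketch of the BLEU formula above: clipped (modified) n-gram precisions p_n with uniform weights w_n = 1/N, and the brevity penalty min(1 - r/c, 0) applied in log space. It handles a single reference and does no smoothing, so it illustrates the formula rather than replacing a toolkit implementation.

```python
import math
from collections import Counter

def ngrams(words, n):
    """Counter of all n-grams (as tuples) in the word list."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(candidate, reference, max_n=4):
    """log BLEU = min(1 - r/c, 0) + sum_{n=1..max_n} (1/max_n) * log p_n."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped precision: a candidate n-gram is credited at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if matches == 0:
            return 0.0  # no smoothing: one zero precision zeroes the whole score
        log_precisions += math.log(matches / total) / max_n
    brevity = min(1.0 - len(ref) / len(cand), 0.0)  # r = reference length, c = candidate length
    return math.exp(brevity + log_precisions)

reference = "small dogs fear my clumsy little sister"
print(bleu("small dogs fear my clumsy little sister", reference))             # 1.0
print(bleu("small dogs are afraid of my little sister", reference, max_n=2))  # about 0.42
```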

Further topics for exploration

Translation model: more, better, or different data; different word-alignment algorithms; length of extracted phrases
Language model: more, better, or different data; size of the n-grams
Add more knowledge to the process: numbers, dates, named entities