Deciphering Foreign Language


Deciphering Foreign Language
Sujith Ravi and Kevin Knight
sravi@usc.edu, knight@isi.edu
Information Sciences Institute, University of Southern California

Statistical Machine Translation (MT)
Current MT systems TRAIN translation tables from bilingual (parallel) text:
(Spanish) Garcia y asociados / (English) Garcia and associates
(Spanish) sus grupos están en Europa / (English) its groups are in Europe
TRAIN: translation tables, e.g., associates/asociados 0.8, groups/grupos 0.9
Parallel text is the BOTTLENECK: available for pairs like Spanish/English, but scarce for Japanese/German, Malayalam/English, Swahili/German.
Can we get rid of parallel data? Monolingual corpora (Spanish text, English text) exist in PLENTY: TRAIN on them to get translation tables (associates/asociados 0.8). 2

Getting Rid of Parallel Data
An MT system trained on non-parallel data would be useful for rare language pairs with limited or no parallel data.
Monolingual corpora: Spanish text + English text, then TRAIN, then translation tables (associates/asociados 0.8).
Goal: not to beat existing MT systems, but to answer: can we build a reasonably good MT system from scratch without any parallel data? Monolingual resources are available in plenty. 3

Related Work
- Extracting bilingual lexical connections from comparable corpora by exploiting word context frequencies (Fung, 1995; Rapp, 1995; Koehn & Knight, 2001)
- Canonical Correlation Analysis (CCA) method (Haghighi & Klein, 2008)
- Mining parallel sentence pairs for MT training from comparable corpora (Munteanu et al., 2004): needs a dictionary and some initial parallel data 4

Our Contributions
- New MT system built from scratch without parallel data: a novel decipherment approach to translation
- Novel methods for training translation models from non-parallel text, including Bayesian training for the IBM Model 3 translation model
- Novel methods to deal with the large-scale vocabularies inherent in MT problems
- Empirical studies of MT decipherment 5

Rest of this Talk
Introduction
Related Work
New Idea for Language Translation
  Step 1: Word Substitution
  Step 2: Foreign Language as a Cipher
Conclusion 6

New: Cracking the MT Code
"When I look at an article in Spanish, I say to myself, this is really English, but it has been encoded in some strange symbols. Now I will proceed to decode." (Warren Weaver, 1947)
(Spanish) Ciphertext: este es un sistema de cifrado complejo
(English) Plaintext: this is a complex cipher 7

New: MT Decipherment without Parallel Data
f: Spanish corpus: "El portal web permite la búsqueda por todo tipo de métodos. Por un lado, Wikileaks ha ordenado la documentación por diferentes categorías atendiendo a los hechos más notables. Desde el tipo de suceso (evento criminal, fuego amigo, ..."
e: English? P(e) = English Language Model; P(f|e) = English-to-Spanish Translation Model (the key).
Language model training from an English corpus (CNN): "WikiLeaks website publishes classified military documents from Iraq. The whistle-blower website WikiLeaks published nearly 400,000 classified military documents from the Iraq war on Friday, calling it the largest classified military leak in history."
For each f: alignments = hidden, English translation e = hidden.
TRAINING: train parameters θ to maximize the probability of the observed foreign text f:
argmax_θ P_θ(f) = argmax_θ Σ_e P_θ(e, f) = argmax_θ Σ_e P(e) · P_θ(f|e) 8
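The training objective above can be made concrete with a tiny brute-force sketch (illustrative only; the vocabulary, probabilities, and words are invented): for an observed foreign string f, the likelihood marginalizes over hidden English strings e.

```python
from itertools import product

# Toy unigram English LM and a candidate channel (translation table).
# All words and numbers here are invented for illustration.
p_e = {"hello": 0.6, "world": 0.4}                       # P(e) per word
p_f_given_e = {("hola", "hello"): 0.9, ("mundo", "hello"): 0.1,
               ("hola", "world"): 0.2, ("mundo", "world"): 0.8}

def marginal_likelihood(f_words):
    """P(f) = sum over hidden English strings e of P(e) * P(f|e),
    assuming word-by-word substitution and independent words."""
    total = 0.0
    for e_words in product(p_e, repeat=len(f_words)):
        p = 1.0
        for e_w, f_w in zip(e_words, f_words):
            p *= p_e[e_w] * p_f_given_e[(f_w, e_w)]
        total += p
    return total

print(marginal_likelihood(["hola", "mundo"]))  # 0.2356
```

Real decipherment cannot enumerate e explicitly like this; EM and sampling (below in the talk) estimate θ without the exponential sum.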

Characteristics of the Decipherment Key
MT is a hard problem! Tackle a simpler problem first: word substitution.

Characteristic                       Word Substitution        MT
Determinism in key?                  1-to-1                   many-to-many
Linguistic unit of substitution      word                     word / phrase
Transposition (re-ordering)          no                       yes
Insertion                            no                       yes
Deletion                             no                       yes
Scale (vocabulary & data sizes)      large (100-1M types)     large (100-1M word types) 11

Rest of this Talk
Introduction
Related Work
New Idea for Language Translation
  Word Substitution
  Foreign Language as a Cipher
Conclusion 12

Word Substitution (on the road to MT)
What does this say in English?
95 90 76 31 95 20 43 11 80 60 16 95 65 31 50 42 16 65 31 50 58 42 16 19 95 16 92 73 16 65 67 57 31 26 65 38 70 52 57 30 33 26 16 47 24 56 21 16
English words masked by cipher symbols. Goal: learn substitution mappings between English words and cipher symbols.
Like unsupervised HMM tagging, except there are hundreds of thousands of "tags" per cipher token (tag = English word). 14

Word Substitution Decipherment
Cipher: 95 90 76 31 95 20 43 11 80 60 16 95 65 31 50 42 16 65 31 50 58 42 16 19 95 16 92 73 16 65 67 57 31 26 65 38 70 52 57 30 33 26 16 47 24 56 21 16
Model: hidden English e (scored by P(e)) passed through a substitution key to produce the cipher.
Components: English Language Model (trained on an English corpus) + Substitution Table 15
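The language-model component above is trained on plain monolingual text by counting. A minimal bigram-LM sketch (toy corpus invented for illustration):

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Maximum-likelihood bigram LM: P(w2 | w1) = c(w1, w2) / c(w1).
    '<s>' marks sentence start. No smoothing (fine for a toy sketch)."""
    bigram = Counter()
    context = Counter()
    for sent in sentences:
        prev = "<s>"
        for w in sent.split():
            bigram[(prev, w)] += 1
            context[prev] += 1
            prev = w
    return {pair: c / context[pair[0]] for pair, c in bigram.items()}

lm = train_bigram_lm(["the dog ran", "the dog barked", "the cat ran"])
print(lm[("the", "dog")])  # 2/3: 'the' is followed by 'dog' in 2 of 3 sentences
```

In decipherment this LM stays fixed; only the substitution table is learned.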

Word Substitution Decipherment
The key contains millions of parameters!
New: (1) EM Decipherment: EM using a new iterative training procedure
New: (2) Bayesian Decipherment: Bayesian inference (efficient, parallelized sampling) 16

New: Word Substitution (1) EM
EM decipherment isn't easy: we must store and update huge numbers of HMM parameters, and (as in unsupervised HMM tagging) there can be up to 1M possible "tags" per cipher token. Solution:
Iterative EM:
- build a small channel over a restricted English word list
- EM assigns probabilities to some cipher tokens
- eliminate low-probability parameters
- expand the vocabulary and repeat
Decoding: Viterbi decoding using the trained channel (details in paper) 17
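To see why a language model can pin down the key at all, here is a deliberately tiny sketch that replaces the slide's iterative EM with exhaustive search over all 1-to-1 keys (feasible only at toy scale; the corpus, cipher symbols, and key are invented): pick the key whose decipherment scores highest under a bigram LM.

```python
from collections import Counter
from itertools import permutations
import math

# Toy English corpus -> MLE bigram LM (probability 0 for unseen bigrams).
corpus = ["the dog ran", "the dog ran", "the dog ran", "the dog barked"]
bigrams, context = Counter(), Counter()
for sent in corpus:
    prev = "<s>"
    for w in sent.split():
        bigrams[(prev, w)] += 1
        context[prev] += 1
        prev = w

def logprob(sentence):
    """Bigram log-probability of a word sequence; -inf if any bigram is unseen."""
    total, prev = 0.0, "<s>"
    for w in sentence:
        c = bigrams.get((prev, w), 0)
        if c == 0:
            return float("-inf")
        total += math.log(c / context[prev])
        prev = w
    return total

# Encipher the corpus with a secret 1-to-1 key: English word -> cipher symbol.
true_key = {"the": "07", "dog": "23", "ran": "41", "barked": "88"}
ciphertext = [[true_key[w] for w in s.split()] for s in corpus]

# Decipherment by exhaustive search: try every 1-to-1 mapping from cipher
# symbols to English words, keep the one maximizing total LM score.
symbols = sorted({c for line in ciphertext for c in line})
words = sorted({w for s in corpus for w in s.split()})
best_score, best_map = float("-inf"), None
for perm in permutations(words):
    mapping = dict(zip(symbols, perm))
    score = sum(logprob([mapping[c] for c in line]) for line in ciphertext)
    if score > best_score:
        best_score, best_map = score, mapping

print(best_map)  # recovers 07->the, 23->dog, 41->ran, 88->barked
```

The LM alone identifies the key: wrong mappings either produce zero-probability sentences or misassign the frequent "ran" contexts. EM and Bayesian sampling scale this idea to vocabularies where enumeration is impossible.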

New: Word Substitution (2) Bayesian
Several advantages over EM:
- efficient inference, even with higher-order LMs
- incremental scoring of derivations during sampling
- novel sample-selection strategy permits fast training
- no memory bottleneck
- sparse priors help learn skewed distributions 18

Word Substitution (2) Bayesian (continued)
Same generative story as EM; replace the models with Chinese Restaurant Process (CRP) formulations.
- Base distributions (P0): source = English LM probabilities; channel = uniform
- Priors: source (α) = 10^4; channel (β) = 0.01
- Inference via point-wise Gibbs sampling
- Smart sample-choice selection 19
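The CRP formulation above has a simple cache form: the predictive probability of the next event mixes the counts seen so far with the base distribution P0. A self-contained sketch (symbols and numbers invented):

```python
def crp_predictive(word, history, p0, alpha):
    """CRP / cache-model predictive probability:
    P(word | history) = (count(word) + alpha * P0(word)) / (len(history) + alpha)."""
    count = history.count(word)
    return (count + alpha * p0[word]) / (len(history) + alpha)

p0 = {"a": 0.5, "b": 0.5}  # uniform base distribution

# With no history, the CRP falls back to the base distribution.
print(crp_predictive("a", [], p0, alpha=1.0))             # 0.5
# With observations, cached counts dominate: (2 + 0.5) / (3 + 1).
print(crp_predictive("a", ["a", "a", "b"], p0, alpha=1.0))  # 0.625
```

A small alpha (a sparse prior) lets observed counts dominate quickly, which is how the slide's "sparse priors help learn skewed distributions".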

Smart Sample-Choice Selection for Bayesian Decipherment
cipher: 22 94 43 04 98
current sample: three eight living here? Resampling one position considers: three {the / a / and / boys / gun / brick / ran} living here?
Sampling a new choice for each of thousands or millions of cipher tokens is expensive; the current solution samples over all possible choices.
Offline computation instead: given e and f, precompute the top 100 candidate choices; if e and f never co-occurred, pick 100 random words. 20

Word Substitution (2) Bayesian (continued)
Same generative story as EM; replace the models with Chinese Restaurant Process (CRP) formulations.
- Base distributions (P0): source = English LM probabilities; channel = uniform
- Priors: source (α) = 10^4; channel (β) = 0.01
- Inference via point-wise Gibbs sampling with smart sample-choice selection
- Parallelized sampling using MapReduce (3- to 5-fold faster)
- Decoding: extract the trained channel from the final sample and Viterbi-decode (details in paper) 21

Word Substitution Results [results chart omitted in this transcript] 22

Sample Decipherments [table omitted: Cipher / Original English / Deciphered] 23

Rest of this Talk
Introduction
Related Work
New Idea for Language Translation
  Word Substitution
  Foreign Language as a Cipher
Conclusion 24

Foreign Language as a Cipher
f: Spanish corpus; e: English?
Components: English Language Model P(e) (trained on an English corpus) + English-to-Foreign Translation Model P(f|e).
Machine translation without parallel data = word substitution + transposition + insertion + deletion + ... 25

Deciphering Foreign Language
P(f|e) can be any translation model used in MT (e.g., IBM Model 3), but without parallel data, training is intractable.
New: (1) EM Decipherment: simple generative story; the model can be trained efficiently using EM
New: (2) Bayesian Decipherment: IBM Model 3 generative story; Bayesian inference (efficient, parallelized sampling) 26

New: MT Decipherment (1) EM
- Simple generative story (avoids large fertility and distortion tables): word substitutions, insertions, deletions, local re-ordering
- The model can be trained efficiently using EM
- Use prior linguistic knowledge for decipherment: morphology, linguistic constraints (e.g., "8" in English maps to "8" in Spanish)
- Whole-segment language models instead of word n-gram LMs: an n-gram LM leaks probability onto strings like "THANK YOU TALKING ABOUT?", while a whole-segment LM accepts only full segments like "ARE YOU TALKING ABOUT ME?" 27
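The whole-segment LM mentioned above can be sketched as a model that only accepts segments seen in monolingual text (a minimal illustration, not the paper's exact formulation; the example strings are taken from the slide):

```python
class WholeSegmentLM:
    """Assigns uniform probability to segments observed in training text and
    zero to everything else, unlike a word n-gram LM, which can leak
    probability onto ungrammatical strings like 'THANK YOU TALKING ABOUT?'."""

    def __init__(self, segments):
        self.segments = set(segments)

    def prob(self, segment):
        return 1.0 / len(self.segments) if segment in self.segments else 0.0

lm = WholeSegmentLM(["ARE YOU TALKING ABOUT ME?", "THANK YOU"])
print(lm.prob("ARE YOU TALKING ABOUT ME?"))  # 0.5
print(lm.prob("THANK YOU TALKING ABOUT?"))   # 0.0
```

The hard zeros are the point: during decipherment they forbid translations that merely look locally fluent.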

New: MT Decipherment (2) Bayesian
Generative story: IBM Model 3 (popular): translation (word substitution), fertility, distortion (transposition).
This complex model makes inference very hard.
New translation model: replace the IBM Model 3 components with CRP (cache model) processes. 28

MT Decipherment (2) Bayesian (continued)
- Bayesian inference for estimating the translation model: efficient, scalable inference using the strategies described earlier
- Sampling IBM Model 3: point-wise Gibbs sampling; for each foreign string f, jointly sample alignments and English translations e
- Sampling operators: translate one word, swap alignments, ... (similar to Germann et al., 2001)
- Blocked sampling: sample a single derivation for repeated sentences
- Choose the final sample as the MT decipherment output 29
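The point-wise Gibbs step above resamples one hidden choice at a time from its conditional distribution, holding everything else fixed. A generic, self-contained sketch (the candidate words and weight function are invented; a real sampler would score each candidate with the LM and channel):

```python
import random

def gibbs_resample(choices, weight_fn, rng):
    """Resample one hidden variable: score each candidate with the model
    (here an arbitrary weight function), normalize implicitly, and draw
    from the resulting conditional distribution."""
    weights = [weight_fn(c) for c in choices]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for choice, w in zip(choices, weights):
        acc += w
        if r <= acc:
            return choice
    return choices[-1]

rng = random.Random(0)
# A candidate holding all the weight must always be selected.
print(gibbs_resample(["ran", "barked"],
                     lambda c: 1.0 if c == "ran" else 0.0, rng))  # ran
```

Repeating this step over every cipher token (and every alignment link, in the IBM Model 3 case) yields the sampler; smart sample-choice selection shrinks `choices` to a precomputed candidate list.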

Time Expressions
English corpus: 10 MONTHS LATER / 10 MORE YEARS / 24 MINUTES / 28 CONSECUTIVE QUARTERS / A WEEK EARLIER / ABOUT A DECADE AGO / ABOUT A MONTH AFTER / AUGUST 6, 1789 / CENTURIES AGO / DEC. 11, 1989 / TWO DAYS LATER / TWO DECADES LATER / TWO FULL DAYS / ...
Spanish corpus: 10 días consecutivos de cotización / 10 semanas consecutivas / 100 años después / 17 de abril 1986 / mil años antes / otros 12 meses / más o menos una de tres horas / uno de tres años / un jueves por la noche / un día hace poco / ...
Results (BLEU, higher is better): MOSES with parallel data 88; Decipherment without parallel data 48.7; Baseline without parallel data 18.2 30

Movie Subtitles
OPUS Spanish/English corpus [Tiedemann, 2009]
English corpus: ALL RIGHT, LET'S GO. / ARE YOU ALL RIGHT? / ARE YOU CRAZY? / HEY, DO YOU WANT TO COME OUT AND PLAY THE GAME? / WHAT ARE YOU DOING HERE? / YEAH! / YOU KNOW WHAT I MEAN?
Spanish corpus: abran la puerta. / bien hecho. / por aquí! / a qué te refieres? / cómo podré verlos a través de mis lágrimas? / oye, quieres salir y jugar el juego? / un segundo. / vamonos.
Results (BLEU, higher is better): MOSES with parallel data 63.6; Decipherment without parallel data 19.3 31

MT Accuracy vs. Data Size
[chart omitted: MT quality on a test set vs. training data size, comparing phrase-based MT (parallel data), IBM Model 3 minus distortion (parallel data), and EM decipherment (no parallel data)] 32

Conclusion
- Language translation without parallel data: a very challenging task, but shown to be possible (using a decipherment approach)
- Initial results are promising; the approach easily extends to new language pairs and domains
Future Work
- Scalable decipherment methods for full-scale MT
- Better unsupervised algorithms for decipherment
- Leveraging existing bilingual resources (e.g., dictionaries) during decipherment
- Applications to domain adaptation 33

What Else Can Decipherment Do?
Language translation (this talk): monolingual corpora (Spanish text, English text), then TRAIN, then translation tables (associates/asociados 0.8).
Cryptanalysis: CATCH A SERIAL KILLER. Afternoon talk (2pm, Machine Learning Session 2-B) 34

THANK YOU! 35