Neural Machine Translation for Spoken Language Domains. Thang Luong, IWSLT 2015 (joint work with Chris Manning)


Neural Machine Translation (NMT). End-to-end neural approach to MT: simple and coherent. Achieved state-of-the-art WMT results: English-French (Luong et al., 2015a); English-German (Jean et al., 2015a; Luong et al., 2015b); English-Czech (Jean et al., 2015b). Not much work explores NMT for spoken language domains.

Outline. A quick introduction to NMT: basics, attention mechanism. Our work in IWSLT. We need to understand Recurrent Neural Networks first!

Recurrent Neural Networks (RNNs). Input: "I am a student". At each step the previous hidden state h_{t-1} and the current input x_t produce the new hidden state h_t. RNNs to represent sequences! (Picture adapted from Andrej Karpathy.)
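The recurrence on this slide can be sketched in a few lines. This is a minimal illustration using the textbook vanilla RNN update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the slide does not give a formula, so the exact update rule, the names W_xh, W_hh, b_h, and the toy weights are all assumptions made for the example.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN update: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        pre += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_t.append(math.tanh(pre))
    return h_t

def encode(inputs, W_xh, W_hh, b_h):
    """Run the RNN over a sequence and return every hidden state."""
    h = [0.0] * len(b_h)  # initial hidden state (assumed to be all zeros)
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states

# Toy usage: four one-hot vectors standing in for "I am a student".
words = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
W_xh = [[0.5, -0.3, 0.8, 0.1], [0.2, 0.7, -0.5, 0.4]]  # hidden size 2, input size 4
W_hh = [[0.1, 0.4], [-0.2, 0.3]]
b_h = [0.0, 0.0]
states = encode(words, W_xh, W_hh, b_h)
```

Each entry of `states` depends on all earlier inputs through the previous hidden state, which is what lets the final state summarize the whole sequence.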

Neural Machine Translation (NMT). Model P(target | source) directly. (Figure: the source "I am a student _" is read in and the target "Je suis étudiant _" is produced.)

Neural Machine Translation (NMT). RNNs trained end-to-end (Sutskever et al., 2014).

Neural Machine Translation (NMT). Encoder-decoder approach: the encoder reads the source "I am a student _" and the decoder produces the target "Je suis étudiant _". RNNs trained end-to-end (Sutskever et al., 2014).
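Modeling P(target | source) directly with a decoder that emits one word at a time amounts to the usual chain-rule factorization. The notation below (x for the source sentence, y_1 ... y_T for the target words) is introduced here for illustration and does not appear on the slides.

```latex
\log P(y \mid x) = \sum_{t=1}^{T} \log P\left(y_t \mid y_1, \dots, y_{t-1}, x\right)
```

In the running example, T = 4: the decoder scores "Je", "suis", "étudiant", and the end marker "_" in turn, each conditioned on the source and on the words already produced.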

Training vs. Testing. Training: correct translations are available. Testing: only source sentences are given. (Figure: the same "I am a student _" to "Je suis étudiant _" example in both settings.)

Recurrent types: vanilla RNN. Vanishing gradient problem!

Recurrent types: LSTM. C'mon, it's been around for 20 years! Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997). LSTM cells are additively updated, which makes backprop through time easier.
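The "additive update" can be made concrete with the commonly used LSTM cell equations. This is the widespread modern formulation with a forget gate, given here as an assumption; the slide does not spell out which variant was used. Here f_t, i_t, o_t are the forget, input, and output gates, \tilde{c}_t is the candidate cell value, and \odot is element-wise multiplication.

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad
h_t = o_t \odot \tanh(c_t)
```

Because c_t is a gated sum rather than a repeated squashing of c_{t-1}, the gradient along the cell state passes through an element-wise multiplication by f_t at each step, which is the sense in which backpropagation through time becomes easier.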

Summary: NMT. Few linguistic assumptions. Simple beam-search decoders. Good generalization to long sequences.
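To show why the decoder can stay simple, here is a small beam-search sketch. It is not the decoder used in the talk: the function `next_log_probs`, the toy probability table, and the absence of length normalization are all assumptions made for illustration. The end-of-sentence marker "_" follows the slides.

```python
import math

def beam_search(next_log_probs, beam_size=3, max_len=10, eos="_"):
    """Return the highest-scoring hypothesis found, as (tokens, log_prob)."""
    beam = [([], 0.0)]  # unfinished hypotheses: (tokens so far, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beam:
            for tok, lp in next_log_probs(tokens).items():
                candidates.append((tokens + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beam.append((tokens, score))
        if not beam:
            break
    pool = finished or beam
    return max(pool, key=lambda c: c[1])

# Hypothetical toy "model": next-word probabilities keyed by the target prefix.
TABLE = {
    (): {"Je": 0.6, "Moi": 0.4},
    ("Je",): {"suis": 0.9, "_": 0.1},
    ("Moi",): {"_": 1.0},
    ("Je", "suis"): {"étudiant": 0.8, "_": 0.2},
    ("Je", "suis", "étudiant"): {"_": 1.0},
}

def toy_model(prefix):
    return {tok: math.log(p) for tok, p in TABLE[tuple(prefix)].items()}

best_tokens, best_score = beam_search(toy_model, beam_size=2)
```

In a real system `next_log_probs` would run one decoder step of the network; the search loop itself does not change.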

Outline. A quick introduction to NMT: basics, attention mechanism (this part). Our work in IWSLT.

Sentence Length Problem. Problem: sentence meaning is represented by a fixed-dimensional vector. (Figure: without attention vs. with attention; Bahdanau et al., 2015.)

Attention Mechanism. Solution: random access memory. Keep a pool of source states and retrieve from it as needed. (Figure: source "I am a student _", target "Je suis étudiant _".)

Attention Mechanism: Scoring. Compare target and source hidden states. (Figure: an attention layer sits between the source "I am a student _" and the target prefix "Je"; the next word to predict is "suis". The example scores for the source states are 3, 5, 1, 1, and the context vector is still to be computed.)

Attention Mechanism: Normalization. Convert the scores into alignment weights. (Figure: the example weights are 0.3, 0.5, 0.1, 0.1.)

Attention Mechanism: Context vector. Build the context vector as a weighted average of the source states.

Attention Mechanism: Hidden state. Compute the next hidden state from the context vector, then predict "suis".
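The scoring, normalization, and context-vector steps above can be sketched as follows. This is an illustration under stated assumptions, not the talk's implementation: the score is taken to be a dot product between the target and source hidden states, the normalization is a softmax, and the toy vectors are invented. (The slide's example weights 0.3, 0.5, 0.1, 0.1 are simply the scores 3, 5, 1, 1 divided by their sum; a softmax would give different numbers.)

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(target_state, source_states):
    """Return (alignment weights, context vector) for one target step."""
    # 1. Scoring: compare the target state with every source state.
    scores = [dot(target_state, h_s) for h_s in source_states]
    # 2. Normalization: convert scores into alignment weights.
    weights = softmax(scores)
    # 3. Context vector: weighted average of the source states.
    dim = len(source_states[0])
    context = [
        sum(w * h_s[i] for w, h_s in zip(weights, source_states))
        for i in range(dim)
    ]
    return weights, context

# Toy usage with invented 2-dimensional hidden states for four source positions.
source_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
target_state = [0.2, 0.9]
weights, context = attend(target_state, source_states)
```

The final step on the slides, computing the next hidden state, would combine `context` with the current target state; how that combination is done is not shown in the transcript, so it is left out here.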

Alignments as a by-product (Bahdanau et al., 2015).

Summary: Attention. A random access memory. Helps translate long sentences. Produces alignments.

Outline. A quick introduction to NMT. Our work in IWSLT: models, NMT adaptation, NMT for low-resource translation.

Models. Attention-based models (Luong et al., 2015b): global and local attention. Global: all source states. Local: a subset of source states. Train both types of models to ensemble later.
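The difference between the two attention types comes down to which source positions are scored. The sketch below assumes the local subset is a window of half-width D around a chosen source position p_t; the names `p_t` and `D`, and the way p_t is picked, are assumptions for illustration and are not described on this slide.

```python
def global_positions(source_len):
    """Global attention: every source state is considered."""
    return list(range(source_len))

def local_positions(p_t, source_len, D):
    """Local attention: only source states within D positions of p_t."""
    lo = max(0, p_t - D)
    hi = min(source_len - 1, p_t + D)
    return list(range(lo, hi + 1))

# Toy usage for a 5-token source sentence.
all_states = global_positions(5)
window = local_positions(p_t=2, source_len=5, D=1)
edge_window = local_positions(p_t=0, source_len=5, D=2)
```

Scoring and normalization then run over the returned positions only, so local attention does less work per target word on long sentences.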

NMT Adaptation. Large data (WMT) vs. small data (IWSLT). Can we adapt existing models?

Existing models. State-of-the-art English→German NMT system. Trained on WMT data (4.5M sentence pairs). Tesla K40, 7-10 days. Ensemble of 8 models (Luong et al., 2015b): global / local attention, +/- dropout. Source reversing. 4 LSTM layers, 1000 dimensions. 50K top frequent words.

Adaptation. Further train on IWSLT data: 200K sentence pairs. 12 epochs with SGD: 3-5 hours on GPU. Same settings: source reversing; 4 LSTM layers, 1000 dimensions. Same vocab: 50K top frequent words. Would be useful to update vocab!
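Two of the settings named above, a vocabulary of the most frequent words and source reversing, are simple preprocessing steps. The sketch below is one plausible way to do them, not the authors' pipeline: the `<unk>` placeholder for out-of-vocabulary words and the function names are assumptions, and the toy corpus uses a vocabulary of 3 words instead of 50K.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder token for out-of-vocabulary words

def build_vocab(sentences, max_size):
    """Keep the max_size most frequent tokens in the training sentences."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, _ in counts.most_common(max_size)}

def prepare_source(tokens, vocab):
    """Replace unknown words, then reverse the source sentence."""
    mapped = [tok if tok in vocab else UNK for tok in tokens]
    return mapped[::-1]

# Toy usage.
corpus = [
    ["I", "am", "a", "student"],
    ["I", "am", "a", "teacher"],
    ["I", "am", "here"],
]
vocab = build_vocab(corpus, max_size=3)
prepared = prepare_source(["I", "am", "a", "student"], vocab)
```

Keeping the vocabulary fixed during adaptation means words that are frequent in the IWSLT data but missing from the WMT vocabulary still map to the placeholder, which is why the slide notes that updating the vocabulary would be useful.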

Results: TED tst2013 (BLEU)
IWSLT'14 best entry (Freitag et al., 2014): 26.2
Our NMT systems:
Single NMT (unadapted): 25.6
Single NMT (adapted): 29.4 (+3.8)
Ensemble NMT (adapted): 31.4 (+2.0)
New SOTA! Adaptation is effective, and the ensemble is even better.
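The slides do not say how the ensemble combines its member models. One common approach, shown here purely as an assumption, is to average the models' next-word probability distributions at each decoding step. The candidate words and probabilities below are invented for the example.

```python
def ensemble_average(distributions):
    """Average several next-word distributions over the same vocabulary."""
    vocab = distributions[0].keys()
    n = len(distributions)
    return {w: sum(d[w] for d in distributions) / n for w in vocab}

# Invented next-word distributions from three hypothetical models.
model_outputs = [
    {"Kommunikation": 0.5, "Mitteilung": 0.4, "Nachricht": 0.1},
    {"Kommunikation": 0.3, "Mitteilung": 0.6, "Nachricht": 0.1},
    {"Kommunikation": 0.7, "Mitteilung": 0.2, "Nachricht": 0.1},
]
avg = ensemble_average(model_outputs)
best = max(avg, key=avg.get)
```

Here one model on its own would prefer "Mitteilung", but the averaged distribution prefers "Kommunikation", which illustrates how an ensemble can smooth over the mistakes of individual members.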

English→German Evaluation Results (tst2014 / tst2015)
IWSLT'14 best entry (Freitag et al., 2014): 23.3 / not reported
IWSLT'15 baseline: 18.5 / 20.1
Our NMT ensemble: 27.6 (+9.1) / 30.1 (+10.0)
New SOTA! NMT generalizes well!

Sample English-German translations
src: We desperately need great communication from our scientists and engineers in order to change the world.
ref: Wir brauchen unbedingt großartige Kommunikation von unseren Wissenschaftlern und Ingenieuren, um die Welt zu verändern.
unadapted: Wir benötigen dringend eine große Mitteilung unserer Wissenschaftler und Ingenieure, um die Welt zu verändern.
adapted: Wir brauchen dringend eine großartige Kommunikation unserer Wissenschaftler und Ingenieure, um die Welt zu verändern.
best: Wir brauchen dringend eine großartige Kommunikation von unseren Wissenschaftlern und Ingenieuren, um die Welt zu verändern.
Adapted models are better. Ensemble models are best: they correctly translate the plural noun "scientists".

Sample English-German translations
src: Yeah. Yeah. So what will happen is that, during the call you have to indicate whether or not you have the disease or not, you see. Right.
ref: Was passiert ist, dass der Patient während des Anrufes angeben muss, ob diese Person an Parkinson leidet oder nicht. Ok.
base: Ja Ja Ja Ja Ja Ja Ja Ja Ja Ja Ja Ja dass dass dass.
adapted: Ja. Ja. Es wird also passieren, dass man während des Gesprächs angeben muss, ob man krank ist oder nicht. Richtig.
best: Ja. Ja. Was passiert, ist, dass Sie während des zu angeben müssen, ob Sie die Krankheit haben oder nicht, oder nicht. Richtig.
Unadapted models screwed up. Adapted models produce more reliable translations.

Outline. A quick introduction to NMT. Our work in IWSLT: models, NMT adaptation, NMT for low-resource translation (this part).

NMT for low-resource translation. So far, NMT systems have been trained on large WMT data: English-French, 12-36M sentence pairs; English-German, 4.5M sentence pairs. Not much work utilizes small corpora: Gülçehre et al. (2015) work on IWSLT Turkish→English, but use large English monolingual data. We train English→Vietnamese systems.

Setup. Train English→Vietnamese models from scratch. 133K sentence pairs; Moses tokenizer, true case. Keep words that occur at least 5 times: 17K English words and 7.7K Vietnamese words. Use smaller networks: 2 LSTM layers, 500 dimensions. Tesla K40, 4-7 hours on GPU. Ensemble of 9 models: global / local attention, +/- dropout.

English→Vietnamese Results (BLEU)
tst2013: Single NMT 23.3; Ensemble NMT 26.9
tst2015: IWSLT'15 baseline 27.0; Our system 26.4
Results are competitive.

Latest Results, tst2015! We score top in TER! Observation by Neubig et al. (2015): NMT is good at getting the syntax right, but less so at lexical choices.

Sample English-Vietnamese translations
src: However, Gaddafi left behind a heavy burden, a legacy of tyranny, corruption and seeds of diversions.
ref: Tuy nhiên, Gaddafi đã để lại một gánh nặng, một di sản của chính thể chuyên chế, tham nhũng và những mầm mống chia rẽ.
single: Tuy nhiên, Gaddafi đằng sau gánh nặng nặng nề, một di sản di sản, tham nhũng và hạt giống.
multi: Tuy nhiên, Gaddafi bỏ lại sau một gánh nặng nặng nề, một di sản của sự chuyên chế, tham nhũng và hạt giống.
Ensemble models are better.

Sample English-Vietnamese translations
src: From 1971 to 1977 - I look young, but I'm not - I worked in Zambia, Kenya, Ivory Coast, Algeria, Somalia, in projects of technical cooperation with African countries.
ref: Từ năm 1971 đến 1977 - Trông tôi trẻ thế chứ không phải vậy đâu - Tôi đã làm việc tại Zambia, Kenya, Ivory Coast, Algeria, Somalia, trong những dự án hợp tác về kỹ thuật với những quốc gia Châu Phi
single: Từ 1971 đến năm 1977, tôi còn trẻ, nhưng tôi không - tôi làm việc ở Zambia, Kenya, Bờ Biển Ngà, Somalia, Somalia, trong các dự án về kỹ thuật
multi: Từ 1971 đến năm 1977 - tôi trông trẻ, nhưng tôi không - Tôi làm việc ở Zambia, Kenya, Bờ Biển Ngà, Algeria, Somalia, trong các dự án hợp tác với các nước châu Phi.
Ensemble models are better. Sensible name translations.

Thank you! Conclusion. NMT in spoken language domains: domain adaptation and low-resource translation. Domain adaptation is useful: new SOTA results for English-German. Low-resource translation: competitive results for English-Vietnamese.