Statistical Machine Translation



Similar documents
Computer Aided Translation

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

Statistical Machine Translation

A Joint Sequence Translation Model with Integrated Reordering

Overview of MT techniques. Malek Boualem (FT)

HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN

Introduction. Philipp Koehn. 28 January 2016

SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告

Hybrid Strategies. for better products and shorter time-to-market

Chapter 5. Phrase-based models. Statistical Machine Translation

The history of machine translation in a nutshell

Statistical Machine Translation Lecture 4. Beyond IBM Model 1 to Phrase-Based Models

Question template for interviews

Integration of Content Optimization Software into the Machine Translation Workflow. Ben Gottesman Acrolinx

SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统

Phrase-Based MT. Machine Translation Lecture 7. Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu. Website: mt-class.

Extracting translation relations for humanreadable dictionaries from bilingual text

Machine Translation at the European Commission

The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006

Project Management. From industrial perspective. A. Helle M. Herranz. EXPERT Summer School, Pangeanic - BI-Europe

Convergence of Translation Memory and Statistical Machine Translation

C E D A T 8 5. Innovating services and technologies for speech content management

Teacher education and its internationalisation LATVIA

Customizing an English-Korean Machine Translation System for Patent Translation *

Getting Off to a Good Start: Best Practices for Terminology

LANGUAGE TRANSLATION SOFTWARE EXECUTIVE SUMMARY Language Translation Software Market Shares and Market Forecasts Language Translation Software Market

Survey Results: Requirements and Use Cases for Linguistic Linked Data

Neural Machine Transla/on for Spoken Language Domains. Thang Luong IWSLT 2015 (Joint work with Chris Manning)

Machine vs. Human Translation Scott Bass, Advanced Language Translation Inc.

NAPCS Product List for NAICS 54193: Translation and Interpretation Services

M LTO Multilingual On-Line Translation

Hybrid Machine Translation Guided by a Rule Based System

Segmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System

Creating Captions in YouTube

Why Evaluation? Machine Translation. Evaluation. Evaluation Metrics. Ten Translations of a Chinese Sentence. How good is a given system?

Machine Translation. Why Evaluation? Evaluation. Ten Translations of a Chinese Sentence. Evaluation Metrics. But MT evaluation is a di cult problem!

Translation Solution for

Multilingual Term Extraction as a Service from Acrolinx. Ben Gottesman Michael Klemme Acrolinx CHAT2013

Chapter 6. Decoding. Statistical Machine Translation

The Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems

The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge

Empirical Machine Translation and its Evaluation

RemoSync Business (Brew MP)

Your single-source partner for corporate product communication. Transit NXT Evolution. from Service Pack 0 to Service Pack 8

Translation and Localization Services

Glossary of translation tool types

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation.

Applications of speech-to-text in customer service. Dr. Joachim Stegmann Deutsche Telekom AG, Laboratories

SDL Trados Studio 2015 Translation Memory Management Quick Start Guide

Machine Translation as a translator's tool. Oleg Vigodsky Argonaut Ltd. (Translation Agency)

Central Release and Build Management with TFS. Christian Schlag

Comprendium Translator System Overview

REALIZATION SORTING ALGORITHM USING PARALLEL TECHNOLOGIES bachelor, Mikhelev Vladimir candidate of Science, prof., Sinyuk Vasily

SDL Trados Studio 2015 Project Management Quick Start Guide

COPYRIGHT 2011 COPYRIGHT 2012 AXON DIGITAL DESIGN B.V. ALL RIGHTS RESERVED

THUTR: A Translation Retrieval System

Automated Lecture Transcription

Military Community and Family Policy Social Media. Guide. Staying Connected

PROMT Technologies for Translation and Big Data

Tanja Wissik. COTSOES Terminology and Documentation Working Group Meeting, 13 May 2013, Stockholm

AP GERMAN LANGUAGE AND CULTURE EXAM 2015 SCORING GUIDELINES

ICT Project on Text Transcription of Technical Video Lectures and Creation of Video Searchable Index, Metadata and Online Quizzes

IsumaTV. Media Player Setup Manual COOP Cable System. Media Player

Thomas Rümmler AIT GmbH & Co. KG Christian Schlag AIT GmbH & Co. KG. Central Build and Release Management with TFS

Machine Translation and the Translator

Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines

How to become a successful language learner

Automated Multilingual Text Analysis in the Europe Media Monitor (EMM) Ralf Steinberger. European Commission Joint Research Centre (JRC)

Product Quality and Environmental Standards: The Effect of an International Environmental Agreement on Tropical Timber Trade

Clarity High School Student Survey

Website Design Application Form

Collaborative Machine Translation Service for Scientific texts

A TYPICAL DAY WITH OFFICE 365. Office 365 is so much more than just an online version of and Office 2013.

32 Benefits of Pipeliner CRM

SYSTRAN Enterprise Server 7 Online Tools User Guide. Chapter 1: Overview... 1 SYSTRAN Enterprise Server 7 Overview... 2

KantanMT.com. The world s #1 MT Platform. No Hardware. No Software. No Hassle MT.

MY MOTHER TEOFILA REICH-RANICKI Andrew Ranicki Edinburgh German Circle, 28th February, 2012

How To Manage Build And Release With Tfs 2013

Maskinöversättning F2 Översättningssvårigheter + Översättningsstrategier

Statistical Pattern-Based Machine Translation with Statistical French-English Machine Translation

Mit einem Auge auf den mathema/schen Horizont: Was der Lehrer braucht für die Zukun= seiner Schüler

Year 7. Home Learning Booklet. French and German Skills

> How it works FAQ - TV SYNCED ADS. 1. TV-Synced Ads : Ok, but what is it exactly? 2. Why is TV-Synced ads relevant?

Transcription:

Statistical Machine Translation What works and what does not Andreas Maletti Universität Stuttgart maletti@ims.uni-stuttgart.de Stuttgart May 14, 2013 Statistical Machine Translation A. Maletti 1

Main notions Machine translation (MT) Automatic natural language translation as opposed to: manual translation computer-aided translation (by a computer) (e.g., translation memory) Statistical machine translation (SMT) MT using systems automatically obtained from (many) translations as opposed to: rule-based machine translation example-based machine translation (old) SYSTRAN translation by analogy Statistical Machine Translation A. Maletti 2

Main notions Machine translation (MT) Automatic natural language translation as opposed to: manual translation computer-aided translation (by a computer) (e.g., translation memory) Statistical machine translation (SMT) MT using systems automatically obtained from (many) translations as opposed to: rule-based machine translation example-based machine translation (old) SYSTRAN translation by analogy Statistical Machine Translation A. Maletti 2

Short history Timeline 1 Dark age (60s 90s) rule-based systems (e.g., SYSTRAN) CHOMSKYAN approach perfect translation, poor coverage 2 Reformation (1991 present) phrase-based and syntax-based systems statistical approach cheap, automatically trained 3 Potential future semantics-based systems (e.g., FRAMENET-based) semi-supervised, statistical approach basic understanding of translated text Statistical Machine Translation A. Maletti 3

Short history Timeline 1 Dark age (60s 90s) rule-based systems (e.g., SYSTRAN) CHOMSKYAN approach perfect translation, poor coverage 2 Reformation (1991 present) phrase-based and syntax-based systems statistical approach cheap, automatically trained 3 Potential future semantics-based systems (e.g., FRAMENET-based) semi-supervised, statistical approach basic understanding of translated text Statistical Machine Translation A. Maletti 3

Short history Timeline 1 Dark age (60s 90s) rule-based systems (e.g., SYSTRAN) CHOMSKYAN approach perfect translation, poor coverage 2 Reformation (1991 present) phrase-based and syntax-based systems statistical approach cheap, automatically trained 3 Potential future semantics-based systems (e.g., FRAMENET-based) semi-supervised, statistical approach basic understanding of translated text Statistical Machine Translation A. Maletti 3

Standard pipeline Schema Input Translation model Language model Output (the models are often integrated in practice) Required resources bilingual text (sentences in both languages) monolingual text (in target language) 1.5M sent. 44M sent. Statistical Machine Translation A. Maletti 4

Standard pipeline Schema Input Translation model Language model Output (the models are often integrated in practice) Required resources bilingual text (sentences in both languages) monolingual text (in target language) 1.5M sent. 44M sent. Statistical Machine Translation A. Maletti 4

Standard pipeline Schema Input Translation model Language model Output (the models are often integrated in practice) Required resources bilingual text (sentences in both languages) monolingual text (in target language) 1.5M sent. 44M sent. Statistical Machine Translation A. Maletti 4

Standard pipeline Example (Source: GOOGLE translate) Input: What works and what does not Segmentation: What works and what does not Translation model output: Was funktioniert Was am Was funktioniert am und was nicht und was nicht funktioniert und welche nicht ist und was nicht Statistical Machine Translation A. Maletti 5

Standard pipeline Example (Source: GOOGLE translate) Input: What works and what does not Segmentation: What works and what does not Translation model output: Was funktioniert Was am Was funktioniert am und was nicht und was nicht funktioniert und welche nicht ist und was nicht Statistical Machine Translation A. Maletti 5

Standard pipeline Example (Source: GOOGLE translate) Input: What works and what does not Segmentation: What works and what does not Translation model output: Was funktioniert Was am Was funktioniert am und was nicht und was nicht funktioniert und welche nicht ist und was nicht Statistical Machine Translation A. Maletti 5

Standard pipeline Example (Source: GOOGLE translate) Input: What works and what does not Segmentation: What works and what does not Translation model output: Was funktioniert Was am Was funktioniert am und was nicht und was nicht funktioniert und welche nicht ist und was nicht Statistical Machine Translation A. Maletti 5

Phrase-based machine translation And then the matter was decided, and everything was put in place f kan An tm AlHsm w wdet Almwr fy nsab ha Extracted information Segmentation: And then the matter was decided, and everything was put in place Phrase translation: Reordering: Statistical Machine Translation A. Maletti 6

Phrase-based machine translation And then the matter was decided, and everything was put in place f kan An tm AlHsm w wdet Almwr fy nsab ha Extracted information Segmentation: And then the matter was decided, and everything was put in place Phrase translation: Reordering: Statistical Machine Translation A. Maletti 6

Phrase-based machine translation And then the matter was decided, and everything was put in place f kan An tm AlHsm w wdet Almwr fy nsab ha Extracted information Segmentation: And then 1 the matter 2 was decided 3, and everything 4 was put 5 in place 6 Phrase translation: Reordering: Statistical Machine Translation A. Maletti 6

Phrase-based machine translation And then the matter was decided, and everything was put in place f kan An tm AlHsm w wdet Almwr fy nsab ha Extracted information Segmentation: And then 1 the matter 2 was decided 3, and everything 4 was put 5 in place 6 Phrase translation: f kan 1 Almwr 2 An tm AlHsm 3 w 4 wdet 5 fy nsab ha 6 Reordering: Statistical Machine Translation A. Maletti 6

Phrase-based machine translation And then the matter was decided, and everything was put in place f kan An tm AlHsm w wdet Almwr fy nsab ha Extracted information Segmentation: And then 1 the matter 2 was decided 3, and everything 4 was put 5 in place 6 Phrase translation: f kan 1 Almwr 2 An tm AlHsm 3 w 4 wdet 5 fy nsab ha 6 Reordering: (1 3 4 5 2 6) Statistical Machine Translation A. Maletti 6

How it works Technical talks Marion Weller Daniel Quernheim and Nina Seemann phrase-based MT syntax-based MT Statistical Machine Translation A. Maletti 7

Small players Research at IMS Phrase-based MT (head: Dr. Alexander Fraser) Fabienne Braune Fabienne Cap Anita Ramm Marion Weller Syntax-based MT (head: Dr. Andreas Maletti) Fabienne Braune Daniel Quernheim Nina Seemann Statistical Machine Translation A. Maletti 8

Small players Research at IMS Phrase-based MT (head: Dr. Alexander Fraser) Fabienne Braune Fabienne Cap Anita Ramm Marion Weller Syntax-based MT (head: Dr. Andreas Maletti) Fabienne Braune Daniel Quernheim Nina Seemann Statistical Machine Translation A. Maletti 8

Small players Research at IMS Phrase-based MT (head: Dr. Alexander Fraser) Fabienne Braune Fabienne Cap Anita Ramm Marion Weller Syntax-based MT (head: Dr. Andreas Maletti) Fabienne Braune Daniel Quernheim Nina Seemann Statistical Machine Translation A. Maletti 8

Big players Commercial systems Language Studio GOOGLE translate WebSphere Translation Server BING translator OMNIFLUENT... Statistical Machine Translation A. Maletti 9

Big players Commercial systems Language Studio GOOGLE translate WebSphere Translation Server BING translator OMNIFLUENT... Soon also Statistical Machine Translation A. Maletti 9

Failures Statistical Machine Translation A. Maletti 10

Failures Applications Technical manuals Example (An mp3 player) The synchronous manifestation of lyrics is a procedure for can broadcasting the music, waiting the mp3 file at the same time showing the lyrics. With the this kind method that the equipments that synchronous function of support up broadcast to make use of document create setup, you can pass the LCD window way the check at the document contents that broadcast. That procedure returns offerings to have to modify, and delete, and stick top, keep etc. edit function. Statistical Machine Translation A. Maletti 11

Failures Applications Technical manuals Example (An mp3 player) The synchronous manifestation of lyrics is a procedure for can broadcasting the music, waiting the mp3 file at the same time showing the lyrics. With the this kind method that the equipments that synchronous function of support up broadcast to make use of document create setup, you can pass the LCD window way the check at the document contents that broadcast. That procedure returns offerings to have to modify, and delete, and stick top, keep etc. edit function. Statistical Machine Translation A. Maletti 11

Failures Applications Technical manuals Example (An mp3 player) The synchronous manifestation of lyrics is a procedure for can broadcasting the music, waiting the mp3 file at the same time showing the lyrics. With the this kind method that the equipments that synchronous function of support up broadcast to make use of document create setup, you can pass the LCD window way the check at the document contents that broadcast. That procedure returns offerings to have to modify, and delete, and stick top, keep etc. edit function. Statistical Machine Translation A. Maletti 11

Failures Applications Technical manuals Example (Hotel Uppsala, Sweden) Wir hatten die Zimmer eingestuft wird als Superior weil sie renoviert wurde im letzten Jahr oder zwei. Unsere Zimmer hatten Parkettboden und waren sehr geräumig. Man musste allerdings nicht musste seitwärts bewegen. Statistical Machine Translation A. Maletti 11

Failures Applications Technical manuals Example (Hotel Uppsala, Sweden) Wir hatten die Zimmer eingestuft wird als Superior weil sie renoviert wurde im letzten Jahr oder zwei. Unsere Zimmer hatten Parkettboden und waren sehr geräumig. Man musste allerdings nicht musste seitwärts bewegen. We stayed in rooms classified as superior because they had been renovated in the last year or two. Our rooms had wood floors and were roomy. You didn t have to walk sideways to move around. Statistical Machine Translation A. Maletti 11

Failures Applications Technical manuals US military Example (JONES, SHEN, HERZOG 2009) Soldier: Okay, what is your name? Local: Abdul. Soldier: And your last name? Local: Al Farran. Statistical Machine Translation A. Maletti 11

Failures Applications Technical manuals US military Example (JONES, SHEN, HERZOG 2009) Soldier: Okay, what is your name? Local: Abdul. Soldier: And your last name? Local: Al Farran. Speech-to-text machine translation Soldier: Local: Okay, what s your name? milk a mechanic and I am here I mean yes Statistical Machine Translation A. Maletti 11

Failures Applications Technical manuals US military Example (JONES, SHEN, HERZOG 2009) Soldier: Okay, what is your name? Local: Abdul. Soldier: And your last name? Local: Al Farran. Speech-to-text machine translation Soldier: Local: Soldier: Local: Okay, what s your name? milk a mechanic and I am here I mean yes What is your last name? every two weeks my son s name is ismail Statistical Machine Translation A. Maletti 11

Failures Applications Technical manuals US military MSDN, Knowledge Base... Statistical Machine Translation A. Maletti 11

But in many cases it actually works... Statistical Machine Translation A. Maletti 12

Selected application Lecture translation real-time speech-to-text machine translation combines automatic speech recognition and SMT requires lecturer training and terminology training automatically provides subtitles to lecture video Video http://www.youtube.com/watch?v=x5ll0wpr-88 Statistical Machine Translation A. Maletti 13

Selected application Lecture translation real-time speech-to-text machine translation combines automatic speech recognition and SMT requires lecturer training and terminology training automatically provides subtitles to lecture video Video http://www.youtube.com/watch?v=x5ll0wpr-88 Statistical Machine Translation A. Maletti 13

Summary SMT works well between similar languages (e.g., Spanish-English) between large resource languages (e.g., French-English) in-domain (training and test from the same domain) access to foreign language SMT could be better into morphologically rich / free word order languages (e.g., German) handling noisy inputs (e.g., chats, Twitter feeds) dealing with documents (instead of sentences) precision / translation accuracy Conclusion SMT is a cheap way to access foreign material Statistical Machine Translation A. Maletti 14

Summary SMT works well between similar languages (e.g., Spanish-English) between large resource languages (e.g., French-English) in-domain (training and test from the same domain) access to foreign language SMT could be better into morphologically rich / free word order languages (e.g., German) handling noisy inputs (e.g., chats, Twitter feeds) dealing with documents (instead of sentences) precision / translation accuracy Conclusion SMT is a cheap way to access foreign material Statistical Machine Translation A. Maletti 14

Summary SMT works well between similar languages (e.g., Spanish-English) between large resource languages (e.g., French-English) in-domain (training and test from the same domain) access to foreign language SMT could be better into morphologically rich / free word order languages (e.g., German) handling noisy inputs (e.g., chats, Twitter feeds) dealing with documents (instead of sentences) precision / translation accuracy Conclusion SMT is a cheap way to access foreign material Statistical Machine Translation A. Maletti 14

Summary SMT works well between similar languages (e.g., Spanish-English) between large resource languages (e.g., French-English) in-domain (training and test from the same domain) access to foreign language SMT could be better into morphologically rich / free word order languages (e.g., German) handling noisy inputs (e.g., chats, Twitter feeds) dealing with documents (instead of sentences) precision / translation accuracy Conclusion SMT is a cheap way to access foreign material Statistical Machine Translation A. Maletti 14

Summary SMT works well between similar languages (e.g., Spanish-English) between large resource languages (e.g., French-English) in-domain (training and test from the same domain) access to foreign language SMT could be better into morphologically rich / free word order languages (e.g., German) handling noisy inputs (e.g., chats, Twitter feeds) dealing with documents (instead of sentences) precision / translation accuracy Conclusion SMT is a cheap way to access foreign material Statistical Machine Translation A. Maletti 14