Computer Aided Translation



Similar documents
Machine Translation and the Translator

Introduction. Philipp Koehn. 28 January 2016

Convergence of Translation Memory and Statistical Machine Translation

The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006

Chapter 5. Phrase-based models. Statistical Machine Translation

TRANSREAD LIVRABLE 3.1 QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS. Projet ANR CORD 01 5

A New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly

Collaborative Machine Translation Service for Scientific texts

Phrase-Based MT. Machine Translation Lecture 7. Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu. Website: mt-class.

A Joint Sequence Translation Model with Integrated Reordering

Glossary of translation tool types

Extracting translation relations for humanreadable dictionaries from bilingual text

Integra(on of human and machine transla(on. Marcello Federico Fondazione Bruno Kessler MT Marathon, Prague, Sept 2013

SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统

Statistical Machine Translation

Statistical Machine Translation

Empirical Machine Translation and its Evaluation

Machine Translation at the European Commission

HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN

Statistical Machine Translation Lecture 4. Beyond IBM Model 1 to Phrase-Based Models

SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告

Collecting Polish German Parallel Corpora in the Internet

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

Why Evaluation? Machine Translation. Evaluation. Evaluation Metrics. Ten Translations of a Chinese Sentence. How good is a given system?

Machine Translation. Why Evaluation? Evaluation. Ten Translations of a Chinese Sentence. Evaluation Metrics. But MT evaluation is a di cult problem!

Question template for interviews

Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output

Project Management. From industrial perspective. A. Helle M. Herranz. EXPERT Summer School, Pangeanic - BI-Europe

The Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems

PROMT Technologies for Translation and Big Data

Turker-Assisted Paraphrasing for English-Arabic Machine Translation

Machine Translation for Human Translators

Pattern Insight Clone Detection

Machine translation techniques for presentation of summaries

Language and Computation

Domain Classification of Technical Terms Using the Web

TS3: an Improved Version of the Bilingual Concordancer TransSearch

Machine Translation. Agenda

UEdin: Translating L1 Phrases in L2 Context using Context-Sensitive SMT

Getting Off to a Good Start: Best Practices for Terminology

ADVANTAGES AND DISADVANTAGES OF TRANSLATION MEMORY: A COST/BENEFIT ANALYSIS by Lynn E. Webb BA, San Francisco State University, 1992 Submitted in

Learning Today Smart Tutor Supports English Language Learners

Application of Machine Translation in Localization into Low-Resourced Languages

How To Write The English Language Learner Can Do Booklet

Chapter 6. Decoding. Statistical Machine Translation

Anubis - speeding up Computer-Aided Translation

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features

The Principle of Translation Management Systems

TAPTA: Translation Assistant for Patent Titles and Abstracts

Multilingual Term Extraction as a Service from Acrolinx. Ben Gottesman Michael Klemme Acrolinx CHAT2013

THUTR: A Translation Retrieval System

GCE Computing. COMP3 Problem Solving, Programming, Operating Systems, Databases and Networking Report on the Examination.

The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge

Translating the Penn Treebank with an Interactive-Predictive MT System

(Refer Slide Time: 2:03)

Translation and Localization Services

Describe the process of parallelization as it relates to problem solving.

Text Mining - Scope and Applications

KantanMT.com. The world s #1 MT Platform. No Hardware. No Software. No Hassle MT.

LDIF - Linked Data Integration Framework

Machine Translation as a translator's tool. Oleg Vigodsky Argonaut Ltd. (Translation Agency)

How can public institutions benefit from automated translation infrastructure?

Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval

A Joint Sequence Translation Model with Integrated Reordering

Final Review Ch. 1 #2

White Paper. Translation Quality - Understanding factors and standards. Global Language Translations and Consulting, Inc. Author: James W.

Why? A central concept in Computer Science. Algorithms are ubiquitous.

LANGUAGE TRANSLATION SOFTWARE EXECUTIVE SUMMARY Language Translation Software Market Shares and Market Forecasts Language Translation Software Market

TAUS Quality Dashboard. An Industry-Shared Platform for Quality Evaluation and Business Intelligence September, 2015

FIRST CERTIFICATE Reading and Use of English Writing Listening Speaking Reading:

Translation Solution for

Statistical Machine Translation for Automobile Marketing Texts

Why are Organizations Interested?

Teaching Reading Essentials:

CHAPTER 15: IS ARTIFICIAL INTELLIGENCE REAL?

Recovering Business Rules from Legacy Source Code for System Modernization

Quantifying the Influence of MT Output in the Translators Performance: A Case Study in Technical Translation

Transcription:

Computer Aided Translation Philipp Koehn 30 April 2015

Why Machine Translation? 1 Assimilation reader initiates translation, wants to know content user is tolerant of inferior quality focus of majority of research Communication participants don t speak same language, rely on translation users can ask questions, when something is unclear chat room translations, hand-held devices often combined with speech recognition Dissemination publisher wants to make content available in other languages high demands for quality currently almost exclusively done by human translators

Why Machine Translation? 2 Assimilation reader initiates translation, wants to know content user is tolerant of inferior quality focus of majority of research Communication participants don t speak same language, rely on translation users can ask questions, when something is unclear chat room translations, hand-held devices often combined with speech recognition Dissemination publisher wants to make content available in other languages high demands for quality OUR currently almost exclusively done by human translatorsfocus

Goal: Helping Human Translators 3 If you can t beat them, join them. How can machine translation help human translators?

Post-Editing Machine Translation 4 (source: Autodesk)

Machine Translation Quality Matters 5 Experiment: Post-editing with different machine translation systems English German, news stories UEDIN SYNTAX ONLINE B UEDIN PHRASE OTHER WMT13 5.38 sec/word 5.46 sec/word 5.45 sec/word 6.35 sec/word 0 sec/word 2 sec/word 4 sec/word 6 sec/word [Koehn and Germann, 2014]

Overview 6 Interactivity Choices Confidence Adaptation

Interactivity 7 Traditional professional translation approaches translation from scratch post-editing translation memory match post-editing machine translation output More interactive collaboration between machine and professional?

Interactive Machine Translation 8 Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Professional Translator

Interactive Machine Translation 9 Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Professional Translator He

Interactive Machine Translation 10 Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Professional Translator He has

Interactive Machine Translation 11 Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Professional Translator He has for months

Interactive Machine Translation 12 Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Professional Translator He planned

Interactive Machine Translation 13 Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Professional Translator He planned for months

Prediction from Search Graph 14 planned for months he has for months it has since months Search for best translation creates a graph of possible translations

Prediction from Search Graph 15 planned for months he has for months it has since months One path in the graph is the best (according to the model) This path is suggested to the user

Prediction from Search Graph 16 planned for months he has for months it has since months The user may enter a different translation for the first words We have to find it in the graph

Prediction from Search Graph 17 planned for months he has for months it has since months We can predict the optimal completion (according to the model)

time 80ms Run Time 18 72ms 64ms 56ms 48ms 7 edits 8 edits 56 edits 4 edits 40ms 32ms 3 edits 24ms 16ms 8ms 0ms 2 edits 1 edit 0 edits 5 10 15 20 25 30 35 prefix 40

Word Alignment Visualization 19 Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Professional Translator He planned for months to give a lecture in Baltimore in

Word Alignment Visualization 20 Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Professional Translator He planned for months to give a lecture in Baltimore in

Shading off Translated Material 21 Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Professional Translator He planned for months to give a lecture in Baltimore in

Some Observations 22 How can we do this? word alignments by-product of matching against search braph automatic word alignments (as used in training) User feedback users like interactive machine translation... but they may be slower than with post-editing machine translation user like mouse-over word alignment highlighting user do not like at-cursor word alignment highlighting

Overview 23 Interactivity Choices Confidence Adaptation

Choices 24 Trigger the passive vocabulary Display multiple translations for words and phrases er hat seit Monaten geplant, im März einen Vortrag... he has for months the plan in March a lecture... it has for months now planned, in March a presentation... he was for several months planned to in the March a speech... he has made since months the pipeline in March of a statement... he did for many months scheduled the March a general... Rank and color-highlight by probability of each translation Prefer diversity

Alternative Translations 25 Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Professional Translator He planned for months to give a lecture in Baltimore in April. give a presentation present his work give a speech speak User requests alternative translations for parts of sentence.

Bilingual Concordancer 26

Overview 27 Interactivity Choices Confidence Adaptation

Confidence 28 Machine translation engine indicates where it is likely wrong (also known as quality estimation Lucia Specia) Different Levels of granularity document-level (SDL s TrustScore ) sentence-level word-level What are we predicting? how useful is the translation on a scale of (say) 1 5 indication if post-editing is worthwhile estimation of post-editing effort pin-pointing errors

Sentence-Level Confidence 29 Translators are used to Fuzzy Match Score used in translation memory systems roughly: ratio of words that are the same between input and TM source if less than 70%, then not useful for post-editing We would like to have a similar score for machine translation Even better estimation of post-editing time estimation of from-scratch translation time can also be used for pricing Active research question, see also shared task at WMT 2013

Word-Level Confidence 30 Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Machine Translation He has for months planned in April give a lecture in Baltimore.

Word-Level Confidence 31 Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Machine Translation He has for months planned in April give a lecture in Baltimore. Note: different color for wrong words and reordered words (inserted words? missing words?)

Automatic Reviewing 32 Can we identify errors in human translations? missing / added information inconsistent use of terminology Input Sentence Er hat seit Monaten geplant, im April einen Vortrag in Baltimore zu halten. Human Translation Moreover, he planned for months to give a lecture in Baltimore.

Overview 33 Interactivity Choices Confidence Adaptation

Adaptation 34 Machine translation works best if optimized for domain Typically, large amounts of out-of-domain data available European Parliament, United Nations unspecified data crawled from the web Little in-domain data (maybe 1% of total) information technology data more specific: IBM s user manuals even more specific: IBM s user manual for same product line from last year and even more specific: sentence pairs from current project Various domain adaptation techniques researched and used

Incremental Updating 35 source text translate MT engine MT translation re-train post-edit human translation

Incremental Updating 36 source text translate MT engine MT translation re-train post-edit human translation

Incremental Updating 37 source text translate MT engine MT translation re-train post-edit human translation

Updatable Translation Table 38 Required: quickly add a sentence pair to the translation table word alignment phrase extraction phrase table building Online word alignment use existing model to align new sentence pair Online phrase table building Store parallel corpus in memory Index corpus with suffix array Extract phrases on the fly This can be done in less than 1 second

Suffixes 39 1 government of the people, by the people, for the people 2 of the people, by the people, for the people 3 the people, by the people, for the people 4 people, by the people, for the people 5, by the people, for the people 6 by the people, for the people 7 the people, for the people 8 people, for the people 9, for the people 10 for the people 11 the people 12 people

Sorted Suffixes 40 5, by the people, for the people 9, for the people 6 by the people, for the people 10 for the people 1 government of the people, by the people, for the people 2 of the people, by the people, for the people 12 people 4 people, by the people, for the people 8 people, for the people 11 the people 3 the people, by the people, for the people 7 the people, for the people

Suffix Array 41 5, by the people, for the people 9, for the people 6 by the people, for the people 10 for the people 1 government of the people, by the people, for the people 2 of the people, by the people, for the people 12 people 4 people, by the people, for the people 8 people, for the people 11 the people 3 the people, by the people, for the people 7 the people, for the people suffix array: sorted index of corpus positions

Querying the Suffix Array 42 5, by the people, for the people 9, for the people 6 by the people, for the people 10 for the people 1 government of the people, by the people, for the people 2 of the people, by the people, for the people 12 people 4 people, by the people, for the people 8 people, for the people 11 the people 3 the people, by the people, for the people 7 the people, for the people Query: people

Querying the Suffix Array 43 5, by the people, for the people 9, for the people 6 by the people, for the people 10 for the people 1 government of the people, by the people, for the people 2 of the people, by the people, for the people 12 people 4 people, by the people, for the people 8 people, for the people 11 the people 3 the people, by the people, for the people 7 the people, for the people Query: people Binary search: start in the middle

Querying the Suffix Array 44 5, by the people, for the people 9, for the people 6 by the people, for the people 10 for the people 1 government of the people, by the people, for the people 2 of the people, by the people, for the people 12 people 4 people, by the people, for the people 8 people, for the people 11 the people 3 the people, by the people, for the people 7 the people, for the people Query: people Binary search: discard upper half

Querying the Suffix Array 45 5, by the people, for the people 9, for the people 6 by the people, for the people 10 for the people 1 government of the people, by the people, for the people 2 of the people, by the people, for the people 12 people 4 people, by the people, for the people 8 people, for the people 11 the people 3 the people, by the people, for the people 7 the people, for the people Query: people Binary search: middle of remaining space

Querying the Suffix Array 46 5, by the people, for the people 9, for the people 6 by the people, for the people 10 for the people 1 government of the people, by the people, for the people 2 of the people, by the people, for the people 12 people 4 people, by the people, for the people 8 people, for the people 11 the people 3 the people, by the people, for the people 7 the people, for the people Query: people Binary search: match

Querying the Suffix Array 47 5, by the people, for the people 9, for the people 6 by the people, for the people 10 for the people 1 government of the people, by the people, for the people 2 of the people, by the people, for the people 12 people 4 people, by the people, for the people 8 people, for the people 11 the people 3 the people, by the people, for the people 7 the people, for the people Query: people Finding matching range with additional binary searches for start and end

Estimating Probablities 48 Phrase translation probability p(ē f) = count(ē, f) count( f) Suffix array allows quick estimation of count( f) By looping through the f, the count(ē, f) can be collected as well If there are too many f, we can resort to sampling