HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN




Yu Chen, Andreas Eisele
DFKI GmbH, Saarbrücken, Germany
May 28, 2010

OUTLINE

  - Introduction
  - Architecture
  - Experiments
  - Conclusion

SMT VS. RBMT [K. CHEN & H. CHEN, 1996]

Rule-based machine translation (RBMT)
  Advantages
    1. easy to build an initial system
    2. based on linguistic theories
    3. effective for core phenomena
  Disadvantages
    1. rules are formulated by experts
    2. difficult to maintain and extend
    3. ineffective for marginal phenomena

Statistical machine translation (SMT)
  Advantages
    1. numerical knowledge
    2. extracts knowledge from corpus
    3. reduces the human cost
    4. mathematically grounded model
  Disadvantages
    1. no linguistic background
    2. search cost is expensive
    3. hard to capture long distance phenomena

HYBRID MACHINE TRANSLATIONS

RBMT & SMT: complementary

                          RBMT   SMT
  Syntax, Morphology       ++     --
  Structural semantics     ++     --
  Lexical semantics        -      +
  Lexical adaptivity       --     +
  Lexical reliability      +      -

Integrated approaches for better results

VARIOUS WAYS TO COMBINE SMT AND RBMT [explored in EuroMatrix (Plus) since 2006]

HYBRID ARCHITECTURE CONSIDERED HERE

PHRASE-BASED SMT + RBMT [EISELE ET AL. 2008]

Integration approach with PBSMT
  - Extract translation correspondences from RBMT outputs
  - Construct phrase tables using the extracted information
  - Combine phrase tables from different sources
  - Produce the final translations with an SMT decoder, re-purposed to combine snippets from different sources

Problems
  - Multiple RBMT systems required for better performance
  - Only outperforms SMT baseline in out-of-domain tests
  - Good structures from RBMT not exploited by SMT

HIERARCHICAL SMT + RBMT

How to utilize the implicit linguistic knowledge contained in RBMT translations in a robust way?

Our solution: Hierarchical phrases
  - Preserve some useful structures from RBMT outputs
  - No detailed linguistic analysis required

RBMT PHRASE TABLE

Close to the standard SMT training

RBMT translations
  - Only on the development/test set
Word alignment
  - From the input text to the RBMT translation
  - Large word alignment model from the SMT baseline system as the base model
Phrase extraction
  - Standard rule extraction procedure with suffix arrays
  - Higher feature values than the normal SMT models
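The extraction step can be illustrated with a minimal sketch of textbook hierarchical rule extraction: collect phrase pairs consistent with the word alignment, then replace one sub-phrase with a gap X1. This is not the suffix-array extractor named on the slide. The toy sentence pair, its alignment, and the function names `consistent_phrases` and `one_gap_rules` are assumptions made for illustration.

```python
from itertools import product

def consistent_phrases(src, tgt, alignment, max_len=5):
    """Return phrase pairs as ((i1, i2), (j1, j2)), inclusive spans,
    such that no alignment link crosses the phrase boundary."""
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span.
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 >= max_len:
                continue
            # Consistency: every link into [j1, j2] starts inside [i1, i2].
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.append(((i1, i2), (j1, j2)))
    return pairs

def one_gap_rules(src, tgt, pairs):
    """Replace one embedded phrase pair with the gap symbol X1."""
    rules = set()
    for (s, t), (ss, st) in product(pairs, repeat=2):
        inside = (s[0] <= ss[0] and ss[1] <= s[1]
                  and t[0] <= st[0] and st[1] <= t[1])
        if not inside or (s, t) == (ss, st):
            continue
        src_side = src[s[0]:ss[0]] + ["X1"] + src[ss[1] + 1:s[1] + 1]
        tgt_side = tgt[t[0]:st[0]] + ["X1"] + tgt[st[1] + 1:t[1] + 1]
        # Keep only rules with at least one terminal on each side.
        if len(src_side) > 1 and len(tgt_side) > 1:
            rules.add((" ".join(src_side), " ".join(tgt_side)))
    return rules

# Hypothetical input fragment, RBMT-style output, and word alignment.
src = ["verhandlungen", "abgeschlossen", "sein"]
tgt = ["negotiations", "be", "finalised"]
alignment = {(0, 0), (1, 2), (2, 1)}

pairs = consistent_phrases(src, tgt, alignment)
rules = one_gap_rules(src, tgt, pairs)
for source_side, target_side in sorted(rules):
    print(f"{source_side} ||| {target_side}")
```

On this toy pair the sketch yields, among others, the rule "X1 abgeschlossen sein" / "X1 be finalised". Because the gap stands in for any sub-phrase, the reordering produced by the RBMT system is kept without any explicit linguistic analysis.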

COMBINED PHRASE TABLE

Phrase table extension with RBMT features

  source                  target                    SMT features       RBMT features
  zum                     at the                    1.98  1.89  2.43   1.95  1.82  2.12
  der X1, die             the X1 which              1.25  1.78  1.67   1.05  1.48  1.42
  der X1 der X2           of the X1 of the X2       1.39  1.12  1.86   1.58  1.06  1.50
  landesgrenzen           boundaries                1.15  1.75  1.11   1.0   1.0   1.0
  X1 abgeschlossen sein   X1 be finalised           1.84  1.70  1.85   1.0   1.0   1.0
  fakten X1 der X2        facts X1 against the X2   1.04  1.04  3.61   1.0   1.0   1.0
  nach den                after that                1.0   1.0   1.0    1.11  2.10  2.12
  auf der X1              on which X1               1.0   1.0   1.0    1.36  1.42  2.13
  die X1 von X2           who X1 of X2              1.0   1.0   1.0    1.38  1.27  1.92

Training
  - Tuned on combined PT with development set
  - Translate test set with combined PT
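The extension shown in the table can be sketched as a merge of two rule tables in which each entry carries both feature groups. The filler value 1.0 for a group that has no entry is read off the table rows. The function name `combine_tables` and the dictionary layout are assumptions for illustration, not the format used by any particular decoder.

```python
NEUTRAL = 1.0  # assumption: filler value for a missing feature group, as seen in the table

def combine_tables(smt, rbmt, n_smt=3, n_rbmt=3):
    """Merge two rule tables keyed by (source, target).

    Every rule in the result has n_smt SMT features followed by
    n_rbmt RBMT features; a group absent from its table is filled
    with NEUTRAL values.
    """
    combined = {}
    for rule in smt.keys() | rbmt.keys():
        smt_feats = smt.get(rule, [NEUTRAL] * n_smt)
        rbmt_feats = rbmt.get(rule, [NEUTRAL] * n_rbmt)
        combined[rule] = list(smt_feats) + list(rbmt_feats)
    return combined

# Three rows taken from the table: one shared rule, one SMT-only, one RBMT-only.
smt_table = {
    ("zum", "at the"): [1.98, 1.89, 2.43],
    ("landesgrenzen", "boundaries"): [1.15, 1.75, 1.11],
}
rbmt_table = {
    ("zum", "at the"): [1.95, 1.82, 2.12],
    ("nach den", "after that"): [1.11, 2.10, 2.12],
}

combined = combine_tables(smt_table, rbmt_table)
for (source, target), feats in sorted(combined.items()):
    print(source, "|||", target, "|||", " ".join(f"{f:.2f}" for f in feats))
```

Keeping the two feature groups separate, rather than averaging them, lets the tuning step assign independent weights to SMT-derived and RBMT-derived evidence.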

EXPERIMENT SETUP

Data
  - Training: Europarl-v4
  - Development: WMT 2007
  - Testing: WMT 2008 (EP)
  - Out-of-domain: News (NC)

Tools
  - RBMT: Lucy
  - Hierarchical SMT: Joshua
  - Alignment
      Training set: Berkeley aligner
      Hypothesis alignment: GIZA++
  - MERT: ZMERT

BLEU SCORES

                     de-en            en-de
                   EP      NC       EP      NC
  Lucy           16.40   17.02    11.23   13.01
  Moses          27.27   16.66    19.42   10.27
  Moses+Lucy     27.26   16.06    19.19   12.35
  Joshua         27.51   16.24    20.69   10.48
  Joshua+Lucy    27.52   17.69    20.89   13.21
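One way to read the table is to compare each hybrid with the better of its two components in every condition. The short calculation below does this with the scores above. The helper name `gain_over_best` is an assumption, and differences of this size are reported without any significance testing.

```python
conditions = ["de-en EP", "de-en NC", "en-de EP", "en-de NC"]

# BLEU scores copied from the table, in the order of `conditions`.
bleu = {
    "Lucy":        [16.40, 17.02, 11.23, 13.01],
    "Moses":       [27.27, 16.66, 19.42, 10.27],
    "Moses+Lucy":  [27.26, 16.06, 19.19, 12.35],
    "Joshua":      [27.51, 16.24, 20.69, 10.48],
    "Joshua+Lucy": [27.52, 17.69, 20.89, 13.21],
}

def gain_over_best(hybrid, components):
    """BLEU difference between a hybrid and its best component, per condition."""
    return [
        round(score - max(bleu[c][k] for c in components), 2)
        for k, score in enumerate(bleu[hybrid])
    ]

joshua_gain = gain_over_best("Joshua+Lucy", ["Joshua", "Lucy"])
moses_gain = gain_over_best("Moses+Lucy", ["Moses", "Lucy"])

for name, gains in [("Joshua+Lucy", joshua_gain), ("Moses+Lucy", moses_gain)]:
    for condition, gain in zip(conditions, gains):
        print(f"{name:12s} {condition}: {gain:+.2f}")
```

With these numbers the hierarchical hybrid is above both of its components in all four conditions, although the in-domain de-en margin is only 0.01 BLEU. The phrase-based hybrid stays below its best component in every condition.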

EXAMPLES IN-DOMAIN

  Source:       Ich möchte Sie daran erinnern, dass sich unter unseren Verbündeten entschiedene Befürworter dieser Steuer befinden.
  Reference:    Let me remind you that our allies include fervent supporters of this tax.
  Lucy:         I would like to remind you of there being decisive proponents of this tax among our allies.
  Moses:        I would like to remind you that under our allies are strong supporters of this tax.
  Moses+Lucy:   I would like to remind you that there are among our allies in favour of this tax.
  Joshua:       I would like to remind you that, under our allies are strong supporters of this tax.
  Joshua+Lucy:  I would like to remind you that there are strong supporters of this tax among our allies.

EXAMPLES OUT-OF-DOMAIN

  Source:       So kooperieren die Hochschulen schon aus Tradition mit den Nachbarländern.
  Reference:    The university-level institutions cooperation with the neighboring countries, for instance, is part of a tradition.
  Lucy:         So the colleges co-operate already from tradition with the neighbor countries closely.
  Moses:        So the universities from tradition cooperate closely with the neighbouring countries.
  Moses+Lucy:   So the colleges co-operate closely with the neighbouring already from tradition.
  Joshua:       So cooperate closely with the neighbouring the universities from tradition.
  Joshua+Lucy:  So the universities, already from tradition, co-operate closely with the neighbouring countries.

ALIGNMENT FOR RBMT OUTPUTS

  Systems        Test Set                Europarl
  Moses+Lucy     Europarl     19.37      19.19
  Moses+Lucy     News         12.50      12.38
  Joshua+Lucy    Europarl     20.83      20.89
  Joshua+Lucy    News         13.17      13.21

SUMMARY

Integrating an RBMT system with a hierarchical SMT system

Features
  - Hierarchical rule extraction from RBMT outputs
  - Phrase table extension with all features

Results
  - Improvement over both sub-systems
  - Superior to hybrid system based on PBSMT

FUTURE WORK

  - Other variants
  - Larger in-domain language models
  - Better alignment and rule extraction
      Build more reliable alignment model for RBMT outputs
      Internal alignments from RBMT
  - Deeper integration