Empirical Machine Translation and its Evaluation



Similar documents
HIERARCHICAL HYBRID TRANSLATION BETWEEN ENGLISH AND GERMAN

Introduction. Philipp Koehn. 28 January 2016

Hybrid Machine Translation Guided by a Rule Based System

Multi-Engine Machine Translation by Recursive Sentence Decomposition

The XMU Phrase-Based Statistical Machine Translation System for IWSLT 2006

Computer Aided Translation

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Why Evaluation? Machine Translation. Evaluation. Evaluation Metrics. Ten Translations of a Chinese Sentence. How good is a given system?

Machine Translation. Why Evaluation? Evaluation. Ten Translations of a Chinese Sentence. Evaluation Metrics. But MT evaluation is a di cult problem!

Statistical Machine Translation

Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives

INF5820 Natural Language Processing - NLP. H2009 Jan Tore Lønning jtl@ifi.uio.no

Hybrid Strategies. for better products and shorter time-to-market

CS 6740 / INFO Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage

Big Data Adaptation DatAptor Dr Khalil Sima'an

SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 统

Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test

Collaborative Machine Translation Service for Scientific texts

Comprendium Translator System Overview

Kybots, knowledge yielding robots German Rigau IXA group, UPV/EHU

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

BILINGUAL TRANSLATION SYSTEM

Machine translation techniques for presentation of summaries

Factored Translation Models

SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告

Semantic analysis of text and speech

IRIS - English-Irish Translation System

Machine Translation at the European Commission

Machine Translation and the Translator

Customizing an English-Korean Machine Translation System for Patent Translation *

Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation

Overview of MT techniques. Malek Boualem (FT)

PROMT Technologies for Translation and Big Data

Language and Computation

Paraphrasing controlled English texts

Report on the embedding and evaluation of the second MT pilot

Speech understanding in dialogue systems

An Approach to Handle Idioms and Phrasal Verbs in English-Tamil Machine Translation System

Learning Translation Rules from Bilingual English Filipino Corpus

Automated Multilingual Text Analysis in the Europe Media Monitor (EMM) Ralf Steinberger. European Commission Joint Research Centre (JRC)

What s in a Lexicon. The Lexicon. Lexicon vs. Dictionary. What kind of Information should a Lexicon contain?

Deciphering Foreign Language

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

Machine Learning for natural language processing

Context Grammar and POS Tagging

Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014

The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2

Overview of the TACITUS Project

Automatic slide assignation for language model adaptation

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition

Survey Results: Requirements and Use Cases for Linguistic Linked Data

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.

Application of Natural Language Interface to a Machine Translation Problem

Natural Language to Relational Query by Using Parsing Compiler

Statistical Pattern-Based Machine Translation with Statistical French-English Machine Translation

NATURAL LANGUAGE QUERY PROCESSING USING SEMANTIC GRAMMAR

a Chinese-to-Spanish rule-based machine translation

An Overview of Computational Advertising

Machine Translation. Agenda

LGPLLR : an open source license for NLP (Natural Language Processing) Sébastien Paumier. Université Paris-Est Marne-la-Vallée

The Challenge of Machine Translation of Patent Specifications and the Approach of the European Patent Office

How to make Ontologies self-building from Wiki-Texts

M LTO Multilingual On-Line Translation

Statistical Machine Translation

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

Example-based Translation without Parallel Corpora: First experiments on a prototype

Processing: current projects and research at the IXA Group

Timeline (1) Text Mining Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining?

Structure of Clauses. March 9, 2004

Deep Learning For Text Processing

Named Entity Recognition Experiments on Turkish Texts

University of Massachusetts Boston Applied Linguistics Graduate Program. APLING 601 Introduction to Linguistics. Syllabus

TAPTA: Translation Assistant for Patent Titles and Abstracts

Segmentation and Punctuation Prediction in Speech Language Translation Using a Monolingual Translation System

Outline of today s lecture

Text Generation for Abstractive Summarization

Transcription:

Empirical Machine Translation and its Evaluation EAMT Best Thesis Award 2008 Jesús Giménez (Advisor, Lluís Màrquez) Universitat Politècnica de Catalunya May 28, 2010

Empirical Machine Translation

Empirical Machine Translation

Empirical Machine Translation

Empirical Machine Translation

Empirical Machine Translation

Discriminative Phrase Translation A brilliant play written by William Locke Una obra brillante escrita por William Locke vs. A brilliant play by Lionel Messi that produced a wonderful goal Una brillante jugada de Lionel Messi que resultó en un bello gol

Discriminative Phrase Translation A brilliant play written by William Locke Una obra brillante escrita por William Locke vs. A brilliant play by Lionel Messi that produced a wonderful goal Una brillante jugada de Lionel Messi que resultó en un bello gol

Discriminative Phrase Translation A brilliant play written by William Locke Una obra brillante escrita por William Locke vs. A brilliant play by Lionel Messi that produced a wonderful goal Una brillante jugada de Lionel Messi que resultó en un bello gol

Discriminative Phrase Translation A brilliant play written by William Locke Una obra brillante escrita por William Locke vs. A brilliant play by Lionel Messi that produced a wonderful goal Una brillante jugada de Lionel Messi que resultó en un bello gol

Discriminative Phrase Translation A brilliant play written by William Locke Una obra brillante escrita por William Locke vs. A brilliant play by Lionel Messi that produced a wonderful goal Una brillante jugada de Lionel Messi que resultó en un bello gol

Discriminative Phrase Translation A brilliant play written by William Locke Una obra brillante escrita por William Locke vs. A brilliant play by Lionel Messi that produced a wonderful goal Una brillante jugada de Lionel Messi que resultó en un bello gol

Discriminative Phrase Translation Idea Use discriminative Machine Learning techniques in SMT to estimate P(e i f j ), actually, P(e i f j, context f j ) Translation modeling is addressed as a classification problem

Discriminative Phrase Translation Idea Use discriminative Machine Learning techniques in SMT to estimate P(e i f j ), actually, P(e i f j, context f j ) Translation modeling is addressed as a classification problem

Contributions 1 Discriminative Phrase Selection for SMT Shallow-syntactic features word, lemma, parts of speech, chunking local / global context Improved translation quality 2 Domain Dependence in SMT Parliament proceedings Dictionary definitions Adaptation based on: EuroWordNet out-of-domain data a small amount of in-domain data Improved translation quality

Contributions 1 Discriminative Phrase Selection for SMT Shallow-syntactic features word, lemma, parts of speech, chunking local / global context Improved translation quality 2 Domain Dependence in SMT Parliament proceedings Dictionary definitions Adaptation based on: EuroWordNet out-of-domain data a small amount of in-domain data Improved translation quality

Contributions 1 Discriminative Phrase Selection for SMT Shallow-syntactic features word, lemma, parts of speech, chunking local / global context Improved translation quality 2 Domain Dependence in SMT Parliament proceedings Dictionary definitions Adaptation based on: EuroWordNet out-of-domain data a small amount of in-domain data Improved translation quality

Contributions 1 Discriminative Phrase Selection for SMT Shallow-syntactic features word, lemma, parts of speech, chunking local / global context Improved translation quality 2 Domain Dependence in SMT Parliament proceedings Dictionary definitions Adaptation based on: EuroWordNet out-of-domain data a small amount of in-domain data Improved translation quality

... and its Evaluation

... and its Evaluation

... and its Evaluation

Limits of Lexical Similarity Candidate On Tuesday several missiles and mortar Translation shells fell in southern Israel, but there were no casualties. Reference Several Qassam rockets and mortar shells Translation fell today, Tuesday, in southern Israel without causing any casualties. Only one 4-gram in common!

Limits of Lexical Similarity Candidate On Tuesday several missiles and mortar Translation shells fell in southern Israel, but there were no casualties. Reference Several Qassam rockets and mortar shells Translation fell today, Tuesday, in southern Israel without causing any casualties. Only one 4-gram in common!

Limits of Lexical Similarity Candidate On Tuesday several missiles and mortar Translation shells fell in southern Israel, but there were no casualties. Reference Several Qassam rockets and mortar shells Translation fell today, Tuesday, in southern Israel without causing any casualties. Only one 4-gram in common!

Linguistic Features for Automatic MT Evaluation Idea Define similarity measures based on deeper linguistic information Compare linguistic structures and their lexical realizations Linguistic levels Syntax Parts-of-speech Base phrase chunks Phrase constituents Dependency relationships Semantics Named entities Semantic roles Discourse representations

Linguistic Features for Automatic MT Evaluation S PP TMP S. On NP NP A1 VP, but S Tuesday several missiles and mortar shells fell PP LOC NP VP in NP there were NP southern Israel no casualties

Linguistic Features for Automatic MT Evaluation S PP TMP S. On NP NP A1 VP, but S Tuesday several missiles and mortar shells fell PP LOC NP VP in NP there were NP southern Israel no casualties

Linguistic Features for Automatic MT Evaluation S NP A1 A0 VP. NP and NP fell NP PP LOC PP ADV Several Qassam rockets mortar shells NP TMP, NP, in NP without S today Tuesday southern VP Israel causing NP A1 any casualties

Linguistic Features for Automatic MT Evaluation S NP A1 A0 VP. NP and NP fell NP PP LOC PP ADV Several Qassam rockets mortar shells NP TMP, NP, in NP without S today Tuesday southern VP Israel causing NP A1 any casualties

Contributions 1 Linguistic measures provide more reliable rankings when the systems under evaluation are based on different paradigms (statistical vs. rule-based, fully-automatic vs. human-aided) 2 Linguistic measures have proven effective in shared tasks (WMT 2007-2009) both at the system and sentence levels 3 Some linguistic measures suffer a substantial quality decrease at the sentence level (due to parsing errors!) 4 Lexical and Linguistic measures are complementary suitable for being combined!

Contributions 1 Linguistic measures provide more reliable rankings when the systems under evaluation are based on different paradigms (statistical vs. rule-based, fully-automatic vs. human-aided) 2 Linguistic measures have proven effective in shared tasks (WMT 2007-2009) both at the system and sentence levels 3 Some linguistic measures suffer a substantial quality decrease at the sentence level (due to parsing errors!) 4 Lexical and Linguistic measures are complementary suitable for being combined!

Contributions 1 Linguistic measures provide more reliable rankings when the systems under evaluation are based on different paradigms (statistical vs. rule-based, fully-automatic vs. human-aided) 2 Linguistic measures have proven effective in shared tasks (WMT 2007-2009) both at the system and sentence levels 3 Some linguistic measures suffer a substantial quality decrease at the sentence level (due to parsing errors!) 4 Lexical and Linguistic measures are complementary suitable for being combined!

Contributions 1 Linguistic measures provide more reliable rankings when the systems under evaluation are based on different paradigms (statistical vs. rule-based, fully-automatic vs. human-aided) 2 Linguistic measures have proven effective in shared tasks (WMT 2007-2009) both at the system and sentence levels 3 Some linguistic measures suffer a substantial quality decrease at the sentence level (due to parsing errors!) 4 Lexical and Linguistic measures are complementary suitable for being combined!

Contributions 1 Linguistic measures provide more reliable rankings when the systems under evaluation are based on different paradigms (statistical vs. rule-based, fully-automatic vs. human-aided) 2 Linguistic measures have proven effective in shared tasks (WMT 2007-2009) both at the system and sentence levels 3 Some linguistic measures suffer a substantial quality decrease at the sentence level (due to parsing errors!) 4 Lexical and Linguistic measures are complementary suitable for being combined!

Acknowledgements 1 Spanish Government Ministry of Science and Technology Ministry of Education 2 Organizers and participants of the NIST, WMT and IWSLT Evaluation Campaigns 3 A number of NLP researchers worldwide sharing their software 4 Enrique Amigó and German Rigau

Acknowledgements 1 Spanish Government Ministry of Science and Technology Ministry of Education 2 Organizers and participants of the NIST, WMT and IWSLT Evaluation Campaigns 3 A number of NLP researchers worldwide sharing their software 4 Enrique Amigó and German Rigau

Acknowledgements 1 Spanish Government Ministry of Science and Technology Ministry of Education 2 Organizers and participants of the NIST, WMT and IWSLT Evaluation Campaigns 3 A number of NLP researchers worldwide sharing their software 4 Enrique Amigó and German Rigau

Acknowledgements 1 Spanish Government Ministry of Science and Technology Ministry of Education 2 Organizers and participants of the NIST, WMT and IWSLT Evaluation Campaigns 3 A number of NLP researchers worldwide sharing their software 4 Enrique Amigó and German Rigau

Empirical Machine Translation and its Evaluation EAMT Best Thesis Award 2008 Thanks!