Introduction. Philipp Koehn. 28 January 2016


Administrativa
  Class web site: http://www.mt-class.org/jhu/
  Tuesdays and Thursdays, 1:30-2:45, Hodson 313
  Instructor: Philipp Koehn (with help from Matt Post)
  Grading
    five programming assignments (12% each)
    final project (30%)
    in-class presentation: language in ten minutes (10%)

Textbook

Machine Translation: Chinese

Machine Translation: French

A Clear Plan
  [Figure: translation pyramid between Source and Target. Analysis runs up the source side and Generation runs down the target side; transfer can happen at increasingly deep levels: Lexical Transfer, Syntactic Transfer, Semantic Transfer, Interlingua.]

Learning from Data
  Training
    Training Data (parallel corpora, monolingual corpora, dictionaries) and Linguistic Tools
    -> Statistical Machine Translation System
  Using
    Source Text -> Statistical Machine Translation System -> Translation

why is that a good plan?

Word Translation Problems
  Words are ambiguous
    He deposited money in a bank account with a high interest rate.
    Sitting on the bank of the Mississippi, a passing ship piqued his interest.
  How do we find the right meaning, and thus translation?
  Context should be helpful

Syntactic Translation Problems
  Languages have different sentence structure
    das        behaupten   sie        wenigstens
    this/the   claim       they/she   at least
  Convert from object-verb-subject (OVS) to subject-verb-object (SVO)
  Ambiguities can be resolved through syntactic analysis
    the meaning "the" of das is not possible (not a noun phrase)
    the meaning "she" of sie is not possible (subject-verb agreement)

Semantic Translation Problems
  Pronominal anaphora
    I saw the movie and it is good.
  How to translate "it" into German (or French)?
    "it" refers to "movie"
    "movie" translates to "Film"
    "Film" has masculine gender
    ergo: "it" must be translated into the masculine pronoun "er"
  We are not handling this very well [Le Nagard and Koehn, 2010]

Semantic Translation Problems
  Coreference
    Whenever I visit my uncle and his daughters, I can't decide who is my favorite cousin.
  How to translate "cousin" into German? Male or female?
  Complex inference required

Semantic Translation Problems
  Discourse
    Since you brought it up, I do not agree with you.
    Since you brought it up, we have been working on it.
  How to translate "since"? Temporal or conditional?
  Analysis of discourse structure is a hard problem

Learning from Data
  What is the best translation?
  Counts in European Parliament corpus
    Sicherheit               security            14,516
    Sicherheit               safety              10,015
    Sicherheit               certainty              334
  Phrasal rules
    Sicherheitspolitik       security policy      1,580
    Sicherheitspolitik       safety policy           13
    Sicherheitspolitik       certainty policy         0
    Lebensmittelsicherheit   food security           51
    Lebensmittelsicherheit   food safety          1,084
    Lebensmittelsicherheit   food certainty           0
    Rechtssicherheit         legal security         156
    Rechtssicherheit         legal safety             5
    Rechtssicherheit         legal certainty        723
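
One way to read the counts above is as relative frequencies. The following is a minimal sketch, assuming that a translation probability is estimated as the count of a pair divided by the total count for the source word; the names COUNTS, translation_probabilities and best_translation are illustrative and not from the lecture. Only a subset of the slide's counts is copied in.

```python
from collections import defaultdict

# Counts copied from the slide (European Parliament corpus), subset only.
COUNTS = {
    ("Sicherheit", "security"): 14516,
    ("Sicherheit", "safety"): 10015,
    ("Sicherheit", "certainty"): 334,
    ("Rechtssicherheit", "legal security"): 156,
    ("Rechtssicherheit", "legal safety"): 5,
    ("Rechtssicherheit", "legal certainty"): 723,
}


def translation_probabilities(counts):
    """Relative frequency estimate: p(e | f) = count(f, e) / sum of count(f, e')."""
    totals = defaultdict(int)
    for (source, _target), n in counts.items():
        totals[source] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


def best_translation(counts, source):
    """Return the target with the highest count for the given source entry."""
    candidates = {t: n for (s, t), n in counts.items() if s == source}
    return max(candidates, key=candidates.get)


probs = translation_probabilities(COUNTS)
print(best_translation(COUNTS, "Sicherheit"))        # most frequent single-word choice
print(best_translation(COUNTS, "Rechtssicherheit"))  # the compound prefers a different sense
```

The point of the phrasal rows is visible here: the best choice for the compound differs from the best choice for the bare word, so larger units carry useful context.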

Learning from Data
  What is most fluent?
  Hits on Google
    a problem for translation     13,000
    a problem of translation      61,600
    a problem in translation      81,700
    a translation problem        235,000

Learning from Data
  What is most fluent?
    police disrupted the demonstration       2,140
    police broke up the demonstration       66,600
    police dispersed the demonstration      25,800
    police ended the demonstration             762
    police dissolved the demonstration       2,030
    police stopped the demonstration       722,000
    police suppressed the demonstration      1,400
    police shut down the demonstration       2,040

where are we now?

Word Alignment
  michael geht davon aus, dass er im haus bleibt
  michael assumes that he will stay in the house
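
A word alignment can be stored as a set of index pairs between the two sentences. The sketch below is only an illustration: the alignment points are an assumption based on reading the sentence pair (the slide's figure did not survive transcription), the comma is split off as its own token, and the names ALIGNMENT and aligned_words are hypothetical.

```python
SOURCE = "michael geht davon aus , dass er im haus bleibt".split()
TARGET = "michael assumes that he will stay in the house".split()

# Assumed alignment points as (source index, target index) pairs.
# These are an illustrative reading of the sentence pair, not data from the slide.
ALIGNMENT = {
    (0, 0),                   # michael -> michael
    (1, 1), (2, 1), (3, 1),   # geht davon aus -> assumes
    (5, 2),                   # dass -> that
    (6, 3),                   # er -> he
    (7, 6), (7, 7),           # im -> in the
    (8, 8),                   # haus -> house
    (9, 4), (9, 5),           # bleibt -> will stay
}


def aligned_words(source, target, alignment):
    """Map each source word to the list of target words it is aligned to."""
    result = {word: [] for word in source}
    for s, t in sorted(alignment):
        result[source[s]].append(target[t])
    return result


for word, translations in aligned_words(SOURCE, TARGET, ALIGNMENT).items():
    print(f"{word:8s} -> {' '.join(translations) or '(unaligned)'}")
```

Under these assumed points the example shows one-to-many links (im), many-to-one links (geht davon aus), an unaligned token (the comma), and reordering (bleibt).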

Phrase-Based Model
  Foreign input is segmented in phrases
  Each phrase is translated into English
  Phrases are reordered
  Workhorse of today's statistical machine translation
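
The three steps can be sketched on the sentence from the Word Alignment slide. This is a toy under stated assumptions, not a decoder: the phrase table entries are hypothetical, segmentation is a greedy longest match, and the reordering is supplied by hand, whereas a real system searches over segmentations, translations and orderings using model scores.

```python
# Hypothetical phrase table for illustration only.
PHRASE_TABLE = {
    ("michael",): "michael",
    ("geht", "davon", "aus"): "assumes",
    (",", "dass"): "that",
    ("er",): "he",
    ("im", "haus"): "in the house",
    ("bleibt",): "will stay",
}


def segment(tokens, table, max_len=3):
    """Step 1: greedy longest-match segmentation into phrases found in the table."""
    phrases, i = [], 0
    while i < len(tokens):
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + length])
            if candidate in table:
                phrases.append(candidate)
                i += length
                break
        else:
            raise KeyError(f"no phrase covers token {tokens[i]!r}")
    return phrases


def translate(tokens, table, order):
    """Steps 2 and 3: translate each phrase, then emit the phrases in the given order."""
    translated = [table[phrase] for phrase in segment(tokens, table)]
    return " ".join(translated[i] for i in order)


source = "michael geht davon aus , dass er im haus bleibt".split()
# The hand-specified order moves the verb phrase in front of "in the house".
print(translate(source, PHRASE_TABLE, order=[0, 1, 2, 3, 5, 4]))
```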

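Syntax-Based Translation
  [Figure: aligned syntax trees for the German sentence "Sie will eine Tasse Kaffee trinken" (tags PPER, VAFIN, ART, NN, NN, VVINF) and the English sentence "she wants to drink a cup of coffee" (tags PRO, VBZ, TO, VB, DET, NN, IN, NN), with phrase nodes NP, PP, VP and S.]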

Semantic Translation
  Abstract meaning representation [Knight et al., ongoing]
    (w / want-01
      :agent (b / boy)
      :theme (l / love
        :agent (g / girl)
        :patient b))
  Generalizes over equivalent syntactic constructs (e.g., active and passive)
  Defines semantic relationships
    semantic roles
    co-reference
    discourse relations
  In a very preliminary stage

what is it good for?

what is it good enough for?

Why Machine Translation?
  Assimilation
    reader initiates translation, wants to know content
    user is tolerant of inferior quality
    focus of majority of research (GALE program, etc.)
  Communication
    participants don't speak same language, rely on translation
    users can ask questions when something is unclear
    chat room translations, hand-held devices
    often combined with speech recognition, IWSLT campaign
  Dissemination
    publisher wants to make content available in other languages
    high demands for quality
    currently almost exclusively done by human translators

Problem: No Single Right Answer
  Israeli officials are responsible for airport security.
  Israel is in charge of the security at this airport.
  The security work for this airport is the responsibility of the Israel government.
  Israeli side was in charge of the security of this airport.
  Israel is responsible for the airport's security.
  Israel is responsible for safety work at this airport.
  Israel presides over the security of the airport.
  Israel took charge of the airport security.
  The safety of this airport is taken charge of by Israel.
  This airport's security is the responsibility of the Israeli security officials.

Quality
  HTER   assessment
  0%
         publishable
  10%
         editable
  20%
  30%
         gistable
  40%
         triagable
  50%
  (scale developed in preparation of DARPA GALE programme)
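
HTER (human-targeted translation edit rate) relates the number of edits a human post-editor makes to the length of the corrected translation. The sketch below is a simplification under stated assumptions: it counts only word-level insertions, deletions and substitutions, while the full metric also allows block shifts. The function names and the example sentence pair are made up for illustration.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_rate(machine_output, post_edited):
    """Edits needed to reach the post-edited text, divided by its length in words."""
    hyp = machine_output.lower().split()
    ref = post_edited.lower().split()
    if not ref:
        raise ValueError("post-edited reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)


# Hypothetical machine output and its post-edited version.
rate = edit_rate("israel is responsible for the airport safety",
                 "israel is responsible for the airport security")
print(f"{rate:.1%}")
```

A score computed this way can be read against the scale above, keeping in mind that omitting shifts tends to overstate the number of edits when words are merely moved.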

Applications
  HTER   assessment    application examples
  0%                   Seamless bridging of language divide
         publishable   Automatic publication of official announcements
  10%
         editable      Increased productivity of human translators
  20%                  Access to official publications
                       Multi-lingual communication (chat, social networks)
  30%
         gistable      Information gathering
                       Trend spotting
  40%
         triagable     Identifying relevant documents
  50%

Current State of the Art
  HTER   assessment    language pairs and domains
  0%
         publishable   French-English restricted domain
  10%                  French-English technical document localization
         editable      French-English news stories
  20%                  English-German news stories
  30%
         gistable      English-Czech open domain
  40%
         triagable
  50%
  (informal rough estimates by presenter)

Thank You
  questions?