Machine Translation 2008. Lecture 2 (F2): Translation Difficulties + Translation Strategies




Ambiguity in the source language
- poäng -> point, points, credit, credits
- var:
    verb -> was, were
    pronoun -> each
    adverb -> where
    adjective -> every
    noun -> pus
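The ambiguity above can be made concrete with a small lookup table. This is only a sketch: the entries come from the slide, while the table layout, the POS labels, and the function name `candidates` are assumptions made for illustration.

```python
# Hypothetical toy lexicon: (Swedish word, part of speech) -> English candidates.
# Entries are taken from the slide; the data layout is an illustrative assumption.
LEXICON = {
    ("var", "verb"): ["was", "were"],
    ("var", "pron"): ["each"],
    ("var", "adv"): ["where"],
    ("var", "adj"): ["every"],
    ("var", "noun"): ["pus"],
    ("poäng", "noun"): ["point", "points", "credit", "credits"],
}


def candidates(word, pos=None):
    """Return English candidates for a word, optionally restricted by POS."""
    result = []
    for (entry_word, entry_pos), translations in LEXICON.items():
        if entry_word == word and (pos is None or entry_pos == pos):
            result.extend(translations)
    return result


print(candidates("var"))         # all candidates when the POS is unknown
print(candidates("var", "adv"))  # a single candidate once the POS is known
```

Knowing the part of speech narrows var to one or two candidates, but poäng remains four-way ambiguous within a single POS, so tagging alone does not resolve every case.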

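Ambiguity in the source language, cont.
- anta:
    [någon] (till utbildning) ("[someone] (to a programme)") -> admit
    [att] ("[that]") -> suppose
- kunna:
    vara i stånd ("be in a position to") -> be able to
    ha kunskap om ("have knowledge of") -> know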

Variation in the target language
Sv. Vid avslutad kurs
- On completion of the course: 173,000
- After completion of the course: 74,000
- Having completed the course: 25,900
- After finishing the course: 25,400
- After completed course: 636
- After a completed course: 192
A literal direct translation does not fit in any case (*At completed course).
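One simple way to use such counts is to prefer the most frequent variant. The sketch below assumes that raw frequency is an acceptable selection criterion, which the slide does not claim; the counts are the ones listed above, and the function name `most_frequent` is hypothetical.

```python
# Counts as listed on the slide for renderings of "Vid avslutad kurs".
# What exactly was counted is not stated in the source.
COUNTS = {
    "On completion of the course": 173000,
    "After completion of the course": 74000,
    "Having completed the course": 25900,
    "After finishing the course": 25400,
    "After completed course": 636,
    "After a completed course": 192,
}


def most_frequent(counts):
    """Pick the variant with the highest count (assumed selection rule)."""
    return max(counts, key=counts.get)


print(most_frequent(COUNTS))
```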

Lexical translation choices
- på -> on / of / in / ...
    baserad på -> based on
    exempel på -> example of
    studenter på programmet -> students in the program
- redogöra för -> account for / describe

Grammatical differences
Sv. Efter avslutad kurs förväntas studenten ha grundläggande kunskaper om dynamiken i atmosfären.
En. On completion of the course, the student is expected to have basic knowledge of the dynamics of the atmosphere.

Basic strategies
- direct translation
- rule-based translation
- transfer
- interlingua
- example-based translation
- statistical translation
- hybrids

The Vauquois triangle
http://www1.cs.columbia.edu/~julia/jmchapters/ch24.pdf

Direct translation
- no complete intermediary sentence structure
- translation proceeds in a number of steps, each step dedicated to a specific task
- the most important component is the bilingual dictionary
- typically general language
- problems with ambiguity, inflection, word order and other structural shifts

Simplistic approach
- sentence splitting
- tokenisation
- handling capital letters
- dictionary look-up and lexical substitution
- heuristics for handling ambiguities
- copying unknown words, digits, signs of punctuation etc.
- formal editing
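The steps of the simplistic approach can be sketched as a short pipeline. This is a minimal illustration, not a description of any actual system: the dictionary entries and all function names are assumptions, and the ambiguity heuristics are left out entirely (each word has exactly one translation here).

```python
import re

# Hypothetical toy bilingual dictionary (Swedish -> English), illustration only.
DICTIONARY = {"vad": "what", "kan": "can", "vi": "we", "lära": "learn", "av": "from"}


def split_sentences(text):
    """Sentence splitting: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Tokenisation: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def translate_token(token):
    """Dictionary look-up and lexical substitution; unknown tokens are copied."""
    target = DICTIONARY.get(token.lower())
    if target is None:
        return token  # unknown words, digits and punctuation pass through
    # Handling capital letters: carry an initial capital over to the target.
    return target.capitalize() if token[0].isupper() else target


def detokenise(tokens):
    """Formal editing: join tokens and remove the space before punctuation."""
    return re.sub(r"\s+([.,!?;:])", r"\1", " ".join(tokens))


def translate(text):
    """Run all steps, sentence by sentence."""
    return " ".join(
        detokenise([translate_token(t) for t in tokenise(s)])
        for s in split_sentences(text)
    )


print(translate("Vad kan vi lära av Arawete?"))
```

Because every step works on isolated tokens, nothing in this sketch can reorder words or pick between competing translations, which is where the problems named for direct translation arise.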

Advanced classical approach (Tucker 1987)
- source text dictionary look-up and morphological analysis
- identification of homographs
- identification of compound nouns
- identification of noun and verb phrases
- processing of idioms

Advanced approach, cont.
- processing of prepositions
- subject-predicate identification
- syntactic ambiguity identification
- synthesis and morphological processing of target text
- rearrangement of words and phrases in target text

Feasibility of the direct translation strategy
Is it possible to carry out the direct translation steps as suggested by Tucker with sufficient precision without relying on a complete sentence structure?

Systran
- System Translation
- developed in the US by Peter Toma
- first version 1969 (Ru-En)
- EC bought the rights to Systran in 1976
- currently 18 language pairs
- first sv-en version in 2003
- http://babelfish.altavista.com/

Systran, cont.
- more than 1,600,000 dictionary units
- 20 domain dictionaries
- daily use by EC translators, administrators of the European institutions
- originally a direct translation strategy (see H&S)
- today more of a transfer-based strategy

Ex. 1: fairly good translation (Systran sv-en)
Sv. "Enskilda företagare som inte bildat bolag klassificeras hit."
En. "Individual entrepreneurs that have not formed companies are classified here."
The system has identified bildat as a perfect tense form and translates it correctly as have formed, with the negation not in the right place.

Ex. 2: word order problem (Systran sv-en)
Sv. "När byarna kontaktades hade de inte ens utsatts för influensa."
En. "When the villages were contacted had they not even been exposed to flu."
The system has not identified the subject and the predicate and thus generates the wrong word order.

Ex. 3: ambiguity problem (Systran sv-en)
Sv. "Vad kan vi lära av Arrawetestammen?"
En. "What can we faith of the Arawete?"
The system does not find the connection between kan and lära and thus fails to recognize lära as a verb; it is translated as a noun instead.

Ex. 4: ambiguity problem (Systran sv-en)
Sv. "Extrapoleringen går till så här."
En. "The extrapolation goes to so here."
The system does not recognize the phrasal verb gå till and thus translates incorrectly, word by word.
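A common remedy for this kind of error is to look up the longest matching multiword entry before falling back to single words. The sketch below is an assumption-laden illustration: the phrase table, the glosses "is done" and "like this", and the function name are all hypothetical, and a real system would match lemmas (går -> gå) rather than surface forms.

```python
# Hypothetical toy phrase table keyed by lower-cased surface forms.
# The multiword glosses are illustrative assumptions, not Systran entries.
PHRASES = {
    ("går", "till"): "is done",
    ("så", "här"): "like this",
    ("extrapoleringen",): "the extrapolation",
    ("går",): "goes",
    ("till",): "to",
    ("så",): "so",
    ("här",): "here",
}
MAX_LEN = max(len(key) for key in PHRASES)


def translate_longest_match(tokens, max_len=MAX_LEN):
    """Greedy left-to-right look-up that prefers the longest matching entry."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:
            out.append(tokens[i])  # no entry of any length: copy the token
            i += 1
    return " ".join(out)


sentence = ["Extrapoleringen", "går", "till", "så", "här"]
print(translate_longest_match(sentence, max_len=1))  # word-by-word look-up
print(translate_longest_match(sentence))             # multiword entries first
```

With `max_len=1` the sketch reproduces the word-by-word output of the example; allowing two-word entries lets gå till and så här be translated as units.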

Systran Linguistic Resources
- Dictionaries
- POS Definitions
- Inflection Tables
- Decomposition Tables
- Segmentation Dictionaries
- Disambiguation Rules
- Analysis Rules

Systran Processing Steps
- Analysis
- Lookup
- Compound decomposition
- Disambiguation
- Syntactic analysis
- Compound expansion
- Sentence transfer
- Initial target structure
- Lookup
- Default transfer of attributes
- Structure transformation

Systran Processing Steps (cont)
- Sentence synthesis
- Structure transformation
- Inflection look-up
- Surface transformation

Motivations for transfer-based translation
- lexical ambiguity
- structural differences
See further Ingo 91

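Example 1
Sv. Fyll på olja i växellådan.
En. Fill gearbox with oil. (from the Scania corpus)
Structure: fyll på takes obj (olja) and adv (i växellådan); fill takes obj (gearbox) and adv (with oil). The Swedish obj corresponds to the English adv, and the Swedish adv to the English obj.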
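The role swap in this example can be written as a single structural transfer rule. The sketch below is hypothetical throughout: the `Clause` representation, the two-entry noun dictionary, and the rule itself are assumptions chosen to reproduce this one sentence pair, not a general grammar.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    verb: str
    obj: str       # head noun of the direct object
    adv_prep: str  # preposition of the adverbial
    adv_noun: str  # head noun of the adverbial


# Hypothetical noun dictionary, illustration only.
NOUNS = {"olja": "oil", "växellådan": "gearbox"}


def transfer(source):
    """Map a Swedish clause structure to an English one.

    Rule for 'fyll på X i Y': the Swedish object X becomes the English
    adverbial 'with X', and the Swedish adverbial Y becomes the object.
    """
    if source.verb == "fyll på" and source.adv_prep == "i":
        return Clause(
            verb="fill",
            obj=NOUNS[source.adv_noun],
            adv_prep="with",
            adv_noun=NOUNS[source.obj],
        )
    raise ValueError("no transfer rule for this structure")


def generate(clause):
    """Linearise the target structure as verb, object, adverbial."""
    text = f"{clause.verb} {clause.obj} {clause.adv_prep} {clause.adv_noun}"
    return text[0].upper() + text[1:] + "."


swedish = Clause(verb="fyll på", obj="olja", adv_prep="i", adv_noun="växellådan")
print(generate(transfer(swedish)))
```

The rule operates on grammatical roles rather than on word positions, which is what distinguishes a transfer step from the lexical substitution of direct translation.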