462 Machine Translation Systems for Europe




Philipp Koehn
School of Informatics, University of Edinburgh
pkoehn@inf.ed.ac.uk

Alexandra Birch
School of Informatics, University of Edinburgh
a.birch@sms.ed.ac.uk

Ralf Steinberger
Joint Research Centre, European Commission
Ralf.Steinberger@jrc.it

Abstract

We built 462 machine translation systems for all language pairs of the Acquis Communautaire corpus. We report and analyse the performance of these systems, and compare them against pivot translation and a number of system combination methods (multi-pivot, multi-source) that are possible due to the available systems.

1 Introduction

While its many languages pose a challenge to the economic and cultural integration of Europe, they also provide an excellent test bed for machine translation research. The official European languages come from a variety of language families and vary along many linguistic dimensions (morphology, word order, etc.). Some are closely related (such as Portuguese and Spanish), while some are very distant (such as Finnish and German). The data come from seven language families, two of which are not Indo-European, as shown in Table 1.

In this paper, we describe how the JRC-Acquis corpus was used to build machine translation systems for 462 language pairs. This allows us to analyse the challenges of the different language pairs by carrying out a regression study to determine the main factors for differences in performance.

We also compare the direct translation systems against pivot translation through English and French. Surprisingly, translation performance is often better when pivoting through English, while it decreases for any other pivot language.

The availability of translation systems for so many language pairs also allows us to employ a system combination method to combine systems in a novel way. We report on multi-pivot and multi-source translation, which lead to gains in the area of 0.5-1% BLEU and 2-5% BLEU, respectively.
Indo-European:
Germanic      Swedish (sv), German (de), Dutch (nl), Danish (da), English (en)
Slavic        Polish (pl), Slovak (sk), Czech (cs), Slovene (sl), Bulgarian (bg)
Romance       French (fr), Portuguese (pt), Italian (it), Spanish (es), Romanian (ro)
Baltic        Lithuanian (lt), Latvian (lv)
Greek         Greek (el)
Non Indo-European:
Finno-Ugric   Finnish (fi), Estonian (et), Hungarian (hu)
Semitic       Maltese (mt)

Table 1: Acquis languages in their language families

2 Acquisition of the Corpus

The corpus used to develop the 462 MT systems is the JRC-Acquis (Steinberger et al., 2006), a multilingual parallel corpus consisting of altogether over 1 billion words (almost 50 million words per language; see Table 2). To our knowledge, it is the largest parallel corpus in so many languages. Apart from its size, the most special and useful feature of the JRC-Acquis is the fact that it includes a number of under-resourced languages and language pairs.

Table 2: Size of the JRC-Acquis Communautaire corpus

The JRC-Acquis is to a large extent based on the Acquis Communautaire, which is the body of common rights and obligations which have been adopted by all European Union (EU) Member States. For the texts to become legally binding in the EU Member States, they had to be translated into the 23 official EU languages. The Irish version (the 23rd official EU language), however, is not yet available.

As a text type, the corpus contains documents on political objectives, treaties, declarations, resolutions, agreements, EU legislation, and more. It is thus mostly of a legal nature, but as the laws and the agreements cover most domains of life, it does contain vocabulary from a wide range of subject fields, including human and veterinary medicine, the environment, fishery and agriculture, banking and commerce, transport, energy, science, social and religious issues, geography and more.

The corpus was compiled by crawling documents from the EU Eur-Lex website (http://eur-lex.europa.eu/) and by then selecting those documents that existed in at least ten languages, of which at least three had to be languages from the states that joined the EU in 2004. Each JRC-Acquis document has been manually classified according to the multilingual EUROVOC thesaurus (http://europa.eu/eurovoc/), which distinguishes over 6,000 subject domain classes.

3 Data Preparation

Training a statistical machine translation system requires a sentence-aligned parallel corpus to build the models, as well as tuning and test sets to optimize and assess its performance.

3.1 Training Data

The JRC-Acquis corpus already provides the data in the form required for training a statistical machine translation system, and very little additional processing is needed.

It is hard to quantify how much training data is needed to achieve a minimum level of performance. This depends on the expansiveness of the domain and the language pair. Typically, tens of millions of words give decent performance: for instance, systems trained on the 30-40 million word Europarl corpus are competitive with commercial systems, typically better on this domain and even close in performance when translating related material such as news stories (Callison-Burch et al., 2008).

The JRC-Acquis corpus is large enough to expect decent translation performance within its domain, but on the other hand, the domain is also very specific. Translation models trained on such legal text do not necessarily perform well on other domains.

3.2 Tuning and Test Sets

Since we develop machine translation systems for 462 language pairs, we wanted to have a common tuning and testing environment. Hence, we extracted from part of the corpus subsets where sentences are aligned one-to-one across all languages.
First, we identified all documents that exist for all languages. This is a set of 5383 documents. From these, we selected a subset of 270 documents to extract tuning and test sets. We rely on the word alignment provided along with the JRC-Acquis corpus to match up the sentences.

There are several strategies to match up sentences across all languages in a multi-lingual corpus:
(1) We extract those sentences that are aligned 1-1 across all languages.
(2) We allow many-to-many alignments between sentences and extract minimal sets of sentences for each language that are aligned between each other but not other sentences.
(3) We choose one language as pivot language and find matches in all the other languages based on the alignment to the pivot language.

While we would have preferred one of the first two methods, they were not practical. Extracting only 1-1 sentence alignments yielded mostly only very short sentences, and extracting sets of sentences under transitive closure of the sentence alignment very often matched up entire documents. But sentences that are either too short or too long do not serve well as tuning and test sets.

So, only the last option was practical, and we selected English as pivot language. This gave us a set of 12,322 sentences aligned across all 22 languages of the corpus. We split this set into three parts: a tuning set for parameter optimization, a development test set for experimentation and a final test set to report translation performance.

Since these sets contain many short and a few very long sentences, we reduced the tuning set further, by requiring that all sentences are between 8 and 60 words long. This left us with a tuning set of 1944 sentences per language.

3.3 Training

For the development of the translation systems, we used the defaults of the Moses toolkit (Koehn et al., 2007) with the following additional settings: maximum sentence length 80 words, bi-directional msd reordering model, 5-gram language model.

4 Performance

A thorough evaluation of the translation quality of translation systems for 462 different language pairs would be a daunting task, so we rely on automatic metrics. The most commonly used metric in statistical machine translation is the BLEU score (Papineni et al., 2002). Table 3 shows the scores for all the 462 translation systems. Performance varies widely for the different language pairs. For instance, French-English translation (64.0) is better than Bulgarian-Hungarian (24.7).

Rows: source language. Columns: target language.
src   en   bg   de   cs   da   el   es   et   fi   fr   hu   it   lt   lv   mt   nl   pl   pt   ro   sk   sl   sv
en     - 40.5 46.8 52.6 50.0 41.0 55.2 34.8 38.6 50.1 37.2 50.4 39.6 43.4 39.8 52.3 49.2 55.0 49.0 44.7 50.7 52.0
bg  61.3    - 38.7 39.4 39.6 34.5 46.9 25.5 26.7 42.4 22.0 43.5 29.3 29.1 25.9 44.9 35.1 45.9 36.8 34.1 34.1 39.9
de  53.6 26.3    - 35.4 43.1 32.8 47.1 26.7 29.5 39.4 27.6 42.7 27.6 30.3 19.8 50.2 30.2 44.1 30.7 29.4 31.4 41.2
cs  58.4 32.0 42.6    - 43.6 34.6 48.9 30.7 30.5 41.6 27.4 44.3 34.5 35.8 26.3 46.5 39.2 45.7 36.5 43.6 41.3 42.9
da  57.6 28.7 44.1 35.7    - 34.3 47.5 27.8 31.6 41.3 24.2 43.8 29.7 32.9 21.1 48.5 34.3 45.4 33.9 33.0 36.2 47.2
el  59.5 32.4 43.1 37.7 44.5    - 54.0 26.5 29.0 48.3 23.7 49.6 29.0 32.6 23.8 48.9 34.2 52.5 37.2 33.1 36.3 43.3
es  60.0 31.1 42.7 37.5 44.4 39.4    - 25.4 28.5 51.3 24.0 51.7 26.8 30.5 24.6 48.8 33.9 57.3 38.1 31.7 33.9 43.7
et  52.0 24.6 37.3 35.2 37.8 28.2 40.4    - 37.7 33.4 30.9 37.0 35.0 36.9 20.5 41.3 32.0 37.8 28.0 30.6 32.9 37.3
fi  49.3 23.2 36.0 32.0 37.9 27.2 39.7 34.9    - 29.5 27.2 36.6 30.5 32.5 19.4 40.6 28.8 37.5 26.5 27.3 28.2 37.6
fr  64.0 34.5 45.1 39.5 47.4 42.8 60.9 26.7 30.0    - 25.5 56.1 28.3 31.9 25.3 51.6 35.7 61.0 43.8 33.1 35.6 45.8
hu  48.0 24.7 34.3 30.0 33.0 25.5 34.1 29.6 29.4 30.7    - 33.5 29.6 31.9 18.1 36.1 29.8 34.2 25.7 25.6 28.2 30.5
it  61.0 32.1 44.3 38.9 45.8 40.6 57.2 25.0 29.7 52.7 24.2    - 29.4 32.6 24.6 50.5 35.2 56.5 39.3 32.5 34.7 44.3
lt  51.8 27.6 33.9 37.0 36.8 26.5 41.0 34.2 32.0 34.4 28.5 36.8    - 40.1 22.2 38.1 31.6 31.6 29.3 31.8 35.3 35.3
lv  54.0 29.1 35.0 37.8 38.5 29.7 42.7 34.2 32.4 35.6 29.3 38.9 38.4    - 23.3 41.5 34.4 39.6 31.0 33.3 37.1 38.0
mt  72.1 32.2 37.2 37.9 38.9 33.7 48.7 26.9 25.8 42.4 22.4 43.7 30.2 33.2    - 44.0 37.1 45.9 38.9 35.8 40.0 41.6
nl  56.9 29.3 46.9 37.0 45.4 35.3 49.7 27.5 29.8 43.4 25.3 44.5 28.6 31.7 22.0    - 32.0 47.7 33.0 30.1 34.6 43.6
pl  60.8 31.5 40.2 44.2 42.1 34.2 46.2 29.2 29.0 40.0 24.5 43.2 33.2 35.6 27.9 44.8    - 44.1 38.2 38.2 39.8 42.1
pt  60.7 31.4 42.9 38.4 42.8 40.2 60.7 26.4 29.2 53.2 23.8 52.8 28.0 31.5 24.8 49.3 34.5    - 39.4 32.1 34.4 43.9
ro  60.8 33.1 38.5 37.8 40.3 35.6 50.4 24.6 26.2 46.5 25.0 44.8 28.4 29.9 28.7 43.0 35.8 48.5    - 31.5 35.1 39.4
sk  60.8 32.6 39.4 48.1 41.0 33.3 46.2 29.8 28.4 39.4 27.4 41.8 33.8 36.7 28.5 44.4 39.0 43.3 35.3    - 42.6 41.8
sl  61.0 33.1 37.9 43.5 42.6 34.0 47.0 31.1 28.8 38.2 25.7 42.3 34.6 37.3 30.0 45.9 38.2 44.1 35.8 38.9    - 42.7
sv  58.5 26.9 41.0 35.6 46.6 33.3 46.6 27.4 30.9 38.9 22.7 42.0 28.2 31.0 23.7 45.6 32.2 44.2 32.7 31.3 33.5    -

Table 3: Translation performance as measured in %BLEU for all 462 language pairs
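To make the metric concrete, the following is a minimal sketch of corpus-level BLEU as defined by Papineni et al. (2002): clipped n-gram precisions up to order 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes pre-tokenised input, a single reference per sentence and no smoothing. The function names `ngrams` and `corpus_bleu` are illustrative, and this is not the scoring script used to produce the numbers reported here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in percent, with one reference per hypothesis."""
    matches = [0] * max_n   # clipped n-gram matches, per n-gram order
    totals = [0] * max_n    # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        # Without smoothing, a zero precision at any order gives a score of 0.
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty only applies when the output is not longer than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Because the counts are accumulated over the whole test set before the precisions are computed, the score is a property of the corpus and not an average of sentence scores.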

French Input: LE CONSEIL DE LA COMMUNAUTÉ ÉCONOMIQUE EUROPÉENNE,
French-English MT System: The Council of the European Economic Community,
English Reference Translation: THE COUNCIL OF THE EUROPEAN ECONOMIC COMMUNITY,

French Input: considérant que l'instauration d'une politique commune des transports comporte entre autres l'établissement de règles communes applicables aux transports internationaux de marchandises par route, exécutés au départ ou à destination du territoire d'un état membre, ou traversant le territoire d'un ou plusieurs états membres;
French-English MT System: Whereas the establishment of a common transport policy entails, inter alia, laying down common rules applicable to the international carriage of goods by road, to or from the territory of a Member State or passing across the territory of one or more Member States;
English Reference Translation: Whereas the adoption of a common transport policy involves inter alia laying down common rules for the international carriage of goods by road to or from the territory of a Member State or passing across the territory of one or more Member States;

French Input: Les transports faisant l'objet de l'annexe II ne devront plus être soumis à un régime de contingentement. Ils pourront cependant demeurer sujets à autorisation pour autant qu'aucune restriction quantitative n'en résulte ; chaque état membre devra en pareil cas veiller à ce qu'une décision intervienne dans les cinq jours suivant l'introduction de la demande d'autorisation.
French-English MT System: The carriage listed in Annex II shall not be subject to a quota system. They may, however, remain subject to authorisation provided that any quantitative restriction arises; each Member State may in such cases ensure that a decision is taken within five days of submission of the application for authorisation.
English Reference Translation: The types of carriage listed in Annex II shall no longer be subject to a quota system. They may, however, remain subject to authorisation provided no quantitative restriction is involved ; in such case Member States shall ensure that decisions on applications for authorisation are given within five days of receipt.

Figure 1: System output for French-English on the beginning of the test set used in the evaluation.

Compared to BLEU scores for other training scenarios and test sets, these numbers are fairly high, indicating that the systems work very well on the domain of European law. European law is a very well-defined domain that does not allow a lot of variation in translation, so it is possible for a statistical system to pick up on the correct words and phrases to use. See also Figure 1 for sample output of the French-English system.

To get a better sense of the translation performance, we wanted to compare the translation systems against a translation system trained on the Europarl corpus. On the news set of the 2008 ACL Workshop on SMT, the Acquis system achieved a score of 11.6, while the Europarl system scored 15.7, for German-English.

5 Analysis

The Acquis corpus comprises a very large number and variety of language pairs. The breadth of data conditions makes this corpus ideal for performing experiments which investigate language pair characteristics and the effect they have on translation. This allows us to provide a wide perspective on the challenges facing machine translation and provides strong motivation for further research on important factors.

5.1 Factors

In this paper we extend and enhance previous research (Birch et al., 2008) by using a much larger number of language pairs and by investigating a new factor, translation model entropy, which captures the amount of uncertainty present when choosing candidate translation phrases. We have also included corpus size as a factor, as the amount of Acquis data per language pair can vary by a factor of four. The following characteristics form part of our analysis:

Morphological Complexity. The morphological complexity of the language pair is an important factor influencing translation performance. A simple method of measuring this complexity is to use vocabulary size. Vocabulary size is strongly influenced by the number of word forms for number, case, tense etc., and it is also affected by the number of agglutinations in the language.

Reordering. We measure word order differences between languages by assuming that reordering is a binary process between two blocks that are adjacent in the source and whose order is reversed in the target. Word alignments are extracted using GIZA++ and then merged using the grow-final-diag algorithm. Reorderings are then extracted using the shift-reduce algorithm (Galley and Manning, 2008). These reorderings are used to extract a sentence-level metric, RQuantity (Birch et al., 2008), which is the sum of the widths of all the reorderings on the source side, normalized by the length of the source sentence. This measure is averaged over a random sample of 2000 training sentences to get the corpus RQuantity.
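The RQuantity definition above can be sketched as follows. The sketch assumes that the reorderings have already been extracted from the word alignment (the shift-reduce extraction step is not shown) and that each reordering is given as a hypothetical `(start, end)` source span, end exclusive, covering the two adjacent blocks that swap order in the target. The function names are illustrative.

```python
def rquantity(reorderings, source_length):
    """Sum of source-side reordering widths, normalised by sentence length.

    `reorderings` is a list of (start, end) source spans, end exclusive,
    each covering the two adjacent blocks whose order is reversed in the target.
    """
    if source_length <= 0:
        raise ValueError("source_length must be positive")
    return sum(end - start for start, end in reorderings) / source_length


def corpus_rquantity(sentences):
    """Average the sentence-level RQuantity over (reorderings, length) pairs."""
    scores = [rquantity(spans, length) for spans, length in sentences]
    return sum(scores) / len(scores)
```

Under this representation, a monotone sentence has an RQuantity of 0, and nested reorderings each contribute their own width, so the value can exceed 1 for heavily reordered sentences.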

Language Relatedness. Languages which are closely related could share morphological forms which might be captured reasonably well in translation models. We include a measure of language relatedness to take this into account. Lexicostatistics provide a quantitative measure of language relatedness by comparing lists of lexical cognates. We use the data from Dyen et al. (1992), who developed a list of 200 meanings for 84 Indo-European languages. Non-Indo-European languages are assigned a minimal score.

Corpus Size. The size of the parallel corpora varies considerably and we take this into account by using the number of sentence pairs used for training the systems as a factor.

These factors, together with translation model entropy, which is described in the following section, form the basis of our analysis of the Acquis corpus.

5.2 Translation Model Entropy

Translation model entropy captures the amount of uncertainty involved in choosing candidate translation phrases. Some language pairs can cause translation models to have higher entropy because there is no clear correlation between concepts in one language and the other. Translating from morphologically poor languages into richer languages could also lead to high entropy models, due to the lack of certainty as to which word form to choose. To the best of our knowledge, this important characteristic of translation has not been investigated until now.

The entropy of the translation model is calculated on the test set. We perform a search through all possible segmentations of the source sentence. Each segmentation, or source phrase, has a set of possible translations in the phrase table T. The entropy H for a source phrase s is calculated as follows:

$$H(s) = -\sum_{t \in T} p(t \mid s) \log_2 p(t \mid s)$$

The search returns the set of segments which covers the source sentence with the lowest average entropy per word.
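The entropy computation and the segmentation search can be sketched with a short dynamic program. This is a sketch under stated assumptions: the phrase table is a hypothetical dictionary from a tuple of source tokens to the list of probabilities p(t|s) of its translations, and "lowest average entropy per word" is read as the minimal sum of phrase entropies over a segmentation, divided by the sentence length. The names `phrase_entropy`, `min_avg_entropy_per_word` and `max_phrase_len` are illustrative.

```python
import math


def phrase_entropy(translation_probs):
    """H(s) = -sum_t p(t|s) log2 p(t|s) for one source phrase."""
    return -sum(p * math.log2(p) for p in translation_probs if p > 0.0)


def min_avg_entropy_per_word(source, phrase_table, max_phrase_len=7):
    """Lowest total phrase entropy over all segmentations, divided by length.

    `phrase_table` maps a tuple of source tokens to a list of p(t|s) values.
    Returns None if no segmentation is fully covered by the phrase table.
    """
    n = len(source)
    best = [math.inf] * (n + 1)   # best[i]: minimal entropy sum covering source[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_phrase_len), end):
            phrase = tuple(source[start:end])
            if phrase in phrase_table and best[start] < math.inf:
                cost = best[start] + phrase_entropy(phrase_table[phrase])
                best[end] = min(best[end], cost)
    return None if best[n] == math.inf else best[n] / n
```

Since the sentence length is fixed, minimising the per-word average is the same as minimising the summed entropy, which is why a single left-to-right pass over end positions is enough.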
Longer phrases tend to have lower entropy, with fewer phrase table entries and more of the probability mass concentrated on fewer alternatives, and they will tend to be selected when present in the phrase table. This is similar to the actual translation process.

Figure 2: Matrix of Translation Model Entropies (rows: source language; columns: target language; legend values range from 0.27 to 1.33)

Figure 2 shows the average sentence entropy for the Acquis matrix. The matrix has a wide variety of entropy values for different language pairs, from the lowest, fr-en with 0.22, to the highest, et-pt with 1.33. It seems that models of language pairs with a Romance language or English as the source generally have low entropy. The target language does not seem to affect entropy as much, except in the case of English, where model entropy is particularly low. This confirms our intuition that translating from morphologically rich languages into poorer ones should lead to lower entropy, as English is the language with the lowest morphological complexity and smallest vocabulary size. The models with the highest entropy seem to be those with very rich morphology in the source, which does not uphold our intuition that the poor-rich translation models would have a high entropy.

In order to better understand the entropy results we fit a number of simple linear regression models, with entropy as the independent variable. The results are shown in Table 4, where we present the R², which is the fraction of the variance explained by the model, or its goodness of fit, and the significance of the correlation, where * means p < 0.05, ** means p < 0.01, and *** means p < 0.001.

Factor               R²     Significance
Reordering Amnt      0.310  ***
Source Vocab Size    0.285  ***
Lang. Relatedness    0.123  ***
Target Vocab Size    0.056  ***
Source Corpus Size   0.003

Table 4: Simple linear regression models showing correlation of entropy with other factors.

We can see that reordering amount is the most correlated factor. This is almost certainly not a causal relationship and it does not explain the entropy results. However, the fact that source vocabulary size is more strongly correlated with entropy than target vocabulary size could explain the fact that entropy seems to depend more on the source language than on the target language. Finally, we can see that entropy is not at all correlated with corpus size. Phrase table entropy cannot be defined simply in terms of other measures. It captures a new aspect of translation difficulty which is very important, as we shall see in the next section.

5.3 Individual Impact on Performance

In order to establish the relative impact of the different factors on translation performance, we fit a number of simple linear regression models. The results are shown in Table 5. Translation model entropy is the factor which best explains the variation in performance seen between language pairs. The amount of reordering accounts for a similar amount of variation as entropy, while language relatedness and target vocabulary size each account for less than half as much. These findings support the results presented by Birch et al. (2008), showing that with a great number and variety of language pairs, reordering has an important effect on performance.

5.4 Combined Impact on Performance

Although simple regressions can show the impact of the different factors in isolation, we are also interested in how they interact. We fit a multiple regression model to the data where all explanatory variable vectors were normalized to be more comparable. In Table 6 we can see the relative contribution of the different factors to the model, although the factors are correlated.
This means that the magnitudes of the coefficients are unreliable, as the explanatory power of one variable could be shifted to another correlated variable. The R² of the model is 0.745, which means that 74.5% of the variation in BLEU can be explained by these factors.

Factor               R²     Significance
Entropy              0.276  ***
Reordering Amnt      0.267  ***
Lang. Relatedness    0.115  ***
Target Vocab Size    0.101  ***
Source Corpus Size   0.034  ***
Target Corpus Size   0.034  ***
Source Vocab Size    0.001

Table 5: Simple linear regression models showing correlation of BLEU with explanatory factors. An R² of 0.276 implies that entropy explains 27.6% of the difference in performance.

Explanatory Variable                  Coefficient
Entropy                               -5.147   ***
Corpus Size                           24.412   ***
Target Vocab. Size                    -21.759  ***
Language Similarity                   3.736    ***
Reordering Amount                     -11.215  ***
Target Vocab. Size^2                  6.885    ***
Interaction: Corp.Size/L.Sim.         4.377    ***
Interaction: Corp.Size/Reord.         -5.456   ***
Interaction: Corp.Size/Entropy        2.449    *
Interaction: T.Vocab.Size/L.Sim.      -4.325   ***
Interaction: T.Vocab.Size/Reord.      3.453    ***

Table 6: The impact of the various explanatory features on the BLEU score via their coefficients in the minimal adequate model.

6 System Combination

Let us now look at some types of system combination that we are able to explore using our matrix of translation systems. They are illustrated in Figure 3: pivot translation, multi-pivot translation, and multi-source translation.

6.1 Pivot Translation

Instead of building machine translation systems for each language pair, we may want to resort to a simpler strategy. We pick one language as the pivot, and only build systems translating into and out of this language. When translating a language pair not including the pivot, we chain together the source-pivot system and the pivot-target system. Recent work on pivot translation with statistical machine translation has investigated more sophisticated approaches, such as the merging of phrase tables (Wu and Wang, 2007), but simple chaining performs comparably well.
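The simple chaining strategy can be sketched in a few lines. The sketch assumes a hypothetical `systems` dictionary that maps a `(source, target)` pair of language codes to a function translating one sentence. It stands in for whatever decoder interface is actually used and is not part of the Moses toolkit.

```python
from typing import Callable, Dict, Tuple

# A translator is any function that maps a source sentence to a target sentence.
Translator = Callable[[str], str]


def pivot_translate(
    sentence: str,
    source: str,
    target: str,
    pivot: str,
    systems: Dict[Tuple[str, str], Translator],
) -> str:
    """Translate source->target by chaining source->pivot and pivot->target."""
    if source == pivot or target == pivot:
        # A pair that includes the pivot is served by a single direct system.
        return systems[(source, target)](sentence)
    intermediate = systems[(source, pivot)](sentence)
    return systems[(pivot, target)](intermediate)
```

Only the best output of the first system is passed on in this sketch, so any error made in the source-pivot step is carried into the pivot-target step.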
Pivoting reduces the number of required systems to 2(n-1) instead of n(n-1), so in our case to 42 instead of 462.

Figure 3: Three types of system combination explored: (a) translating through a pivot language, (b) consensus of multiple pivot translations, (c) consensus of translating from multiple source languages.

We experimented with different pivot languages. Surprisingly, using English as a pivot increases translation performance more often than not. This is not the case for other languages. See Table 7 for summary statistics for English and French as pivot.

BLEU Diff.    LP via en    LP via fr
< -15         0 (0%)       2 (0%)
-15 to -10    0 (0%)       37 (8%)
-10 to -5     3 (0%)       126 (30%)
-5 to -2      16 (3%)      183 (43%)
-2 to 2       120 (28%)    71 (16%)
2 to 5        122 (29%)    1 (0%)
5 to 10       151 (35%)    0 (0%)
> 10          8 (1%)       0 (0%)

Table 7: Pivot translation. Using English (en) as pivot mostly gains in BLEU over direct translation, while pivoting through French (fr) and other languages generally hurts.

When using English as pivot, we find not much difference (BLEU diverges by up to 2 points) for about a third of language pairs; for another third there are significant gains (2-5 points), and for another third even larger gains (5-10 points). However, using French as pivot generally decreases performance; only for a sixth of language pairs is there not much difference.

English as pivot has also been shown to be beneficial for Arabic-Chinese translation (Habash and Hu, 2009). We find it hard to claim that this is due to linguistic reasons, but rather an artifact of the data set we are using. It is likely that most of the text was originally authored in English.

6.2 Multi-Pivot Translation

While pivoting through any language but English does generally lead to worse translations, it does constitute an alternative translation path. A recent trend in statistical machine translation is to combine the output of different MT systems in the form of a consensus translation.
In multi-pivot translation, we combine the direct translation system with several pivot systems, a novel method. Our system combination method is an adaptation of Rosti et al. (2007). The multiple translations obtained from the different systems are compiled into a word lattice that is searched for the most likely translation, with the aid of a language model. The combination method is optimized, using the originating system of each competing output word and phrase as a feature.

Such multi-pivot system combination may be done for any language pair. We only did this for language pairs with English as target language, partly due to the large computational burden and partly because we wanted to compare this method against a strong baseline.

Table 8 shows the performance of such multi-pivot systems with all possible source languages translated into English. We varied the number of added pivot systems. We achieved relatively small gains (typically 0.5-1% BLEU), depending on the language pair and the number of pivot systems added to the direct translation baseline.

Source  Direct  3 Best         6 Best
bg      61.3    61.7 (+0.4%)   61.8 (+0.5%)
de      53.6    54.0 (+0.4%)   54.4 (+0.8%)
cs      58.4    59.1 (+0.7%)   59.2 (+0.8%)
da      57.6    58.0 (+0.4%)   57.9 (+0.3%)
el      59.5    60.0 (+0.5%)   60.2 (+0.7%)
es      60.0    60.2 (+0.2%)
et      52.0    52.4 (+0.4%)   52.5 (+0.5%)
fi      49.3    50.1 (+0.8%)   50.2 (+0.9%)
fr      64.0    64.4 (+0.4%)   64.5 (+0.5%)
hu      48.0    48.5 (+0.5%)
it      61.0    61.6 (+0.6%)   61.7 (+0.7%)
lt      51.8    52.3 (+0.5%)   52.2 (+0.4%)
lv      54.0    54.6 (+0.6%)   54.9 (+0.9%)
mt      72.1    72.2 (+0.1%)   72.3 (+0.2%)
nl      56.9    57.4 (+0.5%)   57.6 (+0.7%)
pl      60.8    61.1 (+0.3%)   61.3 (+0.5%)
pt      60.7    61.0 (+0.3%)   61.2 (+0.5%)
ro      60.8    61.6 (+0.8%)   61.9 (+1.1%)
sk      60.8    61.3 (+0.5%)   61.5 (+0.7%)
sl      61.0    61.0 (+0.0%)   61.2 (+0.2%)
sv      58.5    58.9 (+0.4%)   59.0 (+0.5%)

Table 8: Multi-Pivot: Improving direct translation by system combination with pivot translation (all translation into English)

6.3 Multi-Source Translation

Since documents often have to be translated into multiple languages, one strategy to improve translation performance is to use already generated translations in some languages to translate into yet another. This is called multi-source translation. Again, we use the consensus translation method, the same way as for multi-pivot translation.

In our experimental set-up, we assume that we already have the documents in all the other 21 languages when translating them into the 22nd language. The baseline is the easiest source language for each target language. We then add additional source languages, starting with the next easiest, and so on.

Table 9 shows the results. With more source languages, translation performance improves. For instance, for Spanish the easiest source language is French with 60.9% BLEU. By combining the output from translating three source languages (French, Portuguese, Italian), we achieve 63.0% BLEU (+2.1). Improvements vary for different target languages, but they are typically in the range of 2-5%.

Target  Best   3 Best          6 Best
en      72.1   73.3 (+1.2%)    74.5 (+2.4%)
bg      40.5   41.5 (+1.0%)    42.1 (+1.6%)
de      46.9   49.9 (+3.0%)    50.7 (+3.8%)
cs      52.6   53.8 (+1.2%)    54.5 (+1.9%)
da      50.0   51.9 (+1.9%)    52.8 (+2.8%)
el      42.8   45.7 (+2.9%)    46.5 (+3.7%)
es      60.9   63.0 (+2.1%)    63.7 (+2.8%)
et      34.9   40.4 (+5.5%)    41.9 (+7.0%)
fi      38.6   43.2 (+4.6%)    44.0 (+5.4%)
fr      53.2   63.7 (+10.5%)   66.2 (+13.0%)
hu      37.2   38.9 (+1.7%)    39.3 (+2.1%)
it      56.1   59.8 (+3.7%)    61.5 (+5.4%)
lt      39.6   43.0 (+3.4%)    43.4 (+3.8%)
lv      43.4   44.1 (+0.7%)    45.6 (+2.2%)
mt      39.8   39.9 (+0.1%)
nl      52.3   54.5 (+2.2%)    55.5 (+3.2%)
pl      49.2   49.6 (+0.4%)    50.0 (+0.8%)
pt      61.0   61.2 (+0.2%)    62.9 (+1.9%)
ro      49.0   50.0 (+1.0%)    50.0 (+1.0%)
sk      44.7   46.8 (+2.1%)    47.3 (+2.6%)
sl      50.7   51.5 (+0.8%)    52.1 (+1.4%)
sv      52.0   52.5 (+0.5%)    52.7 (+0.7%)

Table 9: Multi-Source: Combining translation from different source languages

7 Conclusions

We built translation systems for the largest number of language pairs known to us using the JRC-Acquis corpus. We carried out a regression study to determine the main factors of translation difficulty, which explains 74.5% of differences in scores. We also contrasted direct translation systems against pivot translation and improved them with multi-pivot and multi-source system combination methods.

This work was supported by the EuroMatrix/EuroMatrixPlus project funded by the European Commission (6/7th Framework Programme) and made use of the resources provided by the Edinburgh Compute and Data Facility (http://www.ecdf.ed.ac.uk/). The ECDF is partially supported by the eDIKT initiative (http://www.edikt.org.uk/).

References

Birch, A., Osborne, M., and Koehn, P. (2008). Predicting success in machine translation. In EMNLP.

Callison-Burch, C., Fordyce, C. S., Koehn, P., Monz, C., and Schroeder, J. (2008). Further meta-evaluation of machine translation. In 3rd Workshop on SMT, Columbus, Ohio.

Dyen, I., Kruskal, J., and Black, P. (1992). An Indoeuropean classification, a lexicostatistical experiment. Transactions of the American Philosophical Society, 82(5).

Galley, M. and Manning, C. D. (2008). A simple and effective hierarchical phrase reordering model. In EMNLP.

Habash, N. and Hu, J. (2009). Improving Arabic-Chinese statistical machine translation using English as pivot language. In 4th Workshop on SMT, Athens, Greece.

Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C. J., Bojar, O., Constantin, A., and Herbst, E. (2007). Moses: Open source toolkit for statistical machine translation. In ACL Demo and Poster Session.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In ACL.

Rosti, A.-V. I., Xiang, B., Matsoukas, S., Schwartz, R., Ayan, N. F., and Dorr, B. J. (2007). Combining outputs from multiple machine translation systems. In HLT-NAACL.

Steinberger, R., Pouliquen, B., Widiger, A., Ignat, C., Erjavec, T., Tufis, D., and Varga, D. (2006). The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. In LREC.

Wu, H. and Wang, H. (2007). Pivot language approach for phrase-based statistical machine translation. Machine Translation, 21(3):165-182.