TRANSREAD DELIVERABLE 3.1: QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS. ANR Project 2012 CORD 015

ANR Project 2012 CORD 015
TRANSREAD: Lecture et interaction bilingues enrichies par les données d'alignement
DELIVERABLE 3.1
QUALITY CONTROL IN HUMAN TRANSLATIONS: USE CASES AND SPECIFICATIONS
April 2014
Benoit Le Ny

Abstract

The third task of the TransRead project aims at developing new methods and tools to control the quality of human translations and translation memories. In this document, we introduce the use cases we plan to consider during the project and summarize the solutions we plan to study. We also briefly describe existing solutions for quality control in machine and human translation.

Quality Control in Human Translations: Use Cases and Specifications
Benoit Le Ny
April 2014

Contents

1 Introduction
2 State of the Art: benchmark on translation quality estimation, terminology management and CAT tools
  2.1 CAT tools and quality assurance tools (tested)
    2.1.1 Across
    2.1.2 Trados
    2.1.3 XBench QA
  2.2 CAT tools and quality assurance tools (not tested)
    2.2.1 QA Distiller
    2.2.2 ErrorSpy
    2.2.3 QuEst
3 Interaction scenarios
  3.1 Bilingual reading and foreign language learning
  3.2 Human translation revision
  3.3 Automatic translation post-editing
4 Format and visualization of quality indicators
5 Conclusion

1 Introduction

Assessing the quality of bitext alignments is crucial in any translation task, whether the translation is produced by a machine or by a human. Yet it is a difficult issue to address, and it has been overlooked. In response to increasing demand from the translation industry, driven by the growth of machine translation, many translation companies have invested in developing formalized metrics for classifying different types of errors (terminology, spelling, mistranslations, etc.). In the context of the TransRead project, we will focus on identifying key indicators to assess the quality of bitext alignments (parallel corpora, translation memories, etc.).

This document starts by analyzing existing tools used by both academic and industry users for quality estimation and control, then recalls the main user scenarios benefiting from such indicators, and finally puts a specific focus on the most important and promising kinds of monolingual and bilingual indicators that should be used for bitext visualization and alignment quality estimation. The formats that will be used to represent monolingual and bilingual indicators in the files and to the user are described in the last section.

2 State of the Art: benchmark on translation quality estimation, terminology management and CAT tools

A study of existing tools for both translation quality estimation and terminology management was conducted at the beginning of the project. It helped us determine which indicators should be used for the TransRead project. Those tools include open-source, free and commercial tools. Not all could be tested, but all were analyzed.

2.1 CAT tools and quality assurance tools (tested)

2.1.1 Across

Across is a free CAT tool which assists translators in re-using content, controlling processes, and ensuring a high level of quality. The interesting features offered by Across (as well as by most CAT tools) are terminology control and translation consistency control, which consist in checking that the same source text is always translated the same way throughout a document, against both a translation memory and a glossary. For terminology control, there is no advanced mechanism for de-inflecting source terms or inflecting target terms.
For example, belle is not found as a form of beau, but both fleurs and fleuri are considered forms of fleur.

Nonetheless, those tools allow translators to work with a translation memory, and have the added benefit of supporting fuzzy matches. Fuzzy matching accepts matches that are less than 100% identical when looking for correspondences between segments of a text and entries in a database of previous translations. It usually operates on sentence-level segments.

There is also a large list of additional quality assurance criteria, which we can split into two groups.

Monolingual criteria:
- Bounding spaces: consistent use of white space at the beginning and end of a paragraph.
- Date, time, and number format.
- First capitalization: checks whether the first word of a paragraph has been capitalized.
- Consistent brackets.
- Multiple spaces check: checks whether the target document contains multiple consecutive white spaces.

As we can see, those checks are close to common spell-checking features.

Bilingual criteria:
- 100% match check versus translation memory.
- crossterm check: the QM criterion of the crossterm check verifies whether the source-language terms (source terms) in the source text have been duly translated with the corresponding target-language terms (target terms).
- Field match and field types match: tag matching.
- Format style sequence and usage check: compares whether the same formatting (such as boldface and italics) is used in the source and in the translation.
- Identical segments check: checks whether the segments are identical in the source and target documents.
- Identical punctuation check: checks whether the number and sequence of the punctuation marks are identical in the source and target documents.
- Empty content check: looks for empty paragraphs.
- Segments count check.
- Structure check: for structured resources (SGML, XML, .NET), performs a validity check.

2.1.2 Trados

Very similar functionalities are available in other CAT tools, such as Trados:
- Segments to Exclude: allows skipping certain segments, for example if a perfect match has already been found.
- Segments Verification: checks that segments are completed correctly and that there are no missing translations.
- Inconsistencies: checks whether the same text is rendered in several different ways.
- Punctuation: checks that punctuation is correct. For example, in some countries, certain punctuation rules have to be observed.
- Numbers: checks that numbers, times and dates are correct.
- Word List: the user can specify the correct form of a word that should be used, for example to only use website as one word and never web site.
- Regular Expressions: users can configure regular expressions.
- Trademark Check: checks that trademarks are used correctly.
- Length Verification: checks that the length is no longer than a specified number of characters.
- Terminology Verifier: checks documents to ensure that the target terms contained in the term base have been used during translation, or to verify whether target terms black-listed in the term base have been used.
- Tag Verification.
- XML Validation.

2.1.3 XBench QA

XBench provides quality assurance and terminology management in a single package. The quality indicators are divided into four steps:

1. Basic criteria
   - Untranslated segments
   - Inconsistency in source: different sources with the same translation.
   - Inconsistency in target: same source with different translations.
   - Source = target
2. Content criteria
   - Tag mismatch
   - Numeric mismatch
   - Double blank
   - Repeated word
   - Key term mismatch
   - CamelCase mismatch
   - ALLUPPERCASE mismatch
3. Checklists: customized filters based on regular expressions
4. Spell-checking

2.2 CAT tools and quality assurance tools (not tested)

This section deals with other translation quality tools that have not been tested but have been analyzed: features are reported here as announced in their documentation.

2.2.1 QA Distiller

QA Distiller allows automatic detection and correction of format errors in human or machine translation, and in translation memories. The quality indicators available with this tool are:
- Omissions: empty translations, (partially) forgotten translations, skipped translations, incomplete translations;
- Inconsistencies: translation inconsistencies, source language inconsistencies;
- Language-independent formatting: spacing, punctuation, brackets, tab characters, capitalization;
- Language-dependent formatting: corrupt characters, spacing, number values, number formatting, quotation marks, measurement system;
- Terminology: usage, consistency;
- Regular expressions: fully customizable checks.

2.2.2 ErrorSpy

ErrorSpy is another commercial quality assurance software for translations. Supported checks are:
- Terminology check: using a terminology list, ErrorSpy checks whether the translator has used the correct terminology.
- Consistency check: ErrorSpy notes all sentences or segments which are identical in the source language. If the translations differ for such segments, the reviser is informed of these differences.
- Number check: number errors are amongst the most serious. ErrorSpy not only checks whether numbers are identical in the source and target language, it also checks whether the decimal signs comply with the specification.
- Completeness check: it is possible to specify a certain length for the translation. All sentences which are too short are submitted to the reviser for checking, who can therefore find any missing translations.
- Tag check: ErrorSpy reports all segments containing tag errors.
- Acronym check: a further source of errors consists of wrongly adopted acronyms. ErrorSpy offers the option of defining acronym patterns and checking whether they correspond in both languages.
- Typography check: the correctness of punctuation marks is checked in a language-dependent way.
- Missing translations: scans for missing translations. Identical source and target segments are reported to the person verifying the translation.
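
Several of the checks listed in this section are simple enough to sketch in a few lines. The following Python sketch illustrates three of them: the "inconsistency in target" check (same source, different translations), a numeric mismatch check, and an untranslated segment check. It is not taken from any of the tools above; the function names are hypothetical, and the number check makes the simplifying assumption that only "." and "," are used as separators (thousands separated by spaces are not handled).

```python
import re
from collections import defaultdict

# Digits optionally grouped with "." or "," (assumption: no space separators).
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def target_inconsistencies(pairs):
    """Return each source segment that has more than one distinct translation."""
    seen = defaultdict(set)
    for source, target in pairs:
        seen[source.strip()].add(target.strip())
    return {src: sorted(tgts) for src, tgts in seen.items() if len(tgts) > 1}


def number_mismatch(source, target):
    """True when the numbers differ once '.' and ',' separators are ignored."""
    def normalise(text):
        return sorted(re.sub(r"[.,]", "", n) for n in NUMBER_RE.findall(text))
    return normalise(source) != normalise(target)


def untranslated(source, target):
    """True when the target is empty or is a plain copy of the source."""
    return not target.strip() or source.strip() == target.strip()


if __name__ == "__main__":
    memory = [
        ("Open the door", "Ouvrez la porte"),
        ("Open the door", "Ouvrir la porte"),
        ("Close it", "Fermez-le"),
    ]
    print(target_inconsistencies(memory))
    print(number_mismatch("Version 2.5 costs 30 EUR", "La version 2,5 coûte 300 EUR"))
```

Normalising separators before comparing is one possible design choice: it avoids flagging a correct localized decimal sign as an error, at the cost of not verifying that the decimal sign itself complies with a specification.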

2.2.3 QuEst

QuEst is an open-source translation quality estimation tool. It is a software toolkit for building a statistical quality estimation system. Very much like Moses [1] for machine translation, it provides software both for feature extraction and for training a machine learning model from those features. The software has two main modules: a Java module that extracts a number of sentence-level features (and a few word-level features) and a Python module that interacts with the scikit-learn toolkit for machine learning. To create a working translation quality estimation system, one needs to train it on data specific to one's project. It is also possible to use some resources provided on the tool website (resources based on WMT shared task datasets): language models and training corpora.

The quality features that can be extracted with this tool are divided into three categories: fluency, adequacy and confidence [3]. Some examples of features include:
- Fluency: number of tokens in the target segment; average number of occurrences of the target word within the target segment; language model probability of the target segment, using a large corpus of the target language to build the language model;
- Adequacy: ratio of the number of tokens in source and target segments; ratio of brackets and punctuation symbols in source and target segments; proportion of dependency relations between (aligned) constituents in source and target segments;
- Confidence: features and global score of the SMT system; number of distinct hypotheses in the n-best list; average size of the target phrases.

3 Interaction scenarios

Precisely defining useful cues for translation quality and other characteristics of bilingual texts requires a study of the possible interaction scenarios between the user and the future tool.
Based on the partners' experience in education and learning, as well as in the translation industry and in user interfaces, three possible use case scenarios for the tool prototype have been suggested and studied.
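
As a concrete illustration of the kind of surface features listed for QuEst in Section 2.2.3, the sketch below computes two fluency features and two adequacy features for a segment pair. It is not QuEst code (QuEst's extractor is a Java module); whitespace tokenization, the punctuation set and all names are assumptions chosen to keep the example short.

```python
from collections import Counter

# Assumed punctuation and bracket inventory for the ratio feature.
PUNCTUATION = set(".,;:!?()[]{}")


def tokenize(segment):
    """Naive whitespace tokenization (an assumption, not QuEst's tokenizer)."""
    return segment.split()


def surface_features(source, target):
    """Compute a few fluency and adequacy features for one segment pair."""
    src_tokens = tokenize(source)
    tgt_tokens = tokenize(target)
    tgt_types = Counter(tok.lower() for tok in tgt_tokens)
    src_punct = sum(ch in PUNCTUATION for ch in source)
    tgt_punct = sum(ch in PUNCTUATION for ch in target)
    return {
        # Fluency: length of the target segment.
        "target_tokens": len(tgt_tokens),
        # Fluency: average occurrences of a target word (tokens / distinct words).
        "avg_target_word_occurrences":
            len(tgt_tokens) / len(tgt_types) if tgt_types else 0.0,
        # Adequacy: source/target token ratio (0.0 when the target is empty).
        "token_ratio": len(src_tokens) / len(tgt_tokens) if tgt_tokens else 0.0,
        # Adequacy: source/target punctuation ratio (0.0 when the target has none).
        "punct_ratio": src_punct / tgt_punct if tgt_punct else 0.0,
    }


if __name__ == "__main__":
    feats = surface_features(
        "The room, which is large, is there.",
        "La chambre, qui est grande, est là.",
    )
    for name, value in feats.items():
        print(name, value)
```

In a full quality estimation system, such feature vectors would be fed to a trained regression or classification model; that training step is outside the scope of this sketch.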

3.1 Bilingual reading and foreign language learning

One possible use case for the tool is to facilitate the reading of bilingual texts (literary works, or scientific and other domain-specific articles) for non-native readers. In this situation, the TransRead tool is used as a means of improving the reading experience and enriching it with various clues for foreign language learning. The bilingual texts used for this particular case are assumed to be well translated, so translation quality indicators are of minor importance. On the other hand, it is important for this scenario to show the user the alignment links between source and target at different levels. As opposed to the minimal-unit alignment calculated by machine translation models, we are interested here in a more linguistic, knowledge-based alignment that would allow the user to identify bigger units corresponding to idioms and collocations. Highlighting idiomatic expressions would facilitate text analysis as well as external dictionary look-ups, which is yet another useful functionality for all of the usage scenarios. We will see in the last section how these complex bilingual quality indicators will be used to display language quality information to the user.

3.2 Human translation revision

Another possible use case for the bilingual text tool, and one that is particularly important for the professional translation industry, is human translation revision. In this scenario, a translation produced by one translator is revised by another. The translation quality of the texts presented for this task may be highly variable, depending on the translator's skills and experience. A typical useful interaction scenario would include various indicators of translation quality and highlight the regions of the translated text that contain different types of problems and errors.
As the calculation of the various bilingual and monolingual cues, as well as of the general quality estimation measure, is very time- and space-consuming, it is not possible for the moment to conceive a tool that would enable the reviser to make corrections and recalculate scores on the fly. The TransRead bilingual text tool is mainly intended to allow a quick visualization of translation quality information at various levels of precision and from various viewpoints. Different levels of precision should allow the user to visualize the quality of the whole text at document level and then to zoom in to any particular problematic part of the text at phrase or word level. Different viewpoints will allow separate visualization of different aspects and indicators of the bilingual text and its translation quality (most of these cues are also used to calculate the general translation quality estimation metric).

Thus, for different translation revision scenarios, different viewpoints on translation quality and different kinds of information may be of use, and it is important to distinguish between the errors that are likely to appear in a human translation and typical automatic translation system errors (the subject of the third use case for our tool). Although translation quality varies greatly from one translator to another, we assume that we are dealing with translations by native or near-native speakers of the target language and that it is therefore highly unlikely that a human translator produces an ungrammatical sentence in the target language. On the other hand, human translation may contain some specific errors that are unlikely to appear in automatic translation, such as spelling errors, completely untranslated or partially translated segments, repeated segments, segments in reversed order, etc. Consequently, some specific features and quality indicators will probably only be useful for the human translation scenario: these include all simple translation consistency controls, monolingual and bilingual statistics, and the spell-checking feature.

Indeed, one of the most important features for human translation revision is terminology consistency verification. For technical translation, it is often crucial that the translation respects a given terminology (domain terminology, company terminology, etc.) and that the translation of terms is consistent throughout the whole document. It is accordingly important to include terminology verification in the TransRead tool for the human translation scenario. Conversely, some indicators that are particularly helpful for automatic translation quality estimation, such as language models for measuring the grammaticality of the target sentence, are unlikely to be of great use for human translation.

There are also certain types of errors that are common to both automatic translation systems and human translators. For instance, idioms may be translated literally in both cases, if they are unknown either to the human translator or to the machine translation system. Similarly, polysemy and homonymy may be a source of errors in both human and machine translations. Thus, one of the goals of the TransRead project is to adapt statistical quality estimation systems developed for machine translation, which use a variety of monolingual and bilingual cues, to the human translation revision case.
To do so, sufficiently large corpora of human revisions must be collected, containing both the original and the revised version of the translated text.

3.3 Automatic translation post-editing

Finally, another possible interaction scenario for the TransRead tool is the post-editing of automatic translation output by a human translator. Post-editing, which for a long time was used for purely academic purposes, has recently become an increasingly popular and promising technique in the professional translation industry. Advances in machine translation technology have brought better translation quality and a wider acceptance of machine translation as a basis for human translation in a post-editing process. The difficulty of satisfying, by purely manual translation, the growing demand for localization of the huge amounts of text produced on the web and by international institutions and companies is yet another reason why post-editing has gained so much importance in the modern translation community. Post-editing of machine translation allows a considerable gain in the speed and productivity of human translation.

Automatic translation quality estimation, and in particular a reliable phrase-level translation quality measure, is of particular importance for the post-editing scenario, since it speeds up the revision by indicating to the post-editor the relative quality of the various segments and the post-editing effort required for their revision. Segments that receive high quality estimates may be considered well translated, and the post-editor may decide that they need either no modification at all or just a quick cross-check. Low translation quality estimates may indicate that the segment is poorly translated, that it is not worth a post-editing effort, and that it should be retranslated from scratch. All the other segments should be considered candidates for partial revision, and the quality estimation system should highlight the problematic regions in those segments.

Thus, useful features for the post-editing scenario may include a general segment-level translation quality estimation score based on various combinations of features, as well as some local quality indicators. In this scenario, the complex monolingual and bilingual clues are the most useful, such as out-of-vocabulary (OOV) words, language model scores, alignment confidence estimation, and non-aligned words. As stated above, machine translation is less likely than human translation to contain spelling errors or untranslated or partially translated segments. It should be noted that automatic translation errors vary greatly according to the translation system used, and the quality estimation models should be adapted accordingly.

4 Format and visualization of quality indicators

Based on the user scenarios described previously, this section explains how the quality indicators can be implemented and presented to the user. Quality indicators are all calculated on the basis of information available in the alignment file. A specific alignment format was defined among all partners for the TransRead project in order to meet all needs for the representation and exchange of bilingual alignments during the whole project. It will be an XML file containing all alignment information: source and target segment alignments, source and target word alignments, POS, etc. The alignment format used for the TransRead project is described in deliverable 1.1, Formats de représentation des alignements.
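
To make the post-editing triage of Section 3.3 concrete on top of such an alignment file, here is a minimal Python sketch. The actual TransRead format is defined in deliverable 1.1 and is not reproduced here, so the element names (alignments, segment, source, target), the score attribute and both thresholds are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical alignment file: NOT the TransRead format from deliverable 1.1.
SAMPLE = """<alignments>
  <segment id="1" score="0.92">
    <source>That's Mike's room there</source>
    <target>La chambre de Mike est là</target>
  </segment>
  <segment id="2" score="0.55">
    <source>Book a room for two nights</source>
    <target>Réserver une pièce pour deux nuits</target>
  </segment>
  <segment id="3" score="0.10">
    <source>There is no room for doubt</source>
    <target>Il n'y a pas de chambre pour le doute</target>
  </segment>
</alignments>"""


def triage(xml_text, accept_at=0.8, retranslate_below=0.3):
    """Sort segment ids into accept / revise / retranslate buckets.

    Both thresholds are illustrative values, not project settings.
    """
    root = ET.fromstring(xml_text)
    buckets = {"accept": [], "revise": [], "retranslate": []}
    for segment in root.iter("segment"):
        score = float(segment.get("score"))
        if score >= accept_at:
            label = "accept"        # high estimate: at most a quick cross-check
        elif score < retranslate_below:
            label = "retranslate"   # low estimate: not worth post-editing
        else:
            label = "revise"        # in between: partial revision
        buckets[label].append(segment.get("id"))
    return buckets


if __name__ == "__main__":
    print(triage(SAMPLE))
```

In practice the thresholds would have to be tuned per translation system, since error profiles vary greatly from one system to another.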
In order to understand how the quality indicators will be used to display quality information on bilingual alignments to bilingual readers, human translation revisers and post-editors, we propose here an implementation of some of these quality indicators.

In the three user scenarios described in the previous section, consulting external dictionaries allows the user to better understand the meaning of any particular source expression (including context disambiguation for ambiguous words) and to validate any particular translation choice. External dictionary search for terms and expressions selected by the user is the most basic and the most important functionality required by all possible usage scenarios. However, basic dictionary search can sometimes be quite tiresome and fruitless. For one thing, words and expressions sometimes have many distinct meanings (homonyms and polysemes), not to mention several parts of speech. Finding the meaning that fits the context may take a lot of time, especially if the user is not experienced in working with dictionaries or is not familiar with both languages. For another, it is not easy to find a good translation for idiomatic expressions, collocations and language clichés in a standard dictionary, since those expressions rarely constitute dictionary keys (at best they are present as part of long lists of examples or idioms), and since dictionary coverage is not extensive enough.

To address both of the above issues, we propose to enrich the user's dictionary look-up experience with both context disambiguation and contextual dictionaries. Context disambiguation will take the user directly to the dictionary article corresponding to the correct meaning of the word (conjectured from the context). A particularly useful tool for this purpose would be an innovative dictionary such as BabelNet [2], which combines a multilingual encyclopedic dictionary and an ontology connecting concepts and named entities in a very large network of semantic relations. Each entry, with a unique ID, represents only one given meaning and contains all the synonyms which express that meaning in a range of different languages. This information will be stored in the alignment file in the TransRead tool. The user can directly access the particular meaning of an ambiguous word and only see definitions, translations and even pictures for this meaning.

Figure 1: Example of BabelNet results for room

The contextual dictionary is a new type of dictionary currently developed at Reverso-Softissimo for the TransRead project, which implements a bilingual concordancer and allows the user to search for simple words or expressions over a large amount of text. The results contain both the aligned results in their context and suggestions of the most frequent translations for the source expression, calculated from those alignments.

Figure 2: Example of Context results for room

As can be seen in Figure 2, the aligned source and target words are highlighted in yellow to help the user visualize the results. For the reading scenario, a Context-like interface would let the user visualize aligned source and target segments as well as expression and word alignments. For the TransRead project, we can implement this feature for the whole text. While the user is reading a sentence from the source text containing word1 word2 word3, if the user selects word1, then word1 and its translation will be highlighted. This can be done for each word, each expression, and finally for a whole segment.

An improvement to Context that needs to be developed for this usage in the project is to add a confidence score to the alignments. If we take the fifth result of the search for room, we can see a wrong alignment between Room in English and prévu le in French. We can imagine two possibilities for displaying quality information on this alignment: 1) add a warning on this segment saying that a wrong alignment was detected, based on the alignment confidence score stored in the XML alignment file; 2) display this result at the end of the results, in the section Other results.

In the case of reviewing machine translation output, we can also think of using this feature based on the confidence score indicator. We can imagine using and improving Localize, a Reverso tool developed during the FLAVIUS project that allows editors of web sites and applications to generate multilingual versions of their content easily, quickly and without technical knowledge. While using Localize, the user will be able to review translated texts in the following way: a confidence score will be added in a column to the right of the translated texts, and a filter on the confidence score can help the user to review the least confident segments first, for example, and to skip the revision of confident segments in order to save time.

Figure 3: Visualization of the Localize Reverso tool with source and target segments

An overall quality estimation score combining all indicators defined for this use case can also be implemented in Localize, above the filter panel, to give an estimate of the quality of the translation for the project, and filters can be added to help the user work only on errors or badly translated text.

In the case of human translation revision, two features can be added to help the user identify translation errors. First, if a contextual dictionary such as Context is used for external look-up, its aligned segments can also be used to check the consistency of translated texts. For example, take a sentence from the room example above: That's Mike's room there, aligned with La chambre de Mike est là. If the translation under review contains the same source segment with a different target, the user can check the consistency of this segment against Context and revise it using the translation found there. This is another kind of consistency control, based on a concordancer. Second, we can add a comparison with a machine translation output to obtain another confidence score between translation, revision and MT output. We can imagine a user interface like the one in the MTcompare Reverso tool (see figure below): the user will be able to review the translation by comparing it with a given MT output, and will be able to easily visualize unknown words (text in red) and differences between the translated texts (text in blue).

Figure 4: MTcompare user interface to review and compare two translations for a given source text.

5 Conclusion

Thanks to the benchmark of existing quality estimation tools and of CAT tools containing a quality control feature, we selected the appropriate quality clues for the TransRead project. They need to be calculated based on information stored in the alignment file. The visualization of these indicators will follow the presentation made in the last section, but can evolve depending on the tool used for TransRead and on experiments with the different scenarios (part of task 2).

References

[1] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, Prague, Czech Republic, June. Association for Computational Linguistics.

[2] Roberto Navigli and Simone Paolo Ponzetto. BabelNet: Building a very large multilingual semantic network. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July. Association for Computational Linguistics.

[3] Lucia Specia, Kashif Shah, Jose G.C. de Souza, and Trevor Cohn. QuEst - a translation quality estimation framework. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 79-84, Sofia, Bulgaria, August. Association for Computational Linguistics.


More information

Parallel FDA5 for Fast Deployment of Accurate Statistical Machine Translation Systems

Parallel FDA5 for Fast Deployment of Accurate Statistical Machine Translation Systems Parallel FDA5 for Fast Deployment of Accurate Statistical Machine Translation Systems Ergun Biçici Qun Liu Centre for Next Generation Localisation Centre for Next Generation Localisation School of Computing

More information

The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge

The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge White Paper October 2002 I. Translation and Localization New Challenges Businesses are beginning to encounter

More information

The Principle of Translation Management Systems

The Principle of Translation Management Systems The Principle of Translation Management Systems Computer-aided translations with the help of translation memory technology deliver numerous advantages. Nevertheless, many enterprises have not yet or only

More information

Factored Translation Models

Factored Translation Models Factored Translation s Philipp Koehn and Hieu Hoang pkoehn@inf.ed.ac.uk, H.Hoang@sms.ed.ac.uk School of Informatics University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW Scotland, United Kingdom

More information
