(MT) Machine Translation Whitepaper. technology-driven language communication leaders



What is Machine Translation?

Machine Translation (MT) is the automated translation of content into one or more target languages using dedicated computer software. Whatever the method of translation, the meaning of the original text must be accurately reflected in the target language so that the same message is conveyed, which is typically harder for machines to achieve than for human translators.

Several types of Machine Translation technology are in use in the market:

- Rule-Based Machine Translation
- Statistical Machine Translation
- Hybrid systems that combine the statistical and rule-based approaches

The Rule-Based approach, often described as the classic approach to MT, uses linguistic information: essentially a collection of rules (dictionaries, grammar) about the source and target languages. The translation is produced by pattern matching against these rules.

Statistical Machine Translation uses a large set of existing good translations in many languages, usually referred to as a corpus. The system draws on this corpus to propose the most reasonable translation for new text.

To further support computers in translating content from one language into another, many systems now use a hybrid model, which pattern matches rules but also applies statistical methods so that context from the corpus informs the translation.

How can it be applied?

Three types of MT service can be provided:

- Gist Translation
- Trained or Untrained Engines
- Post Edited Machine Translation

Gist Translation

A gist translation is sometimes referred to as raw MT. It offers the lowest cost and the lowest quality of translation. With this approach you are entirely dependent on the intelligence of the MT engine being used. It is fast for small volumes of content that need an initial translation, and it is typically what free online translation tools provide.
When considering the free approach, the key questions are: are you satisfied with your business's intellectual property, or sensitive or personal information, being made available in the online public domain? Are you happy to risk quality and accuracy, given that 30% of the content may contain errors or misleading translations? The process is also extremely manual, requiring small amounts of content to be pasted into and copied out of the tools. For non-sensitive information where you need a general translation for internal understanding, this approach could suit your needs (e.g. when you need to understand an email received from a colleague in a different language).

Trained or Untrained MT Engines

Within the industry you may see these referred to as Intelligent, Customised or Trained engines. The terms refer to the type of content stored within the MT engine and the process followed to increase the accuracy of the translation. The process is very similar to Translation Memory (TM) technology, which uses previously translated content that is continuously maintained to improve the quality of future translations. The same approach is applied within the MT engine to achieve leverage.

Our experience shows that Translation Memory outperforms Machine Translation, so clients that already have high TM leverage will gain little extra from Machine Translation. A key consideration is how much can be gained by introducing another process into a workflow that is already leveraging quality content from an approved Translation Memory. The extra steps also add to the delivery time of the overall project.

Translation Memories also contain your unique terms, brand names and product names (terminology), which MT engines will typically attempt to translate. Professional products such as thebigword TMS detect these and maintain the original text.
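The idea of protecting terminology from an MT engine can be illustrated with a short sketch. This is not how thebigword TMS or any particular product works; it shows one generic technique (masking protected terms with placeholders before translation and restoring them afterwards). The term list, the placeholder format and the `fake_mt` function are all assumptions made for illustration.

```python
# Hypothetical do-not-translate list; in practice this would come from a termbase.
PROTECTED_TERMS = ["CompanyX", "ESC+"]


def mask_terms(text, terms):
    """Replace each protected term with a numbered placeholder token."""
    mapping = {}
    # Longest terms first so a short term never splits a longer one.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        token = f"__TERM{i}__"
        if term in text:
            text = text.replace(term, token)
            mapping[token] = term
    return text, mapping


def unmask_terms(text, mapping):
    """Put the original terms back after translation."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text


def fake_mt(text):
    """Stand-in for an MT engine (assumption): it only upper-cases the text."""
    return text.upper()


masked, mapping = mask_terms("ESC+ is a CompanyX tool", PROTECTED_TERMS)
restored = unmask_terms(fake_mt(masked), mapping)
print(restored)
```

Everything the stand-in engine touches is changed, while the two protected terms come back exactly as written. A real engine may reorder or damage placeholders, so a production workflow would also need to verify that every placeholder survived.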
An engine that is not trained contains only generic corpus data, which greatly reduces the quality of the translated content. To raise quality, clients would then need to make heavy use of Post Edited Machine Translation (PEMT) by human translators.

Post Edited Machine Translation (PEMT)

PEMT is the process of sending MT engine output to human translators for review. Because raw MT is of low quality, the translator typically has to make the content understandable by removing redundant words, replacing obscure or unknown text and, in some cases, rewriting sentences. Where heavy post-editing is needed to turn the content into stylistically and contextually appropriate text, the project becomes more time consuming and costly than traditional translation methods.

Applying PEMT within workflows introduces further stages of analysis against the MT engine, and files must be prepared differently so that the translator can correctly distinguish PEMT content from TM content. Balancing a quick time to market against good-quality translations becomes more difficult in this scenario.

Frequently Asked Questions

Can I replace humans with machines for all of my translations?
With current technology, no amount of engine training will produce human-like results. The system cannot fully understand the context, emotion, local nuances or peculiarities that exist in all languages, so it falls back on a literal translation, which cannot be considered true localisation.

Why does MT produce low quality translations?
Creating new text in the target language that reads as though it was written by a person remains a challenge for MT. Anything that relies on humour or sarcasm significantly reduces the quality of the output. The concept of PEMT came from the need to protect quality, as it is only through human editing that we can achieve a level of quality fit for public consumption.

Will MT support all of my languages?
Compared with the 250 languages thebigword supports, any MT solution will seem to have poor language support. Our research showed that MT providers support as few as 7 languages in some cases and up to 90 in others. Quality also varied greatly between languages, with Russian and Arabic causing particular issues for MT platforms.

How is MT quality measured?
One automated means of evaluation is BLEU (Bilingual Evaluation Understudy), an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another. The closer an MT text is to a professional human translation, the better it is. However, a human post-edit is still required to determine how near or far the MT version was. BLEU is a general indicator of quality and does not give a definitive measure of whether the output is fit for a given use. Other organisations in the market, such as TAUS (the Translation Automation User Society), have developed tools that evaluate machine-translated content.
Again, this requires human translators to post-edit the machine-translated content; the tool tracks the time taken, the errors corrected and the edits made.

Can any translator perform PEMT?
No. Translators skilled in PEMT are needed to support this process. For the benefits of MT to be realised, your LSP (Localization Service Provider) must have translators who are trained to deal with MT content. Translators who are not trained in editing MT content will need more time, which reduces the cost benefits.

Does MT support all types of content and specialisms?
The specialism and type of content that requires translation need to be considered carefully, as MT could be too risky for specialist material such as medical and safety instructions. Those in technical industries who manufacture goods or machines, or deal with chemical products, have an extensive body of extremely specific terminology that must be translated exactly into the target languages. Organisations could therefore put themselves at risk by relying on MT alone.

Is my content secure in MT?
There continues to be a lot of debate around cloud security, so any platform that uses this technology brings the risk that private company data could end up in the public domain. Each client must consider the risk this may bring to their business and make an informed judgement based on their own circumstances. thebigword's approach to cloud platforms is not to use them at this time for any service other than email and web security. This stance will be reviewed as the technology evolves.

What platforms are available for MT?
The platforms most frequently mentioned by our clients tend to be online tools from major technology vendors. However, these are typically aimed at consumers and are not well suited to commercial use. Off-the-shelf MT solutions are available, as are in-house developed solutions, typically based on the Moses platform.

Will it save me money?
To answer this question we would like to share a quote from the Common Sense Advisory: "LSPs see the need for a multi-step process to ensure the requisite level of quality. This issue comes down to a fundamental reality: the more LSPs employ humans to process MT output, the more they must pay their suppliers. At some point, PEMT loses its economic value as a way to increase a linguist's output for the same price. Barring further technological optimizations, postediting doesn't look that attractive."

Human evaluation is needed to support translation quality, and it can be time consuming, so MT could end up costing you more than the traditional translation process. MT is therefore likely to reduce cost to LSPs, and in turn to the client, only where it removes the need for a human linguist or at least reduces the time a linguist needs to perform PEMT.
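To make the BLEU answer in the FAQ above more concrete, here is a minimal sketch of the calculation. It is a simplified, sentence-level version with a single reference translation, whitespace tokenisation and no smoothing; evaluation tools normally score whole corpora and handle tokenisation more carefully, so treat the numbers it produces as illustrative only. The function names are assumptions, and the example sentences are adapted from the sample illustration in this paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: one reference, no smoothing."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one empty level gives a score of 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


human = "ESC+ n'est pas une application de conversation électronique instantanée"
machine = "ESC+ n'est pas une application de conversation électronique"
print(round(bleu(machine, human), 3))
```

In this example every n-gram in the machine output also appears in the human reference, so the score is reduced only by the brevity penalty for the missing final word. This also shows why BLEU is a general indicator rather than a verdict: it measures overlap with one reference, not whether the omission matters to a reader.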

Recommended Considerations for Machine Translation

Until an MT solution is available that benefits clients, we recommend that clients consider workflow enhancements such as:

- CMS connectors and integrations to speed up delivery of work to your chosen LSP
- Translation Memory enhancements to increase leverage and lower costs
- The level of training linguists receive, to ensure speed and quality of translation
- Automation of linguist selection

Machine Translation Workflows

In general, two workflows are commonly applied within MT-based solutions:

- Standard workflow, which supports direct MT for small volumes of content such as forum and online chat conversations
- Enterprise workflow, which supports integration and a full MT approach, with or without PEMT, for commercial use

The Standard MT Workflow (steps shown in the diagram): Client Submission; MT Engine; Content for Gist; PEMT with Translator; Proofread / Review; Returned for internal use; Returned reviewed content (public use).

The Enterprise MT Workflow (steps shown in the diagram): Client Submission; Integrated / Connected; TM Analysis; MT Analysis; Quote; PEMT with Translator; Proofread / Review; Return reviewed content (public use); MT Engine Update; TM Update.

Our research shows that the best use for MT is gist translation that will not be published externally but used for internal understanding. Where content is for public consumption, a PEMT process is strongly recommended. MT is not recommended for marketing translations, because of brand issues and the loss of key messages such as the emotion conveyed in the material. Legal, medical and technical documents are not currently recommended for MT either: terminology requirements and substance may be lost, exposing clients to legal challenge and general business risk.

Automation of linguist selection is key to quick delivery of content. The more specific your requirements for linguists (e.g. must have a medical specialism, must use a specific dialect of a language), the higher the chance that manual intervention will be needed to source the linguist, increasing delivery timescales and the charges to your chosen LSP from their supplier. Our research shows that in many cases linguists with generic specialisms (e.g. banking) complete translations to the same level of quality as someone with a specific specialism (e.g. investment banking).

thebigword continues to invest in research and development on an MT platform for our clients. However, given the issues detailed above, which we are yet to overcome, our commitment remains to ensure our clients do not experience negative impact from the use of MT. We have therefore chosen not to deploy any of the solutions developed to date until key challenges can be better addressed.

Our research continues on content that seems best placed for MT, which includes:

- Online forum content, e.g. chats, comments, articles
- Social media channels such as Facebook and Twitter, useful for comments from global customers
- Content that requires little to no specialist understanding or use of specific industry terms
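The routing decision implied by the two workflows above can be sketched as a small function. This is an illustration only, not thebigword's implementation: the `Segment` type, the path labels and especially the `TM_THRESHOLD` value are assumptions, since the paper does not state the lower limit applied to Translation Memory matches.

```python
from dataclasses import dataclass

# Hypothetical threshold: the paper does not state the actual TM lower limit.
TM_THRESHOLD = 75.0


@dataclass
class Segment:
    text: str
    tm_match: float  # best Translation Memory match, in percent


def route(segment, public):
    """Pick a processing path for one segment.

    public=True means the content will be published externally.
    """
    if segment.tm_match >= TM_THRESHOLD:
        return "TM"        # reuse the Translation Memory match
    if public:
        return "MT+PEMT"   # machine translate, then human post-edit
    return "MT gist"       # raw MT, for internal understanding only


segments = [
    Segment("ESC+ is not an electronic chat application.", 100.0),
    Segment("This feature is not available for your selected country and/or product.", 0.0),
]
for seg in segments:
    print(route(seg, public=True), "|", seg.text)
```

Checking the TM first reflects the paper's observation that Translation Memory outperforms MT, so only segments without a usable TM match are offered to the MT engine at all.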

Machine Translation Sample Illustration

English source to French target for general website content

Source text

Helpful information regarding Service Request Submission
Customers who have purchased CompanyX Services Priority Technical Support may be able to upgrade their basic ESC+ access to advance account-level access by visiting their CompanyX Services Priority Technical Support Portal.
Advanced Electronic Service Call Tracking capabilities, such as account-level worldwide service incident tracking, are available as part of CompanyX Services Priority Technical Support.
Customers who have purchased CompanyX Services Priority/ Technical Support may be able to upgrade their basic ESC+ access to advanced account-level access by visiting their CompanyX Services Priority Technical Support Portal.
Valid for European Union Countries:
CompanyX may need to call you back as a result of your online service request.
If you prefer that we contact you via email, please add Contact by email only in the free text field in the ticket log.
This will indicate to our service representative that any further communication should only be carried via the email address provided by you.
ESC+ is not an electronic chat application.
This feature is not available for your selected country and/or product.

Target French created by Human Translation only

Informations utiles concernant la soumission de demandes de service
Les clients ayant acheté le support technique prioritaire des Services CompanyX ont la possibilité de mettre à niveau leur accès de base à ESC+ vers un compte d'accès avancé en se rendant sur le portail de support technique prioritaire des Services CompanyX.
Des fonctions avancées de suivi électronique des appels de service, comme le suivi mondial des incidents de service à l'échelon du compte, sont disponibles dans le cadre du support technique prioritaire des Services CompanyX.
Les clients ayant acheté le support technique prioritaire des Services CompanyX ont la possibilité de mettre à niveau leur accès de base à ESC+ vers un compte d'accès avancé en se rendant sur le portail de support technique prioritaire des Services CompanyX.
Valide dans les pays de l'Union Européenne :
CompanyX sera peut-être amené à vous contacter suite à votre soumettre une demande de service en ligne.
Si vous préférez être contacté par e-mail, veuillez indiquer «Contactez-moi uniquement par e-mail» dans la zone de texte libre du journal de ticket.
Notre agent de service saura ainsi que toute communication ultérieure devra vous être adressée uniquement par e-mail à l'adresse indiquée.
ESC+ n'est pas une application de conversation électronique instantanée.
Cette fonction n'est pas disponible pour votre pays et/ou le produit sélectionné.

100% Quality

Quality Calculations

Human Translation: The human translation process resulted in text that was accepted into the translation memory for later re-use. This text is now regarded as a match for future work and as such attracts an overall score of 100%.

Machine Translation Sample Illustration

Translation Memory matches, with remainder potentially available for PEMT

Helpful information regarding Service Request Submission
Customers who have purchased CompanyX Services Priority Technical Support may be able to upgrade their basic ESC+ access to advance account-level access by visiting their CompanyX Services Priority Technical Support Portal.
Advanced Electronic Service Call Tracking capabilities, such as account-level worldwide service incident tracking, are available as part of CompanyX Services Priority Technical Support.
Customers who have purchased CompanyX Services Priority/ Technical Support may be able to upgrade their basic ESC+ access to advanced account-level access by visiting their CompanyX Services Priority Technical Support Portal.
Valid for European Union Countries:
CompanyX may need to call you back as a result of your online service request.
If you prefer that we contact you via email, please add Contact by email only in the free text field in the ticket log. [1]
This will indicate to our service representative that any further communication should only be carried via the email address provided by you.
ESC+ is not an electronic chat application.
This feature is not available for your selected country and/or product. [2]

92% Quality

Target French created by Machine Translation only

Informations utiles au sujet de la soumission de demande de Service
Les clients qui ont acheté Support technique CompanyX Services prioritaire peuvent être en mesure d'améliorer leur accès de ESC + base pour avancer des accès au niveau du compte en visitant leur portail de Support technique CompanyX Services prioritaires.
Fonctions électroniques Service Call Tracking avancées, comme le suivi des incidents niveau du compte de service dans le monde entier, sont disponibles dans le cadre de l'appui technique de CompanyX Services prioritaires.
Clients qui ont acheté CompanyX Services prioritaires Support technique peut être en mesure de mettre à jour leur base ESC + accès à accès compte-niveau avancé en visitant leur portail de Support technique CompanyX Services prioritaires.
Valide pour les pays de l'Union européenne: [3]
CompanyX peut devez vous contacte suite à votre demande de service en ligne.
Si vous préférez que nous vous contacter par courriel, veuillez ajouter Contact par email seulement dans le champ de texte libre dans le journal de billet.
Cela permettra d'indiquer à notre représentant du service que toute communication ultérieure ne doit se faire via l'adresse que vous avez fournie.
ESC + n'est pas une application de conversation électronique. [4]
Cette fonctionnalité n'est pas disponible pour votre pays choisi et/ou le produit. [5]

12% Quality

Quality Calculations

Translation Memory Segment Matches: A translation memory tool was used to analyse the supplied text against a TM. In this example most segments were a 100% match with existing content. One segment [1] matched at 93% and the other [2] was new text, 0%. Averaging the segment match results across all words gives an average TM word quality score of 92.3%. Only 2 segments could be offered for MT; however, we should weigh the extra processing and time of PEMT against performing a straight human translation on this content.

Machine Translation Segment Matches: To evaluate the quality of the MT translations, a Translation Memory tool was used to measure how close the MT result was to the accepted human translation for each segment. All but three segments were found to be below the lower limit we would use from our Translation Memories; these segments were evaluated as having a 0% match. Three MT segments matched above our threshold, with scores of 87% [3], 91% [4] and 86% [5]. Averaging the segment match results across all words gives an average MT word quality score of 11.4%.
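The "averaging across all words" step in the quality calculations can be sketched as a word-weighted mean. The function name and the sample data below are assumptions: the paper does not list the word count of each segment, so the figures here are hypothetical and are not intended to reproduce the 92.3% or 11.4% results.

```python
def word_weighted_score(segments, threshold=0.0):
    """Average match score across all words.

    segments: list of (word_count, match_percent) pairs.
    Scores below `threshold` count as 0, as described for the MT evaluation.
    """
    total_words = sum(words for words, _ in segments)
    if total_words == 0:
        return 0.0
    weighted = sum(
        words * (score if score >= threshold else 0.0)
        for words, score in segments
    )
    return weighted / total_words


# Hypothetical word counts paired with match percentages.
sample = [(30, 60.0), (6, 87.0), (8, 91.0)]
print(word_weighted_score(sample, threshold=75.0))
```

Weighting by word count means that a few short segments with high match scores contribute little when the long segments fall below the threshold, which is consistent with the low overall MT score reported above despite three segments matching at 86% or better.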
Summary of MT Issues: Machine Translation Output

File Format Issues: The following phrase was split into two segments, one either side of the / character, so each segment's text lost context during the MT process. The human translator recognised this issue and corrected for it in the translated text.
    Customers who have purchased CompanyX Services Priority/ Technical Support may be able to upgrade their basic ESC+ access to advanced account-level access by visiting their CompanyX Services Priority Technical Support Portal.

Incorrect capitalisation:
    Valide pour les pays de l'Union européenne:

Incorrect quote marks: Should be « and » in the following text:
    Si vous préférez que nous vous contacter par courriel, veuillez ajouter Contact par email seulement dans le champ de texte libre dans le journal de billet.

Context required: The linguist applied context in the manual translation whereas MT could not.
    MT: ESC + n'est pas une application de conversation électronique.
    Linguist: ESC+ n'est pas une application de conversation électronique instantanée.
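The file format issue above, where a sentence was cut at the / character, can be reproduced with a small segmentation sketch. The two functions are hypothetical and only illustrate the effect; the paper does not say which tool or rule caused the split in the sample.

```python
import re


def naive_segments(text):
    """Split on sentence punctuation and on '/', as a badly configured filter might."""
    parts = re.split(r"(?<=[.!?])\s+|/\s*", text)
    return [p.strip() for p in parts if p.strip()]


def safer_segments(text):
    """Split only after sentence-ending punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


text = (
    "Customers who have purchased CompanyX Services Priority/ Technical Support "
    "may be able to upgrade. ESC+ is not an electronic chat application."
)
for segment in naive_segments(text):
    print("naive:", segment)
for segment in safer_segments(text):
    print("safer:", segment)
```

With the naive rule the first sentence reaches the MT engine as two fragments, neither of which carries enough context to be translated well. Keeping the sentence whole gives the engine, and any later TM lookup, the full phrase to work with.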

Further information will be released as our research and development continues. However, if you would like to discuss any aspect of this paper, please contact marketing@