Automated Translation Quality Assurance and Quality Control Andrew Bredenkamp Daniel Grasmick Julia V. Makoushina
Introductions (all) Andrew Bredenkamp: CEO acrolinx, Computational Linguist, QA Tool Vendor Julia V. Makoushina: <Job Title>, <Background>, <Viewpoint> (= Practitioner?) Daniel Grasmick: MD Lucy Software & Services, Documentation and Translation Support, = Ex-Practitioner & MT Tool Vendor
Overview (AB) Big picture Technology perspective Technology / LSP perspective (Corporate) User perspective
Translation QA (AB) "I am not in the office at the moment. Please send any work to be translated"
Big Picture (AB) What is Translation QA for? Why is Translation QA Automation important? How do existing tools work? What can we expect from tools in the near future?
Why automate Translation QA? (AB) What you can measure, you can manage A way to objectively and repeatedly measure quality Early indicator for upcoming problems Helps reduce the cost of poor quality
Key criteria for Translation QA (AB) Part I: Delivery Checking modes Interactive (translator support) Batch-mode (process support) Back Office mode (housekeeping support) Metrics and Reporting Not J2450 Objective, automatic, available Standards-oriented TMX XLIFF
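The TMX and XLIFF standards mentioned above are plain XML, so segment pairs can be pulled out with the standard library alone. A minimal sketch for XLIFF 1.2 — the namespace URN is the official one, but the sample document and function name are invented for illustration:

```python
# Sketch: extracting (source, target) pairs from an XLIFF 1.2 document.
# The sample below is minimal; real files carry far more metadata.
import xml.etree.ElementTree as ET

XLIFF_NS = "{urn:oasis:names:tc:xliff:document:1.2}"

SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="de" datatype="plaintext" original="demo.txt">
    <body>
      <trans-unit id="1">
        <source>Save the file.</source>
        <target>Speichern Sie die Datei.</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

def extract_pairs(xliff_text):
    """Return a list of (source, target) text tuples from an XLIFF string."""
    root = ET.fromstring(xliff_text)
    pairs = []
    for unit in root.iter(XLIFF_NS + "trans-unit"):
        src = unit.find(XLIFF_NS + "source")
        tgt = unit.find(XLIFF_NS + "target")
        pairs.append((src.text if src is not None else "",
                      tgt.text if tgt is not None else ""))
    return pairs

print(extract_pairs(SAMPLE))
```

Working against the standard's element names (`trans-unit`, `source`, `target`) is what lets a QA tool stay vendor-neutral.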
QA tools used (JM) [Pie chart: QA tools used by survey respondents — none, built into Trados, built into SDLX, built into Star Transit, built into WordFast, built into Déjà Vu, QA Distiller, XBench, ErrorSpy, proprietary tool, other]
Benchmarking (JM) 22 sentences, each containing 1 error HTML format, Trados/TagEditor Small glossary (1 term) RTL (Arabic, Farsi, Hebrew) CJK (Chinese Traditional) Cyrillic (Russian) Eastern European (Polish, Czech) Western European (French)
Benchmarked Checks (JM) Untranslated/empty segments Same source different target Different source same target Punctuation marks in place Spaces before/after punctuation Number values and format Terminology and tags
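Several of the benchmarked checks above are simple to sketch. A hypothetical implementation of four of them over (source, target) segment pairs — function and message names are ours, not taken from any of the benchmarked tools:

```python
# Sketch of bilingual QA checks: empty/untranslated targets, inconsistent
# translations in both directions, and number-value mismatches.
import re
from collections import defaultdict

NUMBER = re.compile(r"\d+(?:[.,]\d+)?")

def check_segments(pairs):
    """Run basic QA checks over a list of (source, target) pairs."""
    errors = []
    targets_by_source = defaultdict(set)
    sources_by_target = defaultdict(set)
    for i, (src, tgt) in enumerate(pairs):
        if not tgt.strip():
            errors.append((i, "untranslated/empty segment"))
            continue
        targets_by_source[src].add(tgt)
        sources_by_target[tgt].add(src)
        # Every number in the source should also appear in the target.
        if sorted(NUMBER.findall(src)) != sorted(NUMBER.findall(tgt)):
            errors.append((i, "number mismatch"))
    for src, tgts in targets_by_source.items():
        if len(tgts) > 1:
            errors.append((src, "same source, different target"))
    for tgt, srcs in sources_by_target.items():
        if len(srcs) > 1:
            errors.append((tgt, "different source, same target"))
    return errors
```

For example, `check_segments([("Hello.", ""), ("Order 5 units.", "Bestellen Sie 5 Einheiten.")])` flags only the first pair as untranslated.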
Benchmarked Tools (JM) Déjà Vu X Workgroup version 7.5.302 ErrorSpy 4.0, build 001 QA Distiller 6.0.0 (build 188) SDLX 2007 QA Check, build 7014 Star Transit XV Professional, version 3.1 SP 21 Build 617 Trados QA Checker 1.0, plug-in to SDL Trados 2006 Wordfast version 5.51t3 XBench 2.7 (build 0.183)
Reliability of individual tools (JM) [Bar chart: percentage of errors caught (0–100%) per tool — Déjà Vu, ErrorSpy, QA Distiller, SDLX QA Check, Star Transit, Trados QA Checker, Wordfast, XBench — shown for all checks vs. supported checks only]
Most effective tools by language (JM) Western European: QA Distiller, ErrorSpy, Trados QA Checker, SDLX QA Check Eastern European: QA Distiller, ErrorSpy, Trados QA Checker Cyrillic: QA Distiller, Trados QA Checker, XBench CJK: QA Distiller, Trados QA Checker, ErrorSpy Right-to-left: ErrorSpy, Trados QA Checker, QA Distiller
Urgent Improvements Required (JM) Support for RTL languages Support for multilingual projects Support for more file formats Unicode support Batch processing Closer integration with the project TM Checklists Elimination of some types of noise
Trados QA Checker 2.0 (JM)
Trados QA Checker 2.0 (JM) [Bar chart: error-detection rate (0–100%) for QA Checker 1.0 vs. QA Checker 2.0, for all checks and for supported checks only]
Getting More from Existing Tools (JM) Most users employ their tool's default configuration Wordfast: macros Déjà Vu: SQL queries XBench: find & replace functionality Others: regular expressions
Where Regexes Can Help (JM) Flag errors like double spaces between sentences Check for correct quotation marks Detect forbidden character combinations Detect accidentally tripled characters Detect doubled words Etc.
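A minimal sketch of such regex checks; the patterns and check names are illustrative examples, not any tool's actual configuration:

```python
# Illustrative regex QA checks run over target text.
import re

CHECKS = {
    "double space": re.compile(r"  +"),
    "tripled character": re.compile(r"(\w)\1\1"),
    "doubled word": re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
    "straight quotes (should be curly)": re.compile(r'"'),
}

def regex_check(text):
    """Return the names of all checks whose pattern matches the text."""
    return [name for name, pattern in CHECKS.items() if pattern.search(text)]

print(regex_check("This is  the the answer"))
# → ['double space', 'doubled word']
```

Keeping the patterns in a named table makes it easy to add project-specific checks (e.g. a client's forbidden character combinations) without touching the checking loop.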
A Case Study: Palex (JM) Combination of tools: built into TM systems + XBench Select appropriate tools based on language, file format and client requirements Mainly Trados QA Checker 2.0 Extensive use of regular expressions Internally developed utilities
Quality Starts with the Author Category 1: professional authors can be trained and "convinced" to use QA tools; less experienced writers and non-native speakers are more open to tool support Category 2: occasional authors... care a lot less, but their content is often time-critical and needs translation Example from a multilingual helpdesk environment: an angry customer does not care about style, grammar, orthography... but your processing chain does!
For professional writers, QA means: Better use of translation tools: segment recognition, higher TM matches, increased efficiency, reduced costs Challenges: motivating authors, company-wide agreement on a maximum error level, structured and regular reporting Tips/experience: publication of (anonymized) checking results per domain/area = competition + motivation, making QA an integral part of goods receipt esp. with external authors, tracing QA over releases, ...
For occasional writers, QA means: No influence on source quality is possible - but we all know that automation is more successful on quality source content Automate correction of obvious errors in the source text Semi-automated approach for other cases: "humans" are offered correction alternatives whenever possible Alternatively: motivate the author to provide less ambiguous texts
Translation Checks First step is to use already existing tools Recycling more effective with quality segments Technologies like Term Mining, TM, MT benefit from quality of source and provide better target output Monolingual and translation checks Language assets help setting up a solution QA checks are still rarely part of deliverables
Limits of QA Tools Most tools deal only with the surface of a text For content, you always need a subject matter expert Lack of linguistic knowledge leads to superficial results Same for TM: matches only reflect similarity of sources and not translation quality/effort Still untapped potential
Key functions for Translation QA (AB) Part II: Functionality Quality Checking: Spelling, Grammar, Style Reuse, Terminology Terminology Translation Bilingual Term Harvesting Segment Length Checking Using intelligent token counting Entity Checking Redundancy Checking
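Segment length checking with token counting can be sketched as a simple ratio test; the 0.5–2.0 band below is an assumed threshold for illustration, not a value from the talk:

```python
# Sketch: flag targets whose token count is implausibly short or long
# relative to the source, using whitespace tokenization.
def length_ratio_ok(source, target, low=0.5, high=2.0):
    """True if the target/source token-count ratio falls inside [low, high]."""
    src_tokens = len(source.split())
    tgt_tokens = len(target.split())
    if src_tokens == 0:
        return tgt_tokens == 0
    return low <= tgt_tokens / src_tokens <= high
```

Counting tokens rather than characters makes the check more robust across languages with very different average word lengths, though CJK text would need a proper tokenizer instead of `split()`.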
Questions How can an LSP set up automatic translation QA? How can a large organization set up translation QA? What impact can translation QA have on MT and vice-versa? How can I discuss QA between customer and vendor?
Thank you, Questions? Julia Makoushina Palex Languages & Software Tomsk, Russia julia@palex.ru Daniel Grasmick Lucy Software & Services Waibstadt, Germany daniel.grasmick@lucysoftware.com Andrew Bredenkamp acrolinx GmbH Berlin, Germany andrew.bredenkamp@acrolinx.com