Metadata in Translation Tools: Importance, Usage, Storage, Transfer. Angelika Zerfaß & Richard Sikes


Metadata is... Data that describes other data. It provides information about a certain item's content.

Some examples from everyday life
Book: author, genre, subject, length, when written, summary, location, keywords
Graphic file: size, color depth, image resolution, when created, file name, description, keywords

Data vs Metadata

Metadata in Localization
What kind of metadata can be provided throughout the localization process?
Where and how can it be used?
How well does it transfer between tools?

Metadata in Translation Tools
Information about the translation itself:
Creation date
User who created the translation
User who created the translation databases (TMs, term bases ...)
Status information about the file in translation (is the translation confirmed; does it come from a TM, automated translation, or alignment?)

Uses of metadata within one TM system:
Categorize translated content
Influence the match rates during translation

Transfer between Tools
Different translation tool components use different exchange formats. Translation memories are exchanged via TMX (Translation Memory eXchange format). Terminology data is exchanged via CSV or TBX (TermBase eXchange format). Segment pairs (source segment plus translated segment) can be exchanged via XLIFF or a customized version thereof.

Where does Metadata Reside?
In the header of exchange file formats like TMX, XLIFF, TBX: administrative data, user-defined data (name of TM, path where XLIFF was saved to ...)
On the translation unit/term level of the exchange files: administrative data (who and when), process data (coming from alignment ...), categorization data (pre-defined and custom fields)
Inside segments (formatting ...)

More detailed uses for metadata
Searching for segments in a TM
Searching for terms in a term base
Filtering for segments during translation (prefer / penalize, i.e. decrease match rate)
TM clean-up
TM splitting (filter during export)
Term base splitting (filter during export)
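The "prefer / penalize" idea can be illustrated with a minimal sketch. It is not the algorithm of any particular tool: the `TMHit` class, the `apply_metadata_penalty` function, the 5-point penalty, and the sample segments are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TMHit:
    """One TM lookup result (hypothetical structure, for illustration only)."""
    source: str
    target: str
    match_rate: int  # raw fuzzy match percentage, 0-100
    metadata: dict = field(default_factory=dict)


def apply_metadata_penalty(hit, preferred, penalty=5):
    """Subtract `penalty` points for every preferred field whose value
    differs from (or is missing in) the hit's metadata."""
    mismatches = sum(
        1 for key, value in preferred.items() if hit.metadata.get(key) != value
    )
    return max(0, hit.match_rate - penalty * mismatches)


hits = [
    TMHit("Check the oil level.", "Prüfen Sie den Ölstand.", 100,
          {"client": "client B", "domain": "automotive"}),
    TMHit("Check the oil level.", "Ölstand prüfen.", 100,
          {"client": "client A", "domain": "automotive"}),
]
preferred = {"client": "client A", "domain": "automotive"}

# Rank hits by their penalized match rate, best first.
ranked = sorted(hits, key=lambda h: apply_metadata_penalty(h, preferred), reverse=True)
for hit in ranked:
    print(apply_metadata_penalty(hit, preferred), hit.target)
```

Two hits with the same raw match rate end up ranked differently because only one of them carries the preferred client value.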

Details: Metadata in TMs
TM level data: administrative data (name, who, when); pre-defined categories (subject, client ...) if the tool so provides
Document level data: with corpus-based tools
TU level data: administrative (created when and by whom); process data (TU comes from alignment ...); custom categories

Our Tests
Create TM with custom metadata fields
Add translations to TM with field information
Export to TMX
Import into a different tool
What metadata transfers well, what doesn't?

Metadata when creating a TM (1) select/fill predefined fields

Metadata when creating a TM (2) custom fields

Metadata in the TM associated with a segment

Information on the TM level
<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
  <header creationtool="memoq" creationtoolversion="5.0.21" segtype="sentence" adminlang="en-us" creationid="azerfass" srclang="en" o-tmf="memoqtm" datatype="unknown">
    <prop type="defclient">client A</prop>
    <prop type="defproject">proj 123</prop>
    <prop type="defdomain">automotive</prop>
    <prop type="defsubject">transmission</prop>
    <prop type="description">models 1-5b</prop>
    <prop type="targetlang">de</prop>
    <prop type="name">full Settings</prop>
  </header>

Information on the segment level
<tu changedate="20111012T003107Z" creationdate="20111012T003107Z" creationid="azerfass" changeid="azerfass">
  <prop type="client">autoparts</prop>
  <prop type="project">1-2-3-4-5</prop>
  <prop type="domain">automotive-aeronautics</prop>
  <prop type="subject">spare parts</prop>
  <prop type="corrected">no</prop>
  <prop type="aligned">no</prop>
  <prop type="x-document">demo 1</prop>
  <prop type="x-reviewer">rev1</prop>
  <prop type="x-internal id">44567</prop>
  <prop type="x-date of review">20111012T003000Z</prop>
  <prop type="x-doc type">broschure</prop>
  <prop type="x-model">model 1</prop>
  <prop type="x-model">model 5</prop>
  <tuv xml:lang="en">
    <prop type="x-context-pre"><seg>dies ist ein neuer Satz.</seg></prop>
    <prop type="x-context-post"><seg>dies ist ein kurzer wunderschöner Satz.</seg></prop>
    <seg>dies ist ein kurzer neuer Satz.</seg>
  </tuv>
  <tuv xml:lang="de">
    <seg>this is a short, new sentence.</seg>
  </tuv>
</tu>
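To show how such TU-level metadata might be read programmatically, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The abbreviated `SAMPLE_TMX` string and the helper name `collect_props` are assumptions for illustration; values are collected into lists because a field such as `x-model` can occur more than once on the same translation unit.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Abbreviated, hypothetical TMX sample modeled on the export shown above.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="memoq" segtype="sentence" adminlang="en-us"
          srclang="en" o-tmf="memoqtm" datatype="unknown">
    <prop type="defclient">client A</prop>
  </header>
  <body>
    <tu creationid="azerfass" changeid="azerfass">
      <prop type="client">autoparts</prop>
      <prop type="x-model">model 1</prop>
      <prop type="x-model">model 5</prop>
      <tuv xml:lang="en"><seg>This is a short, new sentence.</seg></tuv>
      <tuv xml:lang="de"><seg>Dies ist ein kurzer neuer Satz.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def collect_props(element):
    """Map each <prop type="..."> directly under `element` to a list of values."""
    props = defaultdict(list)
    for prop in element.findall("prop"):
        props[prop.get("type")].append((prop.text or "").strip())
    return dict(props)


root = ET.fromstring(SAMPLE_TMX)
header_props = collect_props(root.find("header"))
print("TM level:", header_props)

for tu in root.iter("tu"):
    tu_props = collect_props(tu)
    segments = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
    print("TU level:", tu.get("creationid"), tu_props, segments)
```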

Metadata used for sorting in the TM

Metadata when creating a TM (1) select/fill predefined fields

Metadata when creating a TM (2) custom fields

Metadata in the TM associated with a segment

Information on the TM level
<?xml version="1.0" encoding="utf-8"?>
<tmx version="1.4">
  <header creationtool="sdl Language Platform" creationtoolversion="8.0" o-tmf="sdl TM8 Format" datatype="xml" segtype="sentence" adminlang="de-de" srclang="de-de" creationdate="20111012T011627Z" creationid="z- 0314F13C5AED4\A">
    <prop type="x-reviewer:singlestring"></prop>
    <prop type="x-doc type:singlepicklist">legal,workshop manual,website,broschure</prop>
    <prop type="x-model:singlepicklist">model 1,model 2,model 3,model 4,model 5</prop>
    <prop type="x-internal id:integer"></prop>
    <prop type="x-review date:datetime"></prop>
    <prop type="x-recognizers">recognizeall</prop>
    <prop type="x-tmname">tm for TMX test</prop>
  </header>

Information on the segment level
<tu creationdate="20111012T032948Z" creationid="align!" changedate="20111012T032948Z" changeid="align!" lastusagedate="20111012T013621Z" usagecount="2">
  <prop type="x-context">-8428286702482475836, 1404007344699555312</prop>
  <prop type="x-context">615444784753120163, 615444784753120163</prop>
  <prop type="x-origin">alignment</prop>
  <prop type="x-originalformat">tradostranslatorsworkbench</prop>
  or
  <prop type="x-origin">tm</prop>
  <prop type="x-confirmationlevel">translated</prop>
  <prop type="x-review date:datetime">20100303T120000Z</prop>
  <prop type="x-reviewer:singlestring">az</prop>
  <prop type="x-internal id:integer">12345</prop>
  <prop type="x-doc type:singlepicklist">workshop manual</prop>
  <prop type="x-model:singlepicklist">model 4</prop>
  <tuv xml:lang="de-de">
    <seg>dies ist ein neuer Satz.</seg>
  </tuv>
  <tuv xml:lang="en-us">
    <seg>this is a new sentence.</seg>
  </tuv>
</tu>

Metadata used for sorting in the TM

Metadata on document level

Information on the document level
<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
  <header creationtool="multitrans" creationtoolversion="5.0.1947.0" segtype="sentence" o-tmf="dvmdb" adminlang="en-us" srclang="en-us" datatype="html" creationdate="20111011T220958Z" creationid="richard" changedate="20111011T223931Z" changeid="richard">
    <prop type="txb:name">locworld HTML Example</prop>
    <prop type="doc:created date">20111011T094600Z</prop>
    <prop type="doc:modified date">20111011T095000Z</prop>
    <prop type="doc:name">html Example.htm</prop>
    <prop type="doc:source language">eng</prop>
    <prop type="doc:revision Number">0</prop>
    <prop type="doc:revision Date">20111011T000000Z</prop>
    <prop type="doc:created date">20111011T094600Z</prop>
    <prop type="doc:modified date">20111011T095000Z</prop>
    <prop type="doc:name">html Beispiel.htm</prop>
    <prop type="doc:source language">eng</prop>
    <prop type="doc:revision Number">0</prop>
    <prop type="doc:revision Date">20111011T000000Z</prop>
  </header>

Metadata on document level

Metadata on document level
Take TMX from Trados 2009 to memoQ 5
Add all unknown fields to the setup

Tests
Take TMX from Trados 2009 to memoQ 5

Tests
Take TMX from memoQ and MultiTrans into SDL Trados 2009: picklist and text field values are handled differently
memoQ:
<prop type="x-model">model 1</prop>
<prop type="project">1-2-3-4-5</prop>
MultiTrans:
<prop type="document Type">Manual</prop>
SDL Trados 2009:
<prop type="x-doc type:singlepicklist">workshop manual</prop>
<prop type="x-reviewer:singlestring">az</prop>
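Because the same field arrives under different names (`x-model` from one tool, `x-model:singlepicklist` from another), any mapping between tools has to normalize the `type` attribute first. The sketch below is one possible approach, not what any of the tested tools does internally. The function name `parse_prop_type` is hypothetical, and the set of datatype suffixes contains only those visible in the Trados export above; it is an assumption that no others occur.

```python
# Datatype suffixes seen in the SDL Trados export above (assumed to be the full set).
KNOWN_DATATYPES = {"singlestring", "singlepicklist", "integer", "datetime"}


def parse_prop_type(prop_type):
    """Split a TMX prop type into (normalized field name, datatype or None).

    A trailing ':<datatype>' is removed only when the suffix is a known
    datatype, so prefixes such as 'doc:name' are left intact.
    """
    name, sep, suffix = prop_type.rpartition(":")
    if sep and suffix.lower() in KNOWN_DATATYPES:
        datatype = suffix.lower()
    else:
        name, datatype = prop_type, None
    if name.startswith("x-"):
        name = name[2:]  # drop the custom-field marker
    return name.strip().lower(), datatype


for raw in ["x-model", "x-model:singlepicklist", "x-reviewer:singlestring",
            "project", "document Type", "doc:name"]:
    print(raw, "->", parse_prop_type(raw))
```

After normalization, `x-model` and `x-model:singlepicklist` both resolve to the field name `model`, which gives an importer a common key for field mapping. Fields that differ in wording, such as `doc type` and `document Type`, would still need a manual mapping.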

Tests
Mapping of fields from a TMX file to existing fields in MultiTrans

Tests

Tests
Create a TM with translations from one specific file format, such as HTML, DOC, InDesign, or XML
Export to TMX and import into another tool
Run a translation of the exact same file used to create the first TM
Match rates differ greatly depending on the file format used to create the TM:
because different segmentation rules are applied
because inline tags are not recognized, are interpreted differently, or were not imported at all during TMX exchange
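The effect of lost inline tags on match rates can be illustrated with a small sketch. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a fuzzy-match score. Real translation tools use their own matching algorithms, so the numbers are only indicative, and the sample segments are invented.

```python
import re
from difflib import SequenceMatcher

# Very rough pattern for inline markup such as <b>...</b> (illustrative only).
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_inline_tags(segment):
    """Remove anything that looks like an inline tag."""
    return TAG_PATTERN.sub("", segment)


def similarity(a, b):
    """Percentage similarity of two strings (stand-in for a fuzzy match rate)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


# Segment as stored in the TM after an import that dropped the inline tags.
stored_in_tm = "Click Save to continue."
# The same sentence as it appears when the original file is translated again.
new_segment = "Click <b>Save</b> to continue."

with_tags = similarity(new_segment, stored_in_tm)
without_tags = similarity(strip_inline_tags(new_segment), stored_in_tm)
print("tags compared:", with_tags, "tags ignored:", without_tags)
```

The identical sentence no longer scores as an exact match once the tags are part of the comparison, which is one way the differences observed in the tests can arise.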

TMX exchange results

Details: Metadata in XLIFF
File level data: administrative data (name, who, when)
TU level data: administrative (created when and by whom); process data (status of translation, origin of match); comments, history

What an XLIFF contains
<?xml version="1.0" encoding="utf-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:mq="mqxliff" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemalocation="urn:oasis:names:tc:xliff:document:1.2 xliffcore-1.2-transitional.xsd">
  <file original="c:\users\azerfass.zaac\desktop\tmx Test\Beispieldateien\HTML Beispiel.htm" mq:id="e3b904d9-7967-46bc-88e3-1bdd83d25544" source-language="de" target-language="en" datatype="x-html">
    <header>
      <skl>
        <internalfile>uesdbbqaaaaiajso8j5axl8ttaiaadoiaaaraaaasfrntcbczwlzc GllbC54bWytVVFv2jAQfp+0/4B4B0OgLVRuKgbbigQdK2yrJqTJxJdgNbEj2y m0v37gctqjqxtsxic++777znf2gd9uo7d2dfixww/q7warxgpuccp4cf NPtN/o1W/..

What an XLIFF file can contain on the translation unit level
<body>
  <trans-unit id="1" mq:status="manuallyconfirmed" mq:rep="rep" mq:segmentguid="1f447f04-f394-43e6-aacc-355a1dabed92" mq:translatorcommittimestamp="0001-01-01T00:00:00Z" mq:reviewer1committimestamp="0001-01-01T00:00:00Z" mq:reviewer2committimestamp="0001-01-01T00:00:00Z" mq:lastchangedtimestamp="2011-07-18T15:47:25Z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
    <source mq:segpart="1" mq:hasfollowingobject="hasfollowingobject">beispielseite</source>
    <target>sample page</target>
  </trans-unit>

What an XLIFF file can contain on the translation unit level
<trans-unit id="5" mq:minorversionend="5" mq:minorversionstart="4" mq:status="partiallyedited" mq:segmentguid="5ecd83ee-3bd8-4a86-9a96-ed10485254bc" mq:translatorcommittimestamp="0001-01-01T00:00:00Z" mq:reviewer1committimestamp="0001-01-01T00:00:00Z" mq:reviewer2committimestamp="0001-01-01T00:00:00Z" mq:lastchangedtimestamp="2011-10-11T15:52:05Z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
  <source mq:segpart="6">hier kommt der 5. Satz. </source>
  <target>here comes sentence number five. </target>
  <note>number below 10 are written as words.</note>
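As a rough illustration of why such tool-specific metadata is hard to reuse, the sketch below reads the status, timestamp, and notes from an XLIFF 1.2 file with Python's standard `xml.etree.ElementTree`. The reader must know the `mq` namespace and attribute names in advance. The attribute spellings follow this transcript and may be capitalized differently in real exports; the abbreviated `SAMPLE_XLIFF` and the function name `read_units` are assumptions.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
MQ = "{mqxliff}"  # namespace bound to the mq: prefix in the sample

# Abbreviated, hypothetical sample modeled on the excerpts above.
SAMPLE_XLIFF = """<xliff version="1.2"
    xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:mq="mqxliff">
  <file original="example.htm" source-language="de" target-language="en"
        datatype="x-html">
    <body>
      <trans-unit id="1" mq:status="manuallyconfirmed"
                  mq:lastchangedtimestamp="2011-07-18T15:47:25Z">
        <source>Beispielseite</source>
        <target>Sample page</target>
      </trans-unit>
      <trans-unit id="5" mq:status="partiallyedited"
                  mq:lastchangedtimestamp="2011-10-11T15:52:05Z">
        <source>Hier kommt der 5. Satz.</source>
        <target>Here comes sentence number five.</target>
        <note>Numbers below 10 are written as words.</note>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xliff_text):
    """Return one dict per trans-unit with standard and mq-specific metadata."""
    root = ET.fromstring(xliff_text)
    units = []
    for tu in root.iterfind(".//x:trans-unit", XLIFF_NS):
        units.append({
            "id": tu.get("id"),
            "status": tu.get(MQ + "status"),               # tool-specific attribute
            "changed": tu.get(MQ + "lastchangedtimestamp"),
            "target": tu.findtext("x:target", default="", namespaces=XLIFF_NS),
            "notes": [note.text for note in tu.findall("x:note", XLIFF_NS)],
        })
    return units


units = read_units(SAMPLE_XLIFF)
for unit in units:
    print(unit)
```

Only `id`, `target`, and `note` come from the XLIFF core. A tool that does not know the `mq` namespace has no way to interpret the status or timestamp.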

Metadata that can be contained
<mq:warnings40>
  <mq:errorwarning mq:errorwarning-code="03062" mq:errorwarning-ignorable="errorwarningignorable" mq:errorwarning-shorttext="numbers in source and target segment do not match" mq:errorwarning-problemname="numbers do not match" mq:errorwarning-segmenthash="0" mq:errorwarning-combinedposstart="-1" mq:errorwarning-combinedposlength="0" />
</mq:warnings40>

This is a new <mrk mtype="x-sdl-comment" sdl:cid="dc02f347-9f59-486e-8f2e-1f97b2aa2c91">sentence</mrk>.
<trans-unit id="855094c2-c334-4c6f-abf5-9fbad9e89d76">
  <source>schicken Sie eine Mail an: <g id="pt1"><g id="pt2">info@firma.de</g></g></source>
  <seg-source><mrk mtype="seg" mid="6">schicken Sie eine Mail an:</mrk> <g id="pt1"><g id="pt2"><mrk mtype="seg" mid="7">info@firma.de</mrk></g></g></seg-source>
  <target><mrk mtype="seg" mid="6">send a mail to:</mrk> <g id="pt1"><g id="pt2"><mrk mtype="seg" mid="7"><mrk mtype="x-sdl-comment" sdl:cid="af90adc0-a008-4484-a46a-5824fddef1ea">info@firma.de</mrk></mrk></g></g></target>
  <sdl:seg-defs>
    <sdl:seg id="6" conf="rejectedtranslation" origin="interactive"><sdl:value key="sdl:originaltranslationhash">-2010148818</sdl:value></sdl:seg>
    <sdl:seg id="7" conf="translated" origin="source"><sdl:value key="sdl:originaltranslationhash">-669315889</sdl:value></sdl:seg>
  </sdl:seg-defs>
</trans-unit>

Tests
Translate a document
Send document for review
Compare documents with track changes
Export to XLIFF

History of a file

Metadata in XLIFF with track changes: file header
<tool tool-id="mq" tool-name="memoq" tool-version="5.0.21" tool-company="kilgray" />
<mq:export-path>c:\temp\demo 1_ger.rtf</mq:export-path>
<mq:docinformation mq:hashistory="true">
  <mq:versioninfos mq:majorversion="1">
    <mq:minorversioninfo mq:minorversion="0" mq:comment="" mq:createdthroughview="false" mq:createreason="import" mq:creationtime="120498833" mq:creatoruser="azerfass" mq:tag="">
      <mq:details mq:type="minorversiondetailsimport"><![CDATA[<?xml version="1.0"?>
        <MinorVersionDetailsImport xmlns:xsi="http://www.w3.org/2001/xmlschema-instance" xmlns:xsd="http://www.w3.org/2001/xmlschema">
          <FilePath>C:\temp\Demo 1.rtf</FilePath>
        </MinorVersionDetailsImport>]]></mq:details>
    </mq:minorversioninfo>
    <mq:minorversioninfo mq:minorversion="1" mq:comment="after manual translation" mq:createdthroughview="false" mq:createreason="snapshot" mq:creationtime="120498866" mq:creatoruser="azerfass" mq:tag="">
      <mq:details />
    </mq:minorversioninfo>
    <mq:minorversioninfo mq:minorversion="2" mq:comment="" mq:createdthroughview="false" mq:createreason="bilingualexport" mq:creationtime="120499070" mq:creatoruser="azerfass" mq:tag="">
      <mq:details mq:type="minorversiondetailsbilingexport"><![CDATA[<?xml version="1.0"?>
        <MinorVersionDetailsBilingExport xmlns:xsi="http://www.w3.org/2001/xmlschema-instance" xmlns:xsd="http://www.w3.org/2001/xmlschema">
          <BilingualType>Xliff</BilingualType>
          <TargetPath>C:\temp\compare current to version 1.xlf</TargetPath>
          <TwoColumpRtfProperties>MutipleDocuments</TwoColumpRtfProperties>
          <BilingRtfProperties>EmptySegmentsWithMarkup</BilingRtfProperties>
          <XliffProperties>IncludePreview IncludeSkeletons</XliffProperties>
        </MinorVersionDetailsBilingExport>]]></mq:details>
    </mq:minorversioninfo>
  </mq:versioninfos>
</mq:docinformation>

<trans-unit id="3" mq:minorversionend="2" mq:minorversionstart="2" mq:status="partiallyedited" mq:percent="101" mq:segmentguid="db0d2d51-8345-4c6d-95e5-6beff647550b" mq:translatorcommittimestamp="0001-01-01T00:00:00Z" mq:reviewer1committimestamp="0001-01-01T00:00:00Z" mq:reviewer2committimestamp="0001-01-01T00:00:00Z" mq:lastchangedtimestamp="2011-10-12T04:26:19Z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
  <source mq:segpart="3" mq:hasfollowingobject="hasfollowingobject">dies ist ein kurzer schöner Satz.</source>
  <target>this is a short, beautiful sentence.</target>
  <mq:minorversions>
    <mq:historical-unit mq:minorversionend="1" mq:minorversionstart="1" mq:status="manuallyconfirmed" mq:percent="101" mq:segmentguid="db0d2d51-8345-4c6d-95e5-6beff647550b" mq:translatorcommittimestamp="0001-01-01T00:00:00Z" mq:reviewer1committimestamp="0001-01-01T00:00:00Z" mq:reviewer2committimestamp="0001-01-01T00:00:00Z" mq:lastchangedtimestamp="2011-10-12T04:22:32Z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
      <source mq:segpart="3" mq:hasfollowingobject="hasfollowingobject">dies ist ein kurzer schöner Satz.</source>
      <target>this is a short, nice sentence.</target>
    </mq:historical-unit>
    <mq:historical-unit mq:minorversionend="0" mq:minorversionstart="0" mq:status="notstarted" mq:segmentguid="db0d2d51-8345-4c6d-95e5-6beff647550b" mq:translatorcommittimestamp="0001-01-01T00:00:00Z" mq:reviewer1committimestamp="0001-01-01T00:00:00Z" mq:reviewer2committimestamp="0001-01-01T00:00:00Z" mq:lastchangedtimestamp="2007-12-17T12:28:29Z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
      <source mq:segpart="3" mq:hasfollowingobject="hasfollowingobject">dies ist ein kurzer schöner Satz.</source>
      <target></target>
    </mq:historical-unit>
  </mq:minorversions>
</trans-unit>
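The version history embedded in such a trans-unit could be turned into a simple change log. The sketch below is a minimal illustration in Python with the standard library. The element and attribute names follow the transcript above (real files may capitalize them differently), the sample is abbreviated, and `segment_history` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2", "mq": "mqxliff"}
MQ = "{mqxliff}"

# Abbreviated, hypothetical trans-unit modeled on the excerpt above.
SAMPLE_UNIT = """<trans-unit xmlns="urn:oasis:names:tc:xliff:document:1.2"
    xmlns:mq="mqxliff" id="3" mq:minorversionstart="2" mq:status="partiallyedited">
  <source>Dies ist ein kurzer schöner Satz.</source>
  <target>This is a short, beautiful sentence.</target>
  <mq:minorversions>
    <mq:historical-unit mq:minorversionstart="1" mq:status="manuallyconfirmed">
      <source>Dies ist ein kurzer schöner Satz.</source>
      <target>This is a short, nice sentence.</target>
    </mq:historical-unit>
    <mq:historical-unit mq:minorversionstart="0" mq:status="notstarted">
      <source>Dies ist ein kurzer schöner Satz.</source>
      <target></target>
    </mq:historical-unit>
  </mq:minorversions>
</trans-unit>"""


def _entry(element):
    """(minor version, status, target text) for a current or historical unit."""
    return (
        int(element.get(MQ + "minorversionstart")),
        element.get(MQ + "status"),
        element.findtext("x:target", default="", namespaces=NS),
    )


def segment_history(trans_unit):
    """Return all recorded versions of a segment, oldest first."""
    entries = [_entry(hu) for hu in
               trans_unit.iterfind("mq:minorversions/mq:historical-unit", NS)]
    entries.append(_entry(trans_unit))  # the current state of the segment
    return sorted(entries)


history = segment_history(ET.fromstring(SAMPLE_UNIT))
for version, status, target in history:
    print(version, status, repr(target))
```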

Tests
Create a translation with segments in different states (translated, pretranslated, not edited, comments ...)
Create an XLIFF exchange file
Import XLIFF into another tool
What metadata can be re-used between tools? Unfortunately, none, at least in the tool combinations tested for this presentation.

Future vision...
Statistics for BI reporting
Where segments came from
How much was used, changed, rejected
Which user changes a lot or just accepts TUs as-is
Rollback
What percentage of TM is used over time
Segment usage counter
Change history
QA messaging
User input / feedback commentary into bug tracking