Metadata in Translation Tools: Importance, Usage, Storage, Transfer Angelika Zerfaß & Richard Sikes
Metadata is... Data that describes other data. It provides information about a certain item's content.
Some examples from everyday life. Book: author, genre, subject, length, when written, summary, location, keywords. Graphic file: how large, color depth, image resolution, when created, file name, description, keywords.
Data vs Metadata
Metadata in Localization What kind of metadata can be provided throughout the localization process? Where and how can it be used? How well does it transfer between tools?
Metadata in Translation Tools. Information about the translation itself: creation date; user who created the translation. Information about the translation databases (TMs, term bases...). Status information about the file in translation: is the translation confirmed; does it come from a TM, automated translation, or alignment?
Uses of metadata within one TM system: categorize translated content within the translation memory system; influence the match rates during translation.
Transfer between Tools. Different translation tool components use different exchange formats. Translation memories are exchanged via TMX (Translation Memory eXchange format). Terminology data is exchanged via CSV or TBX (TermBase eXchange format). Segment pairs (source segment plus translated segment) are exchanged via XLIFF or a customized version thereof.
Where does metadata reside? In the header of exchange file formats like TMX, XLIFF, TBX: administrative data; user-defined data (name of TM, path where XLIFF was saved to). On the translation unit/term level of the exchange files: administrative data (who and when); process data (e.g. coming from alignment); categorization data (pre-defined and custom fields). Inside segments (formatting).
More detailed uses for metadata: searching for segments in a TM; searching for terms in a term base; filtering for segments during translation (prefer / penalize, i.e. decrease the match rate); TM clean-up; TM splitting (filter during export); term base splitting (filter during export).
Details: Metadata in TMs. TM-level data: administrative data (name, who, when); pre-defined categories (subject, client), if the tool provides them. Document-level data with corpus-based tools. TU-level data: administrative data (created when and by whom); process data (e.g. TU comes from alignment); custom categories.
Our Tests: Create a TM with custom metadata fields. Add translations to the TM with field information. Export to TMX. Import into a different tool. What metadata transfers well, and what doesn't?
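The prefer / penalize idea can be sketched in a few lines. This is only an illustration: the `Hit` record, the `apply_penalty` function, and the flat penalty of 5 percentage points are assumptions made for this example, and each tool computes and applies penalties in its own way.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """Hypothetical TM lookup result: a target text, a match rate, and metadata."""
    target: str
    match_rate: int  # percent
    fields: dict = field(default_factory=dict)


def apply_penalty(hits, field_name, preferred_value, penalty=5):
    """Decrease the match rate of hits whose metadata differs from the preferred value."""
    adjusted = []
    for hit in hits:
        rate = hit.match_rate
        if hit.fields.get(field_name) != preferred_value:
            rate = max(0, rate - penalty)
        adjusted.append(Hit(hit.target, rate, hit.fields))
    # Best match first, after penalties have been applied.
    return sorted(adjusted, key=lambda h: h.match_rate, reverse=True)


hits = [
    Hit("Getriebe", 100, {"client": "client B"}),
    Hit("Schaltgetriebe", 98, {"client": "client A"}),
]
ranked = apply_penalty(hits, "client", "client A", penalty=5)
print([(h.target, h.match_rate) for h in ranked])
```

In this sketch the 100% match from another client drops below the 98% match from the preferred client, which is the effect the penalize setting is meant to achieve.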
Metadata when creating a TM (1) select/fill predefined fields
Metadata when creating a TM (2) custom fields
Metadata in the TM associated with a segment
Information on the TM level
<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
  <header creationtool="memoq" creationtoolversion="5.0.21" segtype="sentence" adminlang="en-us" creationid="azerfass" srclang="en" o-tmf="memoqtm" datatype="unknown">
    <prop type="defclient">client A</prop>
    <prop type="defproject">proj 123</prop>
    <prop type="defdomain">automotive</prop>
    <prop type="defsubject">transmission</prop>
    <prop type="description">models 1-5b</prop>
    <prop type="targetlang">de</prop>
    <prop type="name">full Settings</prop>
  </header>
Information on the segment level
<tu changedate="20111012t003107z" creationdate="20111012t003107z" creationid="azerfass" changeid="azerfass">
  <prop type="client">autoparts</prop>
  <prop type="project">1-2-3-4-5</prop>
  <prop type="domain">automotive-aeronautics</prop>
  <prop type="subject">spare parts</prop>
  <prop type="corrected">no</prop>
  <prop type="aligned">no</prop>
  <prop type="x-document">demo 1</prop>
  <prop type="x-reviewer">rev1</prop>
  <prop type="x-internal id">44567</prop>
  <prop type="x-date of review">20111012t003000z</prop>
  <prop type="x-doc type">broschure</prop>
  <prop type="x-model">model 1</prop>
  <prop type="x-model">model 5</prop>
  <tuv xml:lang="en">
    <prop type="x-context-pre"><seg>dies ist ein neuer Satz.</seg></prop>
    <prop type="x-context-post"><seg>dies ist ein kurzer wunderschöner Satz.</seg></prop>
    <seg>dies ist ein kurzer neuer Satz.</seg>
  </tuv>
  <tuv xml:lang="de">
    <seg>this is a short, new sentence.</seg>
  </tuv>
</tu>
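To see which TM-level metadata a TMX header like this one carries, the header can be read with a few lines of Python. This is a minimal sketch using the standard library; the function name `read_header` is made up for this example, and the sample is a trimmed copy of the header shown here, with the XML declaration and DOCTYPE left out.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the memoQ header; declaration and DOCTYPE omitted.
TMX_HEADER = """<tmx version="1.4">
  <header creationtool="memoq" creationtoolversion="5.0.21" segtype="sentence"
          adminlang="en-us" creationid="azerfass" srclang="en"
          o-tmf="memoqtm" datatype="unknown">
    <prop type="defclient">client A</prop>
    <prop type="defdomain">automotive</prop>
    <prop type="targetlang">de</prop>
  </header>
  <body/>
</tmx>"""


def read_header(tmx_text):
    """Return (header attributes, header props) as two dictionaries."""
    header = ET.fromstring(tmx_text).find("header")
    attributes = dict(header.attrib)
    props = {p.get("type"): (p.text or "") for p in header.findall("prop")}
    return attributes, props


attributes, props = read_header(TMX_HEADER)
print(attributes["creationtool"], props)
```

The standard TMX attributes (creation tool, source language) and the tool-specific `prop` entries (default client, domain) end up in separate dictionaries, which mirrors the split between administrative and user-defined data.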
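A receiving tool has to cope with fields that occur more than once on a translation unit, such as `x-model` here. The following sketch collects the TU-level props into lists so that repeated fields are not overwritten. The function name `tu_metadata` is an assumption for this example, and the sample is a shortened copy of the translation unit shown here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the memoQ translation unit.
TU = """<tu creationdate="20111012t003107z" creationid="azerfass">
  <prop type="client">autoparts</prop>
  <prop type="x-reviewer">rev1</prop>
  <prop type="x-model">model 1</prop>
  <prop type="x-model">model 5</prop>
  <tuv xml:lang="de"><seg>this is a short, new sentence.</seg></tuv>
</tu>"""


def tu_metadata(tu_text):
    """Return (administrative attributes, custom fields) for one <tu>."""
    tu = ET.fromstring(tu_text)
    fields = defaultdict(list)
    # findall("prop") only visits direct children, so props nested
    # inside a <tuv> are not mixed into the TU-level fields.
    for prop in tu.findall("prop"):
        fields[prop.get("type")].append(prop.text or "")
    return dict(tu.attrib), dict(fields)


admin, fields = tu_metadata(TU)
print(admin["creationid"], fields["x-model"])
```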
Metadata used for sorting in the TM
Metadata when creating a TM (1) select/fill predefined fields
Metadata when creating a TM (2) custom fields
Metadata in the TM associated with a segment
Information on the TM level
<?xml version="1.0" encoding="utf-8"?>
<tmx version="1.4">
  <header creationtool="sdl Language Platform" creationtoolversion="8.0" o-tmf="sdl TM8 Format" datatype="xml" segtype="sentence" adminlang="de-de" srclang="de-de" creationdate="20111012t011627z" creationid="z- 0314F13C5AED4\A">
    <prop type="x-reviewer:singlestring"></prop>
    <prop type="x-doc type:singlepicklist">legal,workshop manual,website,broschure</prop>
    <prop type="x-model:singlepicklist">model 1,model 2,model 3,model 4,model 5</prop>
    <prop type="x-internal id:integer"></prop>
    <prop type="x-review date:datetime"></prop>
    <prop type="x-recognizers">recognizeall</prop>
    <prop type="x-tmname">tm for TMX test</prop>
  </header>
Information on the segment level
<tu creationdate="20111012t032948z" creationid="align!" changedate="20111012t032948z" changeid="align!" lastusagedate="20111012t013621z" usagecount="2">
  <prop type="x-context">-8428286702482475836, 1404007344699555312</prop>
  <prop type="x-context">615444784753120163, 615444784753120163</prop>
  <prop type="x-origin">alignment</prop>
  <prop type="x-originalformat">tradostranslatorsworkbench</prop>
  or
  <prop type="x-origin">tm</prop>
  <prop type="x-confirmationlevel">translated</prop>
  <prop type="x-review date:datetime">20100303t120000z</prop>
  <prop type="x-reviewer:singlestring">az</prop>
  <prop type="x-internal id:integer">12345</prop>
  <prop type="x-doc type:singlepicklist">workshop manual</prop>
  <prop type="x-model:singlepicklist">model 4</prop>
  <tuv xml:lang="de-de">
    <seg>dies ist ein neuer Satz.</seg>
  </tuv>
  <tuv xml:lang="en-us">
    <seg>this is a new sentence.</seg>
  </tuv>
</tu>
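In this export the field type appears to be encoded as a suffix of the `type` attribute (`name:type`), and picklist values are listed comma-separated in the header. The sketch below separates name, type, and allowed values. It is based only on the sample shown here, not on any documentation of the format, and the function names are made up. Note that the same split would be wrong for exports where the colon marks a prefix instead (as in `doc:name`).

```python
def split_typed_name(prop_type):
    """Split 'x-doc type:singlepicklist' into ('x-doc type', 'singlepicklist')."""
    name, sep, value_type = prop_type.rpartition(":")
    if not sep:
        return prop_type, None  # no type suffix, e.g. 'x-origin'
    return name, value_type


def field_definitions(header_props):
    """Build {field name: (type, allowed values)} from (type, text) pairs."""
    definitions = {}
    for prop_type, text in header_props:
        name, value_type = split_typed_name(prop_type)
        if value_type is None:
            continue  # plain setting such as x-tmname, not a typed field
        is_picklist = value_type.endswith("picklist")
        allowed = text.split(",") if is_picklist and text else []
        definitions[name] = (value_type, allowed)
    return definitions


# Values copied from the header shown here.
HEADER_PROPS = [
    ("x-reviewer:singlestring", ""),
    ("x-doc type:singlepicklist", "legal,workshop manual,website,broschure"),
    ("x-internal id:integer", ""),
    ("x-tmname", "tm for TMX test"),
]
defs = field_definitions(HEADER_PROPS)
print(defs["x-doc type"])
```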
Metadata used for sorting in the TM
Metadata on document level
Information on the document level
<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
  <header creationtool="multitrans" creationtoolversion="5.0.1947.0" segtype="sentence" o-tmf="dvmdb" adminlang="en-us" srclang="en-us" datatype="html" creationdate="20111011t220958z" creationid="richard" changedate="20111011t223931z" changeid="richard">
    <prop type="txb:name">locworld HTML Example</prop>
    <prop type="doc:created date">20111011t094600z</prop>
    <prop type="doc:modified date">20111011t095000z</prop>
    <prop type="doc:name">html Example.htm</prop>
    <prop type="doc:source language">eng</prop>
    <prop type="doc:revision Number">0</prop>
    <prop type="doc:revision Date">20111011T000000Z</prop>
    <prop type="doc:created date">20111011t094600z</prop>
    <prop type="doc:modified date">20111011t095000z</prop>
    <prop type="doc:name">html Beispiel.htm</prop>
    <prop type="doc:source language">eng</prop>
    <prop type="doc:revision Number">0</prop>
    <prop type="doc:revision Date">20111011T000000Z</prop>
  </header>
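The header above lists the `doc:` props of two documents one after the other, without any wrapping element. One way to recover per-document records is to start a new record whenever a key repeats. This is a heuristic sketch that assumes every document repeats the same set of keys in the same order; the function name `group_documents` is hypothetical.

```python
def group_documents(props):
    """Group a flat list of (type, value) pairs into one dict per document."""
    documents, current = [], {}
    for key, value in props:
        if not key.startswith("doc:"):
            continue  # e.g. txb:name describes the TM, not a document
        if key in current:
            # A repeated key marks the start of the next document.
            documents.append(current)
            current = {}
        current[key] = value
    if current:
        documents.append(current)
    return documents


# Subset of the props from the header shown above.
PROPS = [
    ("txb:name", "locworld HTML Example"),
    ("doc:created date", "20111011t094600z"),
    ("doc:name", "html Example.htm"),
    ("doc:source language", "eng"),
    ("doc:created date", "20111011t094600z"),
    ("doc:name", "html Beispiel.htm"),
    ("doc:source language", "eng"),
]
documents = group_documents(PROPS)
print([d["doc:name"] for d in documents])
```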
Metadata on document level
Metadata on document level: Take TMX from Trados 2009 to memoQ 5. Add all unknown fields to the setup.
Tests: Take TMX from Trados 2009 to memoQ 5.
Tests: Take TMX from memoQ and MultiTrans into SDL Trados 2009. The tools represent picklist and text field values differently:
memoQ: <prop type="x-model">model 1</prop> <prop type="project">1-2-3-4-5</prop>
MultiTrans: <prop type="document Type">Manual</prop>
SDL Trados 2009: <prop type="x-doc type:singlepicklist">workshop manual</prop> <prop type="x-reviewer:singlestring">az</prop>
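Because each tool names its fields differently, one workaround is to rename the `type` attributes in the TMX before import. The sketch below does this with a user-supplied mapping table. The table `FIELD_MAP` and the function `remap_props` are assumptions for illustration only; whether a receiving tool accepts the renamed fields was not part of the tests described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from field names in one export to typed names in another.
FIELD_MAP = {
    "x-model": "x-model:singlepicklist",
    "x-reviewer": "x-reviewer:singlestring",
}

TU = (
    '<tu><prop type="x-model">model 1</prop>'
    '<prop type="project">1-2-3-4-5</prop>'
    '<tuv xml:lang="de"><seg>text</seg></tuv></tu>'
)


def remap_props(tu_text, field_map):
    """Rename prop types according to field_map; unknown types stay unchanged."""
    tu = ET.fromstring(tu_text)
    for prop in tu.findall("prop"):
        old_type = prop.get("type")
        if old_type in field_map:
            prop.set("type", field_map[old_type])
    return ET.tostring(tu, encoding="unicode")


remapped = remap_props(TU, FIELD_MAP)
print(remapped)
```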
Tests Mapping of fields from a TMX file to existing fields in MultiTrans
Tests
Tests: Create a TM with translations from one specific file format, such as HTML, DOC, InDesign, or XML. Export to TMX and import into another tool. Run a translation of the exact same file used to create the first TM. Match rates differ greatly depending on the file format used to create the TM, because different segmentation rules are applied and because inline tags are not recognized, are interpreted differently, or were not imported at all during TMX exchange.
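The effect of lost inline tags can be illustrated with a generic string similarity measure. This sketch uses `difflib.SequenceMatcher` from the Python standard library, which is not the fuzzy-matching algorithm of any translation tool; the sample segment and the tag-stripping regular expression are assumptions for this example.

```python
import re
from difflib import SequenceMatcher

TAG = re.compile(r"<[^>]+>")


def similarity(a, b):
    """Generic similarity between 0 and 1 (not a tool's real match rate)."""
    return SequenceMatcher(None, a, b).ratio()


# Segment as it appears in the document being translated.
segment = "Click <b>Save</b> to continue."
# The same segment as stored in a TM whose tags were dropped during exchange.
stored_without_tags = TAG.sub("", segment)

same_tool = similarity(segment, segment)
after_exchange = similarity(stored_without_tags, segment)
print(same_tool, after_exchange)
```

Even though the text is identical, the comparison against the tag-less TM entry no longer yields a perfect score, which is one reason the same file can produce lower match rates after a TMX round trip.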
TMX exchange results
Details: Metadata in XLIFF. File-level data: administrative data (name, who, when). TU-level data: administrative data (created when and by whom); process data (status of translation, origin of match); comments, history.
What an XLIFF contains <?xml version="1.0" encoding="utf-8"?> <xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:mq="mqxliff" xmlns:xsi="http://www.w3.org/2001/xmlschemainstance" xsi:schemalocation="urn:oasis:names:tc:xliff:document:1.2 xliffcore-1.2-transitional.xsd"> <file original="c:\users\azerfass.zaac\desktop\tmx Test\Beispieldateien\HTML Beispiel.htm" mq:id="e3b904d9-7967-46bc- 88e3-1bdd83d25544" source-language="de" target-language="en" datatype="x-html"> <header> <skl> <internalfile>uesdbbqaaaaiajso8j5axl8ttaiaadoiaaaraaaasfrntcbczwlzc GllbC54bWytVVFv2jAQfp+0/4B4B0OgLVRuKgbbigQdK2yrJqTJxJdgNbEj2y m0v37gctqjqxtsxic++777znf2gd9uo7d2dfixww/q7warxgpuccp4cf NPtN/o1W/..
What an XLIFF file can contain on the translation unit level <body> <trans-unit id="1" mq:status="manuallyconfirmed" mq:rep="rep" mq:segmentguid="1f447f04-f394-43e6-aacc-355a1dabed92" mq:translatorcommittimestamp="0001-01-01t00:00:00z" mq:reviewer1committimestamp="0001-01-01t00:00:00z" mq:reviewer2committimestamp="0001-01-01t00:00:00z" mq:lastchangedtimestamp="2011-07-18t15:47:25z" mq:maxlengthchars="-1" mq:nosplitjoin="false"> <source mq:segpart="1" mq:hasfollowingobject="hasfollowingobject">beispielseite</source> <target>sample page</target> </trans-unit>
What an XLIFF file can contain on the translation unit level
<trans-unit id="5" mq:minorversionend="5" mq:minorversionstart="4" mq:status="partiallyedited" mq:segmentguid="5ecd83ee-3bd8-4a86-9a96-ed10485254bc" mq:translatorcommittimestamp="0001-01-01t00:00:00z" mq:reviewer1committimestamp="0001-01-01T00:00:00Z" mq:reviewer2committimestamp="0001-01-01t00:00:00z" mq:lastchangedtimestamp="2011-10-11T15:52:05Z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
  <source mq:segpart="6">hier kommt der 5. Satz. </source>
  <target>here comes sentence number five. </target>
  <note>number below 10 are written as words.</note>
Metadata that can be contained <mq:warnings40> <mq:errorwarning mq:errorwarning-code="03062" mq:errorwarning-ignorable="errorwarningignorable" mq:errorwarning-shorttext="numbers in source and target segment do not match" mq:errorwarning-problemname="numbers do not match" mq:errorwarning-segmenthash="0" mq:errorwarning-combinedposstart="-1" mq:errorwarning-combinedposlength="0" /> </mq:warnings40>
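Status values and notes like the ones in these excerpts can be pulled out of an XLIFF file with namespace-aware parsing. The following is a minimal sketch: the sample is a heavily shortened reconstruction of the excerpts shown here, the namespace URI `mqxliff` is taken from the `xmlns:mq` declaration in the file header, and the function name `summarize` is made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2", "mq": "mqxliff"}

# Shortened sample built from the excerpts shown here.
XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:mq="mqxliff">
  <file original="demo" source-language="de" target-language="en" datatype="x-html">
    <body>
      <trans-unit id="1" mq:status="manuallyconfirmed">
        <source>beispielseite</source>
        <target>sample page</target>
      </trans-unit>
      <trans-unit id="5" mq:status="partiallyedited">
        <source>hier kommt der 5. Satz. </source>
        <target>here comes sentence number five. </target>
        <note>number below 10 are written as words.</note>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def summarize(xliff_text):
    """Count tool-specific status values and collect notes per trans-unit id."""
    root = ET.fromstring(xliff_text)
    statuses = Counter()
    notes = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        # Namespaced attributes are addressed as {namespace-uri}name.
        statuses[unit.get("{mqxliff}status", "unknown")] += 1
        note = unit.find("x:note", NS)
        if note is not None:
            notes[unit.get("id")] = note.text
    return statuses, notes


statuses, notes = summarize(XLIFF)
print(dict(statuses), notes)
```

The `note` element is part of core XLIFF 1.2, while the status lives in a tool-specific namespace, so a second tool can read the former without knowing anything about the latter.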
This is a new <mrk mtype="x-sdl-comment" sdl:cid="dc02f347-9f59-486e-8f2e-1f97b2aa2c91">sentence</mrk>.
<trans-unit id="855094c2-c334-4c6f-abf5-9fbad9e89d76">
  <source>schicken Sie eine Mail an: <g id="pt1"><g id="pt2">info@firma.de</g></g></source>
  <seg-source><mrk mtype="seg" mid="6">schicken Sie eine Mail an:</mrk> <g id="pt1"><g id="pt2"><mrk mtype="seg" mid="7">info@firma.de</mrk></g></g></seg-source>
  <target><mrk mtype="seg" mid="6">send a mail to:</mrk> <g id="pt1"><g id="pt2"><mrk mtype="seg" mid="7"><mrk mtype="x-sdl-comment" sdl:cid="af90adc0-a008-4484-a46a-5824fddef1ea">info@firma.de</mrk></mrk></g></g></target>
  <sdl:seg-defs>
    <sdl:seg id="6" conf="rejectedtranslation" origin="interactive"><sdl:value key="sdl:originaltranslationhash">-2010148818</sdl:value></sdl:seg>
    <sdl:seg id="7" conf="translated" origin="source"><sdl:value key="sdl:originaltranslationhash">-669315889</sdl:value></sdl:seg>
  </sdl:seg-defs>
</trans-unit>
Tests: Translate a document. Send the document for review. Compare documents with track changes. Export to XLIFF.
History of a file
Metadata in XLIFF with track changes File header <tool tool-id="mq" tool-name="memoq" tool-version="5.0.21" tool-company="kilgray" /> <mq:export-path>c:\temp\demo 1_ger.rtf</mq:export-path> <mq:docinformation mq:hashistory="true"> <mq:versioninfos mq:majorversion="1"><mq:minorversioninfo mq:minorversion="0" mq:comment="" mq:createdthroughview="false" mq:createreason="import" mq:creationtime="120498833" mq:creatoruser="azerfass" mq:tag=""><mq:details mq:type="minorversiondetailsimport"><![cdata[<?xml version="1.0"?> <MinorVersionDetailsImport xmlns:xsi="http://www.w3.org/2001/xmlschema-instance" xmlns:xsd="http://www.w3.org/2001/xmlschema"> <FilePath>C:\temp\Demo 1.rtf</FilePath> </MinorVersionDetailsImport>]]></mq:details></mq:minorversioninfo> <mq:minorversioninfo mq:minorversion="1" mq:comment="after manual translation" mq:createdthroughview="false" mq:createreason="snapshot" mq:creationtime="120498866" mq:creatoruser="azerfass" mq:tag=""><mq:details /></mq:minorversioninfo> <mq:minorversioninfo mq:minorversion="2" mq:comment="" mq:createdthroughview="false" mq:createreason="bilingualexport" mq:creationtime="120499070" mq:creatoruser="azerfass" mq:tag=""><mq:details mq:type="minorversiondetailsbilingexport"><![cdata[<?xml version="1.0"?> <MinorVersionDetailsBilingExport xmlns:xsi="http://www.w3.org/2001/xmlschema-instance" xmlns:xsd="http://www.w3.org/2001/xmlschema"> <BilingualType>Xliff</BilingualType> <TargetPath>C:\temp\compare current to version 1.xlf</TargetPath> <TwoColumpRtfProperties>MutipleDocuments</TwoColumpRtfProperties> <BilingRtfProperties>EmptySegmentsWithMarkup</BilingRtfProperties> <XliffProperties>IncludePreview IncludeSkeletons</XliffProperties> </MinorVersionDetailsBilingExport>]]></mq:details></mq:minorversioninfo> </mq:versioninfos> </mq:docinformation>
<trans-unit id="3" mq:minorversionend="2" mq:minorversionstart="2" mq:status="partiallyedited" mq:percent="101" mq:segmentguid="db0d2d51-8345-4c6d-95e5-6beff647550b" mq:translatorcommittimestamp="0001-01-01t00:00:00z" mq:reviewer1committimestamp="0001-01-01t00:00:00z" mq:reviewer2committimestamp="0001-01-01t00:00:00z" mq:lastchangedtimestamp="2011-10-12t04:26:19z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
  <source mq:segpart="3" mq:hasfollowingobject="hasfollowingobject">dies ist ein kurzer schöner Satz.</source>
  <target>this is a short, beautiful sentence.</target>
  <mq:minorversions>
    <mq:historical-unit mq:minorversionend="1" mq:minorversionstart="1" mq:status="manuallyconfirmed" mq:percent="101" mq:segmentguid="db0d2d51-8345-4c6d-95e5-6beff647550b" mq:translatorcommittimestamp="0001-01-01t00:00:00z" mq:reviewer1committimestamp="0001-01-01t00:00:00z" mq:reviewer2committimestamp="0001-01-01t00:00:00z" mq:lastchangedtimestamp="2011-10-12t04:22:32z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
      <source mq:segpart="3" mq:hasfollowingobject="hasfollowingobject">dies ist ein kurzer schöner Satz.</source>
      <target>this is a short, nice sentence.</target>
    </mq:historical-unit>
    <mq:historical-unit mq:minorversionend="0" mq:minorversionstart="0" mq:status="notstarted" mq:segmentguid="db0d2d51-8345-4c6d-95e5-6beff647550b" mq:translatorcommittimestamp="0001-01-01t00:00:00z" mq:reviewer1committimestamp="0001-01-01t00:00:00z" mq:reviewer2committimestamp="0001-01-01t00:00:00z" mq:lastchangedtimestamp="2007-12-17t12:28:29z" mq:maxlengthchars="-1" mq:nosplitjoin="false">
      <source mq:segpart="3" mq:hasfollowingobject="hasfollowingobject">dies ist ein kurzer schöner Satz.</source>
      <target></target>
    </mq:historical-unit>
  </mq:minorversions>
</trans-unit>
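The version history stored in such a trans-unit can be turned into a simple timeline. The sketch below reads the current target plus every `mq:historical-unit` and sorts them by minor version. It works on a shortened copy of the unit above with namespace declarations added so that it parses on its own; the function name `target_history` is an assumption for this example.

```python
import xml.etree.ElementTree as ET

X = "{urn:oasis:names:tc:xliff:document:1.2}"
MQ = "{mqxliff}"

# Shortened copy of the trans-unit above, with namespaces declared locally.
UNIT = """<trans-unit xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:mq="mqxliff"
    id="3" mq:minorversionstart="2" mq:status="partiallyedited">
  <source>dies ist ein kurzer schöner Satz.</source>
  <target>this is a short, beautiful sentence.</target>
  <mq:minorversions>
    <mq:historical-unit mq:minorversionstart="1" mq:status="manuallyconfirmed">
      <source>dies ist ein kurzer schöner Satz.</source>
      <target>this is a short, nice sentence.</target>
    </mq:historical-unit>
    <mq:historical-unit mq:minorversionstart="0" mq:status="notstarted">
      <source>dies ist ein kurzer schöner Satz.</source>
      <target></target>
    </mq:historical-unit>
  </mq:minorversions>
</trans-unit>"""


def target_history(unit_text):
    """Return (minor version, status, target text) tuples, oldest first."""
    unit = ET.fromstring(unit_text)
    versions = [unit] + unit.findall(f"{MQ}minorversions/{MQ}historical-unit")
    history = []
    for version in versions:
        target = version.find(f"{X}target")  # direct child only
        text = (target.text or "") if target is not None else ""
        history.append(
            (int(version.get(f"{MQ}minorversionstart")), version.get(f"{MQ}status"), text)
        )
    return sorted(history)


history = target_history(UNIT)
for entry in history:
    print(entry)
```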
Tests: Create a translation with segments in different states (translated, pretranslated, not edited, with comments). Create an XLIFF exchange file. Import the XLIFF into another tool. What metadata can be re-used between tools? Unfortunately, none, at least in the tool combinations tested for this presentation.
Future vision... Statistics for BI reporting: where segments came from; how much was used, changed, rejected; which user changes a lot or just accepts TUs as-is. Rollback. What percentage of the TM is used over time. Segment usage counter. Change history. QA messaging. User input / feedback commentary into bug tracking.