XLIFF Localisation for Joomla!
Translator-oriented localisation of CMS-based websites
Jesús Torres del Rey, Emilio Rodríguez Vázquez de Aldana
Faculty of Translation and Documentation
http://diarium.usal.es/codex

Agenda
- Introduction
  » Motivation
  » Multilingual management & interchange
- Our Research/Experiments
  » Analysis of other tools
  » Application
  » Workflow
  » XLIFF 1.2, XML+its1.0
  » Behaviour in CAT tools
- Translation-Oriented L10n
- Future Work

Motivation: chronology
- 2009: Request for translation of the Faculty's website (Joomla 1.5, multilingual via Joomfish)
  » HTML download > use of CAT tool > paste into the Joomla html editor
- 2010-11: How to teach localisation of dynamic websites to our undergraduate students?
  » Full localisation of static websites was already taught
  » File types and technologies (html, js, css, graphics, etc.)
  » Super-, Macro-, Hyper-, Micro-structures
  » Directory structures, relative links
  » Link/Web management (MS Expression, Adobe Dreamweaver, etc.)
  » Automation via Search/Replace, regular expressions
- 2012: Multilingual extensions for Joomla 2.5
  » Falang (also for Joomla 3), Josetta, Joomfish, Jolomea
- 2013: Research with other CMSs (Drupal, WordPress, Typo3)

Motivation: T&R philosophy
- Translator/Localiser-oriented approach
  » Integration with CAT/Localisation tools
  » Empowerment through control of processes and lifecycle: from request to publication, update, multilingualisation
  » Visual/Relational/Functional context, global meaning, negotiation of communication needs
  » Standardisation, XLIFF, ITS
- Acquisition of basic knowledge of the nature and mechanics of dynamic, CMS-based websites (on top of the nature and mechanics of static websites)
  » File types, databases and technologies
  » Server-client infrastructure
  » Composition of dynamic active pages
  » Front-end, back-end, interface, content

Motivation: (dis)empowerment
- Static html-based websites: full localisation
  » Visual and functional context
  » Use of functionality and quality tools (CAT/L)
  » Capable of multilingual re-structuring
  » Publication-ready deliverables
- Dynamic CMS-based websites: patchy translation
  » CMS: partial webpage/separate translation environment
  » Texts locked in DB -> export/import (for interchange, batch quality/analysis/term extraction processes)
  » Only if administrative rights for the CMS are granted and a multilanguage module is installed
  » Only if write-access rights are granted; partial, patchy publication

L10n, I18n, Multilanguage: evolution in CMSs
- Specific (often third-party) modules make multilingual websites easy to set up and manage
  » Automatic duplication of structure/pages
  » Taking advantage of simplified CMS editing environments
- At the same time, modules export/import translatable data to csv, po, xml and, increasingly, XLIFF
  » Drupal XLIFF Tools, WordPress WPML, Typo3 l10nmgr, Joomla JDiction (since early 2013)...
- Combination of multilingual management and XLIFF et al. export/import
  » WordPress WPML, Joomla JDiction (since early 2013)...

Our experiments: overview
- Application: Falang2XLIFF (beta)
  » http://diarium.usal.es/codex/desarrollo
  » Java client (compiled for Java 1.7)
  » Handy experimental tool, given our limited resources
  » Not embedded into the CMS as a module: access rights to DB?
  » Uses the Falang multilingual DB structure for Joomla
  » Potentially applicable to other DB structures, like Josetta or Joomfish
  » Main purpose: to experiment with the data to be extracted, XLIFF and the whole L10n process, and to use it in our undergraduate L10n course for translator training
- Other tools: JDiction (XLIFF tool added since March)
- For other CMSs: XLIFF Tools (Drupal)

Experiments: L10n objects
CMS objects for L10n:
- Editor/Administration interface
  » php, asp, or externalised to ini, po
- Dependent, linked files (pdf, epub, graphics, video, audio, etc.)
- Database elements
  » Article/page
  » Modules (e.g. calendar)
  » Categories (e.g. for thematically grouping blog posts)
  » Smaller user interaction elements (weblinks, etc.)

Experiments: L10n objects
In database tables, L10n elements:
1. Structural/Interface text strings: menus, article titles, sections
2. Longer (x)html article contents
3. Parameters for the above elements: metakey, metadesc, menu params
All in text fields in DB*

Experiments: other extraction strategies (JDiction)
- Titles: <![CDATA[ TEXT ]]>
- HTML: tags & text together, <![CDATA[ TEXT ]]>
- Parameters: state -> translated! (Drupal: final status)

Experiments: other extraction > CAT (JDiction > Virtaal)
- Tags are visually marked, probably by means of a regex such as <[^>]+/?>
- However, the tags are unprotected
- CAT tools could integrate a WYSIWYG html editor if XLIFF 1.2 datatype = "htmlbody"

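The kind of regex-based tag marking guessed at on this slide can be sketched as follows. This is only an illustration, assuming the pattern `<[^>]+/?>` is what the CAT tool applies; the function name `split_tags` is hypothetical and not part of Virtaal or JDiction.

```python
import re

# Pattern guessed on the slide; that a CAT tool uses exactly this is an assumption.
TAG_PATTERN = re.compile(r"<[^>]+/?>")

def split_tags(segment):
    """Split a segment into ('tag', s) and ('text', s) chunks, in document order."""
    chunks = []
    position = 0
    for match in TAG_PATTERN.finditer(segment):
        if match.start() > position:
            chunks.append(("text", segment[position:match.start()]))
        chunks.append(("tag", match.group(0)))
        position = match.end()
    if position < len(segment):
        chunks.append(("text", segment[position:]))
    return chunks

sample = "<p>Using <strong>Joomla!</strong><br/></p>"
for kind, value in split_tags(sample):
    print(kind, repr(value))
```

Marking of this sort only highlights tags: nothing pairs opening and closing tags or prevents the translator from deleting them, and an attribute value containing `>` is cut in the wrong place.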
Experiments: other extraction > CAT (JDiction > memoQ 5)
- Filters are not always versatile enough
- Segments should be shorter and regularly segmented for better matches and TM leverage

Experiments: Summarising JDiction
- Multilingual management + export/import
- Some multilingual management problems:
  » The translation editor is a separate environment (not integrated in the target-language page)
  » It does not show the original in parallel
- Some export/import problems:
  » Indiscriminate bulk export, irrespective of newness or update/translated state
  » CDATA export of (x)html content: no different from csv export; whole article/item, without structure
- XHTML should be processed with XML processors, rather than with regular expressions
- HTML text should be carried to the CAT tool not as plain text but as html tags and text (as Drupal XLIFF Tools does)

Experiments: Application Workflow (export)
Joomla! with Falang (DB) > Falang2Xliff > XLIFF 1.2
1. In Falang, element selection
2. DB connection
3. DB extraction (new & updated)
4. XML generation: simple XML (temporary) and XML+its1.0
5. XLIFF generation: xml2xliff.xsl of the XliffRoundTrip Tool, producing XLIFF 1.2

Application: Workflow (1/6)
1. In Falang: Element Selection
1.1 In Falang, a PM with admin rights
1.2 selects elements one by one (!) and
1.3 uses "Copy Source" (!)

Application: Workflow (2/6)
2. Database Connection
- Only standard TCP/IP connections to the SQL server
- Only within the network security zone or on localhost
- Joomla DB prefix needed
- Read-access permission for export: Falang tables, but also original content tables, to check newness & update status

Application: Workflow (3/6)
3. Database Extraction (new & updated)
- New: established as translatable by the PM by using "Copy Source"
- Updated: translatable text whose source content has been edited (original content tables checked via MD5 hash)
- Info from attributes title, text, introtext, name, fulltext, description & content in tables categories, content, menu, modules and weblinks
- Parameters are not extracted, to prevent DB corruption

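The new/updated check described on this slide could look roughly like the sketch below. It assumes that an MD5 hash of the source text was stored when the translation row was created, and that "Copy Source" leaves the translation value identical to the source; the function and parameter names are hypothetical and do not reflect the actual Falang2XLIFF code.

```python
import hashlib

def md5_of(text):
    """MD5 hex digest of a text field, encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def export_status(current_source, translation_value, stored_source_md5):
    """Classify a translatable field as 'updated', 'new' or 'skip'.

    Assumptions (not confirmed by the slides):
    - stored_source_md5 is the hash of the source at the time the
      translation row was created;
    - after "Copy Source", the translation still equals the source.
    """
    if md5_of(current_source) != stored_source_md5:
        return "updated"   # source edited since the translation was made
    if translation_value == current_source:
        return "new"       # copied by the PM, not yet translated
    return "skip"          # already translated and source unchanged

print(export_status("Hello", "Hello", md5_of("Hello")))       # new
print(export_status("Hello world", "Hola", md5_of("Hello")))  # updated
print(export_status("Hello", "Hola", md5_of("Hello")))        # skip
```

Comparing hashes rather than full texts keeps the check cheap for long articles, at the cost of having to read the original content tables as well as the Falang tables.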
Application: Workflow (3/6)
3. Database Extraction (new & updated)
- The Joomla! html editor typically rewrites HTML fragments as XHTML
- But are we certain that it is correct XHTML?
- We have rechecked the data (Jericho HTML Parser) and rewritten it if necessary
  » XML entities, closing attribute quotes, checking and correcting node hierarchy
  » Some current limitations: e.g. unpaired <tag> <tag/>
- XHTML elements should be stored in the DB as XML elements
  » ISO/IEC 9075-14:2011, Part 14: XML-Related Specifications (SQL/XML)
  » XML support is low in MySQL; high in PostgreSQL

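Whether a stored fragment is well-formed XHTML can be tested with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` rather than the Jericho parser named on the slide, and it only detects problems; it does not repair them. The wrapper element name `root` and the function name are arbitrary choices.

```python
import xml.etree.ElementTree as ET

def is_well_formed_fragment(fragment):
    """Return True if the fragment parses as XML once wrapped in a single root."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False

samples = [
    '<p>ok <img src="a.png"/></p>',   # well-formed
    "<p>Joomla! &amp; more</p>",      # escaped ampersand: well-formed
    "<p>Joomla! & more</p>",          # bare ampersand: not well-formed
    "<p>line<br></p>",                # unpaired <br>: not well-formed
]
for sample in samples:
    print(is_well_formed_fragment(sample), sample)
```

Fragments that fail this check are the ones that would need rewriting (entities, attribute quotes, node hierarchy) before XSL processing.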
Application: Workflow (4/6)
4. XML Generation
<value_falang>usando Joomla! & </value_falang>
<value_falang><p> <img /> </p></value_falang>

Application: Workflow (4/6)
4. XML Generation (temporary file to be converted to XLIFF)
- <registros_falang>: root
- <registro_falang>: attributes contain info for correct back-import to DB
- <value_falang>: contains translatable content (can include html elements): XHTML, text

Application: Workflow (4/6)
4. Generation of XML+its1.0
- Global, embedded ITS rules. Features: Translate, Elements Within Text
- ITS 1.0 supports XPath 1.0 (which does not support regex)
- W3C WG (2008): Best Practices for XML Internationalization. 5.1.4 Associating existing XHTML markup with ITS

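As an illustration of embedded global rules, the sketch below shows what an XML+its1.0 file with Translate and Elements Within Text rules might look like, and reads the rules back with Python's standard library. The ITS namespace and the `translateRule`/`withinTextRule` elements come from ITS 1.0; the selectors, the `id` attribute and the choice of inline elements are assumptions, not the rules actually generated by Falang2XLIFF.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical output file: element names follow the slides, selectors are assumed.
DOCUMENT = """<registros_falang xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules version="1.0">
    <its:translateRule selector="//value_falang" translate="yes"/>
    <its:withinTextRule selector="//strong | //em | //a | //span" withinText="yes"/>
  </its:rules>
  <registro_falang id="1">
    <value_falang><p>Using <strong>Joomla!</strong></p></value_falang>
  </registro_falang>
</registros_falang>"""

def read_rules(xml_text):
    """List (rule name, selector, value) for every embedded global ITS rule."""
    root = ET.fromstring(xml_text)
    rules = []
    for rules_element in root.iter("{%s}rules" % ITS_NS):
        for rule in rules_element:
            name = rule.tag.split("}")[1]
            value = rule.get("translate") or rule.get("withinText")
            rules.append((name, rule.get("selector"), value))
    return rules

for rule in read_rules(DOCUMENT):
    print(rule)
```

Because selectors are XPath 1.0 expressions, inline elements have to be enumerated by name; they cannot be matched by a regular expression.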
Application: Workflow (5/6)
5. Generation of XLIFF 1.2
- Schnabel's xml2xliff.xsl, adapted so that the source language is a variable

Application: Workflow
Generation of XML+its1.0 and XLIFF

Application: Workflow (import)
From XLIFF/XML+its back to DB (import)
XLIFF 1.2 or XML + ITS 1.0 > Falang2Xliff > DB of Joomla! with Falang
1. XML generation: XLIFF 1.2 converted with xliff2xml.xsl (XliffRoundTrip) into temporary XML
2. SQL generation: SQL file, with optional online update of the DB
- XLIFF encoding: UTF-8 without BOM
- Translation states (e.g. needs-translation, etc.) are not taken into account
- XML to SQL via XQuery processor (http://xmlbeans.apache.org/index.html)

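The XML-to-SQL step can be pictured with the sketch below. It uses Python's `xml.etree.ElementTree` and an in-memory SQLite database instead of the XQuery processor and the Joomla SQL server named on the slide. The table name `jos_falang_content`, its columns and the `id` attribute on `<registro_falang>` are assumptions made for illustration only.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical temporary XML, as it might look after xliff2xml.xsl.
TEMP_XML = """<registros_falang>
  <registro_falang id="7"><value_falang><p>Usando <strong>Joomla!</strong></p></value_falang></registro_falang>
</registros_falang>"""

def inner_xml(element):
    """Serialise the content of an element (text and child markup) without its own tag."""
    parts = [element.text or ""]
    for child in element:
        parts.append(ET.tostring(child, encoding="unicode"))  # includes the child's tail
    return "".join(parts)

def updates_from_xml(xml_text, table="jos_falang_content"):
    """Yield one parameterised UPDATE per record; table and columns are assumed."""
    root = ET.fromstring(xml_text)
    for record in root.iter("registro_falang"):
        value = record.find("value_falang")
        sql = "UPDATE %s SET value = ? WHERE id = ?" % table
        yield sql, (inner_xml(value), int(record.get("id")))

# Stand-in database so that the statements can be run end to end.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE jos_falang_content (id INTEGER PRIMARY KEY, value TEXT)")
connection.execute("INSERT INTO jos_falang_content VALUES (7, '')")
for sql, params in updates_from_xml(TEMP_XML):
    connection.execute(sql, params)
print(connection.execute("SELECT value FROM jos_falang_content WHERE id = 7").fetchone()[0])
```

Passing the translated markup as a bound parameter, rather than pasting it into the SQL string, avoids quoting problems with apostrophes and attribute quotes in the XHTML.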
Application: XLIFF generated
- XliffRoundTrip XSL: for regular XML structures
- Limitation: (translatable) attributes must be post-processed
- Tags: <group> (without text), <trans-unit> (with text), <g> </g> and <x/> (within text/inline)

XHTML source:
1. <p><img alt=" "/> </p>
2. <p><span> <a title=" "> </a> </span></p>
3. <ul><li><span> <strong> </strong> </span></li>
4. <li><span> <strong> </strong> <em> </em> </span></li></ul>

XLIFF generated:
1. <trans-unit><x/> </trans-unit>
2. <group><trans-unit> <g id=""> </g> </trans-unit></group>
3. <group> <group><trans-unit> <g id=""> </g> </trans-unit></group>
4. <group><trans-unit> <g id=""> </g> <g id=""> </g> </trans-unit></group> </group>

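The mapping rule on this slide (elements without text become `<group>`, elements with text become `<trans-unit>`, inline children become `<g>` or `<x/>`) can be sketched in a few lines. This is a simplified illustration in Python, not the XliffRoundTrip stylesheet: the function names are hypothetical, a `<source>` wrapper is added because XLIFF 1.2 requires it, and whitespace-only text is treated as no text.

```python
import itertools
import xml.etree.ElementTree as ET

def has_direct_text(element):
    """True if the element holds non-blank text of its own (before or between children)."""
    if (element.text or "").strip():
        return True
    return any((child.tail or "").strip() for child in element)

def to_inline(element, ids):
    """Map an inline element to <x/> (empty) or <g> (paired), keeping its text."""
    is_empty = len(element) == 0 and not (element.text or "").strip()
    inline = ET.Element("x" if is_empty else "g", id=str(next(ids)))
    if not is_empty:
        inline.text = element.text
        for child in element:
            inline.append(to_inline(child, ids))
    inline.tail = element.tail
    return inline

def convert(element, ids):
    """Elements with text become <trans-unit>; elements without text become <group>."""
    if has_direct_text(element):
        unit = ET.Element("trans-unit", id=str(next(ids)))
        source = ET.SubElement(unit, "source")
        source.text = element.text
        for child in element:
            source.append(to_inline(child, ids))
        return unit
    group = ET.Element("group")
    for child in element:
        group.append(convert(child, ids))
    return group

xhtml = "<ul><li><span>One <strong>two</strong></span></li></ul>"
print(ET.tostring(convert(ET.fromstring(xhtml), itertools.count(1)), encoding="unicode"))
```

With this rule every text-bearing element opens its own translation unit, which is why the structure of the source markup directly determines segmentation in the CAT tool.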
Experiments: XLIFF > CAT
Translation units segmented at paragraph level

Experiments: XML+its1.0 > CAT
Support for ITS in CAT?
- SDL Trados Studio: global & embedded rules for features Translate, Elements Within Text
- Okapi Rainbow (for global rules): Translate, Elements Within Text, LocNote
- XTM (linked file)

Experiments: html overtagging > XLIFF
Original markup (segments 3 and 4):
<ul>
<li><span> <strong> </strong> </span></li>
<li><span> <strong> </strong> <em> </em> </span></li>
</ul>
Many reformatting actions (in the html editor) produce html overtagging:
<ul>
<li><span> <strong> </strong> </span></li>
<li><span style=""> </span><strong style=""> </strong><span style=""> </span> <em style=""> </em><span style=""> </span></li>
</ul>
The previous segment 4 becomes segments 4, 5, 6, 7, 8: one trans-unit for each <tag></tag> pair

Experiments: html overtagging > XLIFF > CAT
- Html overtagging by CMS html editors produces oversegmentation when converting to XLIFF (following the XSL's logical segmentation strategy)
- The CMS editors' clean-html function seldom helps!

Experiments: html overtagging > ITS > XLIFF > CAT
- Okapi Rainbow-generated XLIFF from XML+its 1.0
- XML+its 1.0 converted to SDLXLIFF by CAT tool

Translator/Localiser needs
- CMS: communication, structure, Agent/Doc/Kn interaction, global meaning & function, intratextual relations, purpose
- Exchange: PM, Translator/Localiser
- CAT/L: form, layout, expression; quality, consistency, adherence to conventions, leverage, format, language/knowledge building

Translator/Localiser needs
- Meaningful, (dynamically) coherent whole that needs to attract, keep & direct attention
  » Translation as just a matter of words, just a language problem?!
  » Localisation/Translation as adaptation, communication, cultural/professional mediation
- Articles/Items are coherently, cohesively integrated in:
  » General/Particular communicative/performative purpose
  » Sometimes bigger articles
  » Regions in the webpage, & relative positions
  » Hyperlink/Interaction relationships
  » Structure/sitemap relationships (internal and external menus, etc.)
  » Potentially indexed search results
  » Type of article/element/module categories
  » Usability/Accessibility needs/alternatives

Translator/Localiser needs
CMS <xliff/its> CAT/L tools
- Exported units must behave properly and efficiently in CAT/L tools
  » Segmentation
  » XHTML structure, function, meaning of tags
- Preview? Visual/functional contextualisation
  » Link to published webpage, highlighted translated elements
  » Zielinski & Beuster (memoQfest 2012): DB > html > CAT preview
- Control of new elements, updates, translation status, etc.
- Interchange (batch extraction, revision, etc.)
- Other
  » Possibility of placeable adaptation? E.g. specific/global localisable links (href attribute)

Translator/Localiser desiderata
CMS4L10n: a CMS that manages content taking translators/localisers/PMs into account
- Separating content from layout & function, but showing interrelationships
  » XLIFF with linked XSL/CSS? (in XLIFF 2.0 L10n kit/portfolio?)
  » Preview, link to published page?
- Classifying elements in a standard way, semantics?
  » Types of articles/pages
  » Types of modules
  » Relations between constituents
- Possibility of PM preprocessing for translation
  » CMS user profiles: localisation PM, localiser
  » E.g. specific/global localisable links (href attribute)
- Including various articles, entities, elements (e.g. flash, graphics, etc.) of a page in an XLIFF file/group element, marking which are for translation and which are translated or for context
- Generating html skeleton?

Future work
- In-depth analysis of export/import tools in different CMSs and other Joomla! multilingual managers
  » Josetta, new Joomfish version
- Extraction of contextual, preview information
  » Links to published page containing translatable articles
- Analysis of object types & relationships in web CMSs + accessibility needs