Localizing dynamic websites created from open source content management systems. memoQfest 2012, May 10, 2012, Budapest. Daniel Zielinski, Martin Beuster, Loctimize GmbH, [daniel martin]@loctimize.com, www.loctimize.com
Agenda: Open source content management systems; The localization challenges; General localization strategies; Conclusions
Open source content management systems
Challenges (workflow cycle): Create / update content; Identify content; Extract content; Prepare content; Translate content; Integrate translated content; Test localization; Fix bugs; Publish localized website
Identify content - Database: Most of the content is stored in databases. Databases are made up of related tables. The tables are made up of rows and columns. The fields contain the content in different formats (text, HTML, XML, proprietary format) and metadata used for identifying/filtering the relevant content, for example: published = 1, deleted = 1, translate = 0, language = 2.
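The metadata filtering described on this slide can be sketched as a query. This is a minimal illustration using Python's built-in sqlite3 module. The table name `content`, its columns, and the rule "published, not deleted, in the source language" are assumptions made for the example; a real CMS schema will use different names and flag values.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical content table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content ("
        "id INTEGER PRIMARY KEY, body TEXT, "
        "published INTEGER, deleted INTEGER, language INTEGER)"
    )
    conn.executemany(
        "INSERT INTO content VALUES (?, ?, ?, ?, ?)",
        [
            (1, "<p>Welcome</p>", 1, 0, 1),       # live page, source language
            (2, "<p>Old draft</p>", 0, 0, 1),     # unpublished
            (3, "<p>Removed page</p>", 1, 1, 1),  # flagged as deleted
            (4, "<p>Willkommen</p>", 1, 0, 2),    # already in another language
        ],
    )
    return conn


def find_translatable_rows(conn, source_language):
    """Return (id, body) for published, non-deleted rows in the source language."""
    query = (
        "SELECT id, body FROM content "
        "WHERE published = 1 AND deleted = 0 AND language = ? "
        "ORDER BY id"
    )
    return conn.execute(query, (source_language,)).fetchall()


if __name__ == "__main__":
    print(find_translatable_rows(build_demo_db(), source_language=1))
```

Only the first demo row passes all three filters, which is the point: without the metadata, drafts and deleted pages would end up in the translation package.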
Identify content - Database: Text, HTML?, content
Identify content - File system: Template files (HTML, CSS, JPG, PNG, GIF); Configuration files (INI, PHP, PROPERTIES, TXT, ...); Localization files (XLIFF, XML, PHP, ...); User files (PDF, DOC, XLS, PPT, ...)
Identify content - Template files (HTML): Translatable content?
Identify content - Configuration files - INI files: Some of the content is stored in INI files. It is stored in key-value pairs: Keys = Values.
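To make the key-value structure concrete, here is a small sketch that reads such a file with Python's standard configparser module. It assumes a section-less file of `KEY=Value` lines with `;` comments, which is why a dummy `[strings]` section header is prepended; the sample keys are invented for illustration.

```python
import configparser

SAMPLE_INI = """\
; language file (sample content, invented for this sketch)
COM_HELLO=Hello %s
COM_BYE=Goodbye
"""


def read_ini_strings(text):
    """Parse section-less KEY=Value lines into a dict, keeping key case."""
    # interpolation=None keeps placeholders such as %s untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # the default would lowercase the keys
    parser.read_string("[strings]\n" + text)
    return dict(parser["strings"])


if __name__ == "__main__":
    for key, value in read_ini_strings(SAMPLE_INI).items():
        print(key, "->", value)
```

Only the values are translatable; the keys must come back unchanged, or the CMS will no longer find the string.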
Identify content - Configuration files - PHP files: Some of the content is stored in PHP files. It is stored in key-value pairs or arrays.
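As a rough illustration of what identifying strings in a PHP array involves, the sketch below pulls `'key' => 'value'` pairs out of PHP source with a regular expression. It assumes single-quoted keys and values only; the sample file content is invented. Double-quoted strings, concatenation, and nested arrays would need a real PHP parser.

```python
import re

# Hypothetical PHP language file used only for this example.
SAMPLE_PHP = r"""<?php
return array(
    'greeting' => 'Hello',
    'farewell' => 'See you, it\'s late',
);
"""

# A single-quoted PHP string: any character except ' and \, or a backslash escape.
ENTRY = re.compile(r"'((?:[^'\\]|\\.)*)'\s*=>\s*'((?:[^'\\]|\\.)*)'")


def unescape_php_single_quoted(value):
    """Resolve the two escapes PHP honours in single quotes: \\' and \\\\."""
    return re.sub(r"\\(['\\])", r"\1", value)


def extract_php_strings(source):
    """Return a dict of key -> value for simple single-quoted array entries."""
    return {
        unescape_php_single_quoted(key): unescape_php_single_quoted(value)
        for key, value in ENTRY.findall(source)
    }


if __name__ == "__main__":
    print(extract_php_strings(SAMPLE_PHP))
```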
Identify content - Localization files - XML UI strings Language groups IDs
Extract content - Database: Manually by copying. Use available extensions that understand the I18N/L10N logic of the CMS to extract and export the content into a translatable exchange format. Develop scripts and exchange formats to extract and export the content into a translatable exchange format.
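The third option, a custom export script, might look roughly like this. The sketch writes rows into a simple XML exchange file with Python's xml.etree.ElementTree. The element and attribute names (`export`, `page`, `field`, `id`, `url`, `name`) are invented for the example and do not correspond to any particular CMS extension or to XLIFF.

```python
import xml.etree.ElementTree as ET


def export_to_xml(rows):
    """Serialize rows (dicts with id, url, title, body) to an XML string.

    The row ID and source URL travel with the content so that the
    translation can be written back to the right row later.
    """
    root = ET.Element("export")
    for row in rows:
        page = ET.SubElement(
            root, "page", {"id": str(row["id"]), "url": row["url"]}
        )
        for name in ("title", "body"):
            field = ET.SubElement(page, "field", {"name": name})
            field.text = row[name]  # HTML markup is escaped automatically
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    demo_rows = [
        {"id": 7, "url": "http://example.com/about",
         "title": "About us", "body": "<p>We localize.</p>"},
    ]
    print(export_to_xml(demo_rows))
```

Because the HTML in each field is stored as escaped text, a cascading HTML filter on the translation side is needed to turn it back into tags.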
Extract content - Database: Joomla!: Joom!Fish Plus, Jolomea (XML, XLIFF, PO); TYPO3: Localization Manager (XML); Drupal: i18n, Translation Management, ... (XML, XLIFF); WordPress: Easy Translator Pro (PO); WordPress: WPML (XLIFF)
Extract content - Database: Meta data, Source URL, Page content
Extract content - Database: IDs
Extract content - Files: Copy files. Know the file structure of the CMS. FTP access. Access to CMS backend with appropriate rights.
Automate workflow? Use a content connector and/or API to pass the localisable content on to memoQ.
Prepare files: Define non-translatable content; Add additional tags; Define filter settings (XML filter, HTML filter, RegEx text filter, cascading filters, RegEx tagger); Join files
Translate content: Lack of context; Translation of content deltas (updates); Translation without visual information (XML, INI); Placeholders like %1, $2, {1}, $VAR, \n, \t
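Placeholders are easy to lose or damage in translation, so a simple consistency check is worth having. The sketch below covers only the placeholder styles listed on this slide; the pattern and both function names are illustrative assumptions, not a feature of any tool mentioned here.

```python
import re
from collections import Counter

# Matches %1, $2, $VAR, {1} and the literal two-character sequences \n and \t.
PLACEHOLDER = re.compile(r"%\d+|\$\d+|\$[A-Z_]+|\{\d+\}|\\[nt]")


def find_placeholders(text):
    """Return all placeholders in the order they occur."""
    return PLACEHOLDER.findall(text)


def missing_placeholders(source, target):
    """Return placeholders that occur more often in source than in target."""
    difference = Counter(find_placeholders(source)) - Counter(find_placeholders(target))
    return sorted(difference.elements())


if __name__ == "__main__":
    source = r"Hello %1, you have {1} messages in $VAR\n"
    target = r"Hallo %1, Sie haben Nachrichten in $VAR"
    print(missing_placeholders(source, target))
```

In the demo strings the target has lost `{1}` and the trailing `\n`, and both are reported.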
Translate content - HTML: HTML files are added to memoQ using the standard filter. Tags and attributes cannot be configured (localized hyperlinks). A preview is available to translators and revisers.
Translate content - HTML: Editor, Lookup results, Preview
Translate content - XML files: Add the XML files to memoQ using a pre-defined XML filter (and a cascading HTML/RegEx text filter). Content is grouped by page. The source URL is stored in the comments field.
Translate content - XML: Editor, Lookup results, Source URL, Preview
Translate content - INI files: Add the INI files to memoQ using a Regex text filter and a cascading HTML filter. The Regex text filter defines paragraphs as ([^=]*=)(.+) with content group 2.
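To see what the pattern `([^=]*=)(.+)` does, it can be tried outside memoQ. The sketch below applies it line by line with Python's re module: group 1 (the key and the equals sign) is kept as is, and only group 2 is handed to a translation function. This only imitates the filter's effect; memoQ's own regex engine and filter behaviour may differ in details, and `translate_ini_line` is a name invented for the example.

```python
import re

# Same pattern as on the slide: group 1 = key plus "=", group 2 = content.
INI_LINE = re.compile(r"([^=]*=)(.+)")


def translate_ini_line(line, translate):
    """Apply translate() to the value part of a KEY=Value line.

    Lines that do not match (comments, blank lines) are returned unchanged.
    """
    match = INI_LINE.fullmatch(line)
    if match is None:
        return line
    return match.group(1) + translate(match.group(2))


if __name__ == "__main__":
    for line in ["; comment", "COM_TITLE=Welcome", "COM_LINK=<a href=\"x\">More</a>"]:
        print(translate_ini_line(line, str.upper))
```

Because `[^=]*` stops at the first equals sign, a value that itself contains `=` (such as an HTML attribute) stays entirely in group 2, which is why the cascading HTML filter is needed for such lines.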
Post-processing translated content - Convert to SQL: Using a script, the HTML files are converted to SQL files. The IDs extracted from the tags in the HTML are used to update the correct rows.
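A conversion script of this kind could be sketched as follows. The markup convention (`<div data-id="...">` wrapping each row's content, with no nested div elements), the table name `content`, and the column name `body` are all assumptions for illustration; the slides do not show the actual tag format. Quotes are escaped by doubling, which is standard SQL, but a production script should rely on the database driver's own escaping or on parameterized statements.

```python
import re

# Hypothetical wrapper written at export time: one div per database row.
SEGMENT = re.compile(r'<div data-id="(\d+)">(.*?)</div>', re.DOTALL)


def sql_quote(value):
    """Quote a string literal by doubling single quotes (standard SQL)."""
    return "'" + value.replace("'", "''") + "'"


def html_to_sql(html, table="content", column="body"):
    """Turn translated HTML segments into one UPDATE statement per row ID."""
    return [
        f"UPDATE {table} SET {column} = {sql_quote(body)} WHERE id = {int(row_id)};"
        for row_id, body in SEGMENT.findall(html)
    ]


if __name__ == "__main__":
    translated = (
        '<div data-id="7"><p>L\'été</p></div>\n'
        '<div data-id="8"><p>Bonjour</p></div>'
    )
    print("\n".join(html_to_sql(translated)))
```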
Integrate localised content: Manually by copying & pasting. Use available extensions that understand the I18N/L10N logic of the CMS to import the localized content. Develop scripts to import the localized content.
Importing localised content - Database: Preview links with login information; Overwrite mode
Importing localised content - Database: The translated SQL file is imported into the database. The table rows are updated with the translated content along with other settings: Original text, Modified date, Published flag, Hashed value.
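The additional settings matter because the CMS may use them to decide whether a translation is current. The sketch below updates all of them in one statement using sqlite3. The `translations` table and its columns are hypothetical, and it is an assumption of this example that the hashed value is an MD5 of the original text used to detect later source changes; the CMS in question may compute or use it differently.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def build_demo_db():
    """Create an in-memory database with a hypothetical translations table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE translations ("
        "id INTEGER PRIMARY KEY, value TEXT, original_text TEXT, "
        "original_hash TEXT, modified TEXT, published INTEGER)"
    )
    conn.execute("INSERT INTO translations VALUES (1, '', '', '', '', 0)")
    return conn


def import_translation(conn, row_id, original_text, translated_text):
    """Write a translation plus its bookkeeping fields; return rows affected."""
    cursor = conn.execute(
        "UPDATE translations SET value = ?, original_text = ?, "
        "original_hash = ?, modified = ?, published = 1 WHERE id = ?",
        (
            translated_text,
            original_text,
            hashlib.md5(original_text.encode("utf-8")).hexdigest(),
            datetime.now(timezone.utc).isoformat(timespec="seconds"),
            row_id,
        ),
    )
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    db = build_demo_db()
    print(import_translation(db, 1, "Welcome", "Willkommen"))
```

A return value of 0 means the ID did not match any row, which is worth logging rather than ignoring.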
Importing localised content - INI files: Translated INI files are exported from memoQ. These are stored in the appropriate folders on the web server.
Automate workflow? Watch the export folders and use a CMS API/script to import the localized files.
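Folder watching can be as simple as polling. The sketch below uses only the Python standard library; `import_file` stands in for whatever CMS API call or import script is available, and the `*.ini` pattern and polling interval are arbitrary choices for the example.

```python
import time
from pathlib import Path


def find_new_exports(folder, seen, pattern="*.ini"):
    """Return files matching pattern that have not been reported before.

    The set `seen` is updated in place with the names of the new files.
    """
    new_files = sorted(p for p in Path(folder).glob(pattern) if p.name not in seen)
    seen.update(p.name for p in new_files)
    return new_files


def watch(folder, import_file, poll_seconds=5.0, max_polls=None):
    """Poll folder and call import_file(path) once for each new file.

    Runs until max_polls polls have been made, or forever if max_polls is None.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_exports(folder, seen):
            import_file(path)
        polls += 1
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # Placeholder import step: print the file name instead of calling a CMS.
    watch(".", lambda path: print("would import", path), poll_seconds=1.0, max_polls=3)
```

One caveat of this approach is that a file may be picked up while it is still being written; exporting to a temporary name and renaming afterwards is a common way around that.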
Test localised content: Find the localised content in the website (frontend); Proof-read; Layout check
Fix localization bugs: Find where the content to be updated came from; Update content in CMS; Update bilingual files and/or translation memory; Modify stylesheets (CSS)
Publish localized page: Update!
Conclusion: Complex processes; Interaction of a lot of people; No standard procedures; Need to develop processes and tools; Risk of losing/missing data when trying to mimic CMS core functionality
Conclusion
Translation Service Provider: Time consuming; Scoping is a non-trivial first step; Expertise in CMS, web technology, databases; Develop tools; Educate client and web developers; Sponsor development!
Client: Choose CMS wisely; I18N & L10N strategy; Expect additional costs for localisation engineering and/or development; Time consuming!
Thank you very much for your attention!