Cat4Trad. An XML editor to support the legislative drafting process


Cat4Trad An XML editor to support the legislative drafting process 4 May 2012 3-4 May 2012 1

Aims of the Workshop
End users' case
- To explore the perception of direct and indirect benefits provided by such implementations
- To prove that introducing XML-based production chains into the lawmaking process can bring important benefits that justify the initial investment and positively impact on the quality of the process.

Introduction
With a special set of needs related to our commitment to multilingualism, our role in the European legislative process and the new e-parliament programme, the European Parliament Translation Service required a CAT tool to:
- Treat multilingual documents without having to refresh the Translation Memory at each change of the source language.
- Support over-the-shoulder translation, whereby there is no need to await the finalisation of the official relay language version, any other language version being made available even at a non-finalised stage.
- Support an almost WYSIWYG display whenever it may help identify the parts of text that need special attention, for instance amendments in two-column layout.
- Integrate in a single tool the reusable content from different sources, in our case Euramis (interinstitutional multilingual database), Transfils (standard text source), and Statistical Machine Translation.
- Replace existing source text translations by reference rather than matching.
- Follow an XML schema specially adapted for our needs (Akoma Ntoso).

Multilingualism - Complications
Consequence of multilingualism
- A translator cannot be expected to cover all source languages
- Of the languages he does know, a translator will know some better than others
Complicated workflow for translation
- Document source language may change during its life cycle
- Documents can have multiple source languages
- Document may be drafted in a compromise source language (requiring mother-tongue editing before translation)
- Waiting for the relay language version shortens the time available for translation

Document Flow - A Maze
[Diagram: document flow in the EP]
- Commission proposal
- Committee A (draft report); Committee B (draft opinion); Committee C (draft opinion)
- Amendments; Amendments; Amendments
- Final opinion; Final opinion; Final report
- Amendments to the Plenary
- Adopted text

Document Flow - With a Jungle
[Diagram: the same document flow in the EP, annotated with the source language(s) of each document]
- Nodes: Commission proposal; Committee A (draft report); Committee B (draft opinion); Committee C (draft opinion); Amendments (three sets); Final opinion (two); Final report; Amendments to the Plenary; Adopted text
- Source-language labels, in the order extracted: IT; 22 LL; EN; DE; BG, DE, EN, FI, IT, SK; DE, EN, FR, HU; DA, EN, EL, NL, PT; EN; FR; EN; BG, DE, EN, FR, IT, NL, PL, RO, SV
- Adopted text: Generated NOT Translated

XML - The benefits foreseen for DG TRAD
Automatic treatment of information
- Clear separation of content, metadata and presentation
  - Formatting would be taken care of by the application
  - Translators would only need to concentrate on the translation
- Rendering the same file in different ways could be done using style sheets
  - On-screen display
  - On-line publication
  - Printing
  - Editing
Better management of multilingualism
- Including the source language information in each element would make it easier to handle multilingual documents
Extraction of elements for use in many different applications
- Indexing content
  - For documentary research
  - To search for terminology
  - To identify elements for re-use or replacement
- Generating registry of documents, etc.
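The point about carrying the source language in each element can be illustrated with a minimal sketch. The element names and sample content below are hypothetical and are not taken from the Akoma Ntoso schema used at the European Parliament; only the standard `xml:lang` attribute is assumed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment: element names are illustrative only,
# not the real schema used by DG TRAD.
SAMPLE = """
<document>
  <amendment id="am1" xml:lang="de"><p>Erster Text</p></amendment>
  <amendment id="am2" xml:lang="en"><p>Second text</p></amendment>
  <amendment id="am3" xml:lang="de"><p>Dritter Text</p></amendment>
</document>
"""

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def group_by_source_language(xml_text):
    """Map each source language to the ids of the elements written in it."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for amendment in root.iter("amendment"):
        groups[amendment.get(XML_LANG)].append(amendment.get("id"))
    return dict(groups)


by_lang = group_by_source_language(SAMPLE)
print(by_lang)
```

Because the language travels with the element, a multilingual document can be handled element by element without first being split into monolingual files.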

XML - The disadvantages foreseen
- At that time, many CAT tools on the market could not cope with XML
- Adapting existing applications to work with an XML file rather than a Word document could be expensive and difficult
- Authoring or creating documents in XML would mean developing a dedicated XML editor for our purposes
- Resistance to change by users: "We've always done it that way!"

XML - Considerations
- How texts are broken down into logical elements
  - from the point of view of the logic of the legislative process
  - from the point of view of the specific requirements of automating translation to the maximum extent
- The relation between treatment of XML elements and of translation memory segments
- The IRO* logical document model, which consisted of three layers (Business, Language and Representation)
- The idea that XML should be applied from the initial drafting of content to the final publication (Start to Finish)
* IRO = College of Information Resource Officers

XML - Implementation
XML standard - Akoma Ntoso
- Architecture for Knowledge-Oriented Management of African Normative Texts using Open Standards and Ontologies
- Also means "linked hearts" in the language of the Akan people of West Africa
- Developed by Bologna University at the request of UN/DESA and extended by EP
XML schema covering the IRO logical document model
- One single schema for all legislative documents (reports and amendments)
- Work = Business version: original changes
- Expression = Language version: translation changes (major and/or minor changes)
- Manifestation = Representation version: PDF, Word, XML etc.
e-parliament programme - XML from start to finish
- Authoring tools for amendments and legislative documents
- Translation tool (Cat4Trad)
- Content Management System
- Possibly, a workflow system
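The three layers named on this slide (Work, Expression, Manifestation) can be pictured as nested records. The following is only a sketch: the class and field names are assumptions made for illustration and do not reproduce the schema itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Manifestation:
    """Representation version: one rendering of a language version."""
    fmt: str  # e.g. "PDF", "Word", "XML"


@dataclass
class Expression:
    """Language version: changes when the translation changes."""
    language: str
    version: int = 1
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """Business version: changes when the original content changes."""
    identifier: str  # hypothetical identifier, not a real document number
    version: int = 1
    expressions: Dict[str, Expression] = field(default_factory=dict)

    def add_expression(self, language: str) -> Expression:
        expression = Expression(language=language)
        self.expressions[language] = expression
        return expression


report = Work(identifier="report-example")
french = report.add_expression("fr")
french.manifestations.append(Manifestation("PDF"))
french.manifestations.append(Manifestation("XML"))
```

Keeping the layers separate means a new PDF rendering does not create a new language version, and a revised translation does not create a new business version.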

The e-parliament Program

Why Cat4Trad?
e-parliament translation tool: 19 mandatory requirements specified in November 2009, including:
- Handle XML
- WYSIWYG view
- Multilingual source text handling in a sequential manner
- Automatic mark-up of amended text
- Over-the-shoulder translation ("peeking")
- Replacement rather than matching wherever possible (metadata, not strings)
- Provide (fuzzy) matches for remainder of text (from Euramis)
- Improve fuzzy matches using MT techniques
- Where no matches found, send for MT
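The last four requirements describe a fallback order: replace by reference where possible, otherwise offer a fuzzy match, otherwise send the segment for MT. A minimal sketch of that order follows. It uses Python's standard `difflib` as a stand-in similarity measure; the function name, the 0.75 threshold and the in-memory data are assumptions for illustration, not the Euramis matching algorithm.

```python
from difflib import SequenceMatcher


def suggest(segment_id, source_text, replacements, memory, threshold=0.75):
    """Return (origin, suggestion) for one segment.

    replacements: dict of segment id -> existing translation (by reference)
    memory: list of (source, target) pairs standing in for a translation memory
    threshold: assumed cut-off below which a fuzzy match is not offered
    """
    # 1. Replacement by reference: decided by metadata, not by string comparison.
    if segment_id in replacements:
        return ("replacement", replacements[segment_id])

    # 2. Fuzzy match: keep the most similar source segment in the memory.
    best_score, best_target = 0.0, None
    for tm_source, tm_target in memory:
        score = SequenceMatcher(None, source_text, tm_source).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    if best_score >= threshold:
        return ("fuzzy", best_target)

    # 3. Nothing usable: flag the segment for machine translation.
    return ("machine_translation", None)


REPLACEMENTS = {"title-1": "Projet de rapport"}
MEMORY = [("The committee adopted the report.", "La commission a adopté le rapport.")]

print(suggest("title-1", "Draft report", REPLACEMENTS, MEMORY))
print(suggest("p-2", "The committee adopted the draft report.", REPLACEMENTS, MEMORY))
print(suggest("p-3", "Zzz qqq", REPLACEMENTS, MEMORY))
```

The first step never compares strings, which is the sense of "metadata, not strings" in the requirement.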

CAT Tool - Options considered
Current CAT tool
- TWB 7.5, then 8.3, did not handle XML
- TagEditor handled XML, but fulfilled hardly any of the other requirements
Off-the-shelf CAT tool
- Would not fulfil some specific requirements, in particular sequential handling of multilingual source texts
Develop internally
- Specifically designed to fulfil all our requirements

CAT4TRAD v other CAT Tools
CAT4TRAD
- A WYSIWYG editing environment for XML documents
- Automatic provision of standard text, and replacement (ICEM) of original text and non-amended segments in the amendment column
- Sequential treatment of multilingual source documents
- Automatic export of all TMX pairs for upload to Euramis according to uploading rules
- Collaborative and over-the-shoulder translation for all languages
- Automatic mark-up of amended text
TagEditor / Off-the-Shelf
- TagEditor: very basic tag-based editor. Off-the-shelf: translation in columns, separate WYSIWYG view sometimes possible
- Nearest fuzzy match from previous content in Euramis, can probably be restricted to base reference document
- Treatment of one language pair at a time
- Monolingual documents: auto-production of TMX files. Multilingual documents: post-alignment for all language units for uploading to Euramis according to uploading rules
- Shared translation memories with attendant problems: bilingual only, limited concurrent access (0 or 3-5 users) unless server-based option acquired
- Manual mark-up of amended text
Sequential treatment of multilingual documents and over-the-shoulder translation are the consequences of one single architectural feature: every time a paragraph is selected for assisted translation, the relevant, up-to-the-minute situation is retrieved from the database.
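The architectural feature described in the closing sentence can be sketched as a lookup performed at selection time rather than at document load. Everything below is a hypothetical stand-in: a dictionary plays the role of the database, and the function name and data are invented for illustration.

```python
# Stand-in for the database: paragraph id -> {language: latest stored text}.
# In this sketch a non-finalised translation is stored like any other version.
DATABASE = {"am7": {"de": "Originaltext"}}


def current_source(paragraph_id, preferred_languages, database=DATABASE):
    """Look up, at selection time, the best source version available right now.

    preferred_languages: the translator's source languages, best first.
    Returns (language, text), or None if no preferred language is available yet.
    """
    versions = database.get(paragraph_id, {})
    for language in preferred_languages:
        if language in versions:
            return language, versions[language]
    return None


# A translator who reads EN and FR selects a paragraph drafted in DE.
first = current_source("am7", ["en", "fr"])

# A colleague confirms an EN version of the same paragraph.
DATABASE["am7"]["en"] = "Original text (draft)"

# Selecting the paragraph again picks up the new situation without any reload.
second = current_source("am7", ["en", "fr"])
print(first, second)
```

Because nothing is cached per document, the same lookup serves both cases named on the slide: a source language that changes from paragraph to paragraph, and a relay version that appears while the translator is already working.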

Cat4Trad - Special Requirements (1)
Support an almost WYSIWYG display whenever it may help identify the parts of text that need special attention, for instance amendments in two-column layout
- What You See Is What You Get
- Translators can navigate through the document in WYSIWYG view and see their translations in place as soon as they are confirmed
Benefits
- Ease of use for translators

Example - Word WYSIWYG

Example - Tag Editor

Example - off-the-shelf CAT Tool
Source text
- This is the first segment of the text to be amended and it forms the first sentence of the first paragraph.
- This is the second segment of the text to be amended, it forms the second sentence of the first paragraph up to a semi-colon;
- This is the remainder of the second sentence of the first paragraph of the text to be amended.
- This is the third sentence of the text to be amended, you can imagine it as very long covering five or six lines in this editor, so it is repeated here: this is the third sentence of the text to be amended, you can imagine it as very long covering five or six lines in this editor and so on until the first paragraph of the text to be amended is completed.
- NOTE: Depending on the editor, the second and subsequent paragraphs might be dealt with in a similar way to the first before moving on to the amending text (the right-hand column text in the amendment)
- This is the first segment of the amending text and it forms the first sentence of the first paragraph.
- This is an additional bit of text added between the first and second sentences.
- This is the second segment of the amending text, it forms the second sentence of the first paragraph up to a semi-colon;
Target text

Example - off-the-shelf CAT Tool (continued)
Source text
- This is the remainder of the second sentence of the first paragraph of the amending text.
- This is the third sentence of the amending text, you can imagine it as very long covering five or six lines in this editor, so it is repeated here: this is the third sentence of the amending text, you can imagine it as very long covering five or six lines in this editor and so on until the first paragraph of the text to be amended is completed.
Target text

Example - Cat4Trad WYSIWYG

Cat4Trad - Special Requirements (2)
Treat multilingual documents sequentially without having to refresh the Translation Memory at each change of the source language
No more complicated
- splitting of files
- translating using different memories
- re-merging files and reading through for consistency
- post-aligning of multilingual documents from the pivot language to individual target languages (Cat4Trad will automatically export Pivot-All TMX files)
Benefits
- Significant time savings in the pre- and post-treatment of translations
- Time savings in translation
- The document can be translated sequentially

Cat4Trad - Special Requirements (3)
Automatic mark-up (bold and italics) of legislative amendments
- Translators no longer have to worry about formatting, they just need to concentrate on the content
- Mark-up is done using a module specified by the lawyer-linguists
- The mark-up is available as soon as the translation is confirmed
Benefits
- Translators concentrate on translating
- Mark-up is in accordance with lawyer-linguist rules rather than personal preference
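The slides do not give the rules of the lawyer-linguists' mark-up module, so the following is only a rough illustration of the general idea: compare the original and the amended text word by word and wrap what is new or changed. The plain word-level diff from `difflib`, the function name and the `[bi]` markers are all assumptions, not the actual module.

```python
from difflib import SequenceMatcher


def mark_up(original, amended, open_tag="[bi]", close_tag="[/bi]"):
    """Wrap words that are new or changed in the amended text.

    A plain word-level diff; the real rules (punctuation, deletions,
    whole-paragraph amendments, etc.) are not covered here.
    """
    old_words, new_words = original.split(), amended.split()
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    pieces = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        chunk = " ".join(new_words[j1:j2])
        if not chunk:
            continue  # pure deletion: nothing to show in the amended column
        if op == "equal":
            pieces.append(chunk)
        else:  # "insert" or "replace"
            pieces.append(f"{open_tag}{chunk}{close_tag}")
    return " ".join(pieces)


marked = mark_up("The report is adopted", "The draft report is adopted")
print(marked)
```

Since the marking is computed from the two texts, it can be regenerated each time a translation is confirmed, which matches the slide's point that translators no longer format amendments by hand.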

Cat4Trad - Automatic Mark-up

Cat4Trad - Special Requirements (4)
Support over-the-shoulder translation ("peeking"), whereby there is no need to await the finalisation of the official relay language version, any other language version being made available even at a non-finalised stage
- Translators can translate from the pivot (or other preferred) language as soon as an amendment is confirmed instead of waiting for the whole document to be translated and released
- Translators can be assigned documents containing languages they do not know, as they can switch to another source language when it is available [small amounts of unknown languages, not a whole document!]
Benefits
- Deadlines easier to respect, as translators do not have to wait for the official relay language version to be completed before commencing work
- Translators can easily consult other translations for assistance with knotty problems
- More efficient work allocation

Cat4Trad - Over-the-shoulder

Translate DE FR, EN Requested

Oops... EN has been updated

Cat4Trad - Special Requirements (5)
Integrate in a single tool the re-usable content from different sources
- Translators have all the potential re-usable content in one place, immediately available, but still have the option to search terminology and document databases
Benefits
- Ease of use: translators have a one-stop shop
- Time savings: no searching for links, opening windows, etc.

Integrated application environments

Re-usable content (1)
Sources of re-usable content:
- ITER titles
- Standard RdM text
- Euramis: interinstitutional multilingual database (huge TM)
- Statistical Machine Translation (MT), coming soon
Fuzzy matches
- Euramis: TMX files resulting from analysis
Concordance
- FullCat: indexed TMX files containing previous translations
- FullDoc: indexed EP documents
- Quest: various interinstitutional sources
Machine Translation (under development)
- For full text (automatically)
- For strings (on demand)

Re-usable Content
[Screenshot labels: Fuzzy matches; Amended original text; Concordances from Euramis via FullCAT]

Concordance - FullCat

Concordance - FullDoc

Cat4Trad - Special Requirements (6)
Replace existing source text translations by reference rather than matching
- ICEM = In-Context Exact Matching
- No more copying and pasting from the target language version of the original document or relying on fuzzy matching to provide the segments from the correct version of the document
Benefits
- Significant time savings in pre-treatment of the document
- Increased quality of translation: the correct version of the original text is provided by replacement rather than fuzzy matching
- Time savings for translator, as no checking of context of 100% match required
- Savings in costs for outsourced translations (100% matches paid either at 0% (ICEM) or 20% when translation in context needs to be verified)
Example
- Committee agendas: simple document, in production since 2 April 2012
- Average translation time reduced by 75%, i.e. from 1 hour to 15 minutes
- 20 language versions = average saving of 15 hours per agenda
- 40 agendas per month = average saving of 600 hours of translation per month
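The figures in the committee-agenda example follow from one another, as this short check shows. All inputs are taken from the slide; only the variable names are introduced here.

```python
# Inputs as stated on the slide.
minutes_before = 60        # average translation time per language version, before
minutes_after = 15         # average translation time per language version, after
language_versions = 20
agendas_per_month = 40

# Relative reduction in translation time: (60 - 15) / 60 = 0.75, i.e. 75%.
reduction = (minutes_before - minutes_after) / minutes_before

# Hours saved per language version: 45 minutes = 0.75 hours.
saved_per_version_h = (minutes_before - minutes_after) / 60

# 0.75 h x 20 language versions = 15 hours per agenda.
saved_per_agenda_h = saved_per_version_h * language_versions

# 15 h x 40 agendas = 600 hours per month.
saved_per_month_h = saved_per_agenda_h * agendas_per_month

print(reduction, saved_per_agenda_h, saved_per_month_h)
```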

Re-usable content (2)
Sources for replacement
- Original text: COM document, Council Common Position, PR INI
- ITER titles
- Standard text: normative memories and DocEP (current Word macro system)
- OJ: pre-translated text from the RdM and ITER
- Future: RdM standard text from the e-parliament service (DM-XML)
Sources for matching (see "Re-usable content (1)")
- Euramis: interinstitutional multilingual database containing millions of segments from European legislation and institutional documents
- Machine translation (SMT), coming soon

Cat4Trad - Replacement XML

Cat4Trad - Replacement Target

Example - Cat4Trad Replacement
- Heading pre-translated (standard text replaced)
- Left-hand column replaced, not matched
- Right-hand column translated with the CAT interface

Cat4Trad - OJ Translation screen
[Screenshot labels: Standard text (RdM); Fixed text (generated automatically); ITER title; Free text]

Important considerations
- Structure and rules are vital for the authoring tools
- A well-designed authoring tool makes all the difference for the downstream actors
- Know the rules before defining the authoring tool
Example: The tool used by the Committee secretariats to produce agendas provides almost everything necessary from a pick-list, including the dossiers being handled, the relevant agenda items, dates and times of meetings, etc. Free text is possible at any time but the authoring tool helps to reduce this.
Example: There are very clear rules laid down in our Recueil de modèles on how documents should be formatted and how, for instance, amendments can be tabled depending on the legislative procedure, the legal basis, etc. The AT4AM tool follows these rules so presentation, verification

Aims of the presentation
End users' case
- To explore the perception of direct and indirect benefits provided by such implementations
- To prove that introducing XML-based production chains into the lawmaking process can bring important benefits that justify the initial investment and positively impact on the quality of the process.

XML - The benefits for DG TRAD
Automatic treatment of information - Done
- Clear separation of content, metadata and presentation
  - Formatting is taken care of by the application
  - Translators only need to concentrate on the translation
- Rendering the same file in different ways using style sheets
Better management of multilingualism
- Source language (SL) included in each element: easier to handle the documents
In-house developed tool fulfils needs better than any off-the-shelf tool
- WYSIWYG view
- Multilingual source text handling
- Automatic mark-up of amended text
- Over-the-shoulder translation ("peeking")
- Replacement rather than matching wherever possible (metadata, not strings)
- Provide (fuzzy) matches for remainder of text (from Euramis)
- Improve fuzzy matches using MT techniques
- Where no matches found, send for MT

More benefits resulting from Akoma Ntoso
Logical components
- Logical components and their current versions are identified independently of their container.
- This independent versioning supports optimal content re-use both within new versions of the same document and within other documents.
- In particular, we are able to replace by reference the left-hand column of the amendments.
Virtualisation
- Virtualisation of the document allows us to download to the client selected parts of the document identified
  - by metadata
  - by language
  - by amendment number
  - in future, possibly by author and by amended article (if requested)
- This will be particularly useful when the Pure-XML repository is fully integrated with the workflow tools of DG TRAD (Gepro+ and T-Flow)
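Virtualisation, as described here, amounts to selecting components by their metadata instead of shipping the whole document. A minimal sketch under stated assumptions: the `Component` record, its fields and the in-memory list standing in for the repository are hypothetical, and only the two selection criteria named on the slide (language, amendment number) are modelled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Component:
    """One logical component, identified independently of its container."""
    component_id: str
    language: str
    amendment_number: int


def select_components(
    components: Iterable[Component],
    language: Optional[str] = None,
    amendment_numbers: Optional[Set[int]] = None,
) -> List[Component]:
    """Return only the parts of a document that match the given metadata."""
    selected = []
    for component in components:
        if language is not None and component.language != language:
            continue
        if amendment_numbers is not None and component.amendment_number not in amendment_numbers:
            continue
        selected.append(component)
    return selected


# Stand-in for the repository contents of one document.
REPOSITORY = [
    Component("am1-en", "en", 1),
    Component("am1-fr", "fr", 1),
    Component("am2-en", "en", 2),
    Component("am3-en", "en", 3),
]

subset = select_components(REPOSITORY, language="en", amendment_numbers={1, 3})
print([c.component_id for c in subset])
```

A criterion left as `None` is simply not applied, so further filters such as author or amended article could be added in the same way.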

XML - The disadvantages overcome
- At that time, most CAT tools on the market could not cope with XML
  Decision = develop internally
- Adapting existing applications to work with an XML file rather than a Word document may be expensive and difficult
  Decision = do not adapt, develop from scratch
- Authoring or creating documents in XML meant developing a dedicated XML editor for our purposes
  Decision = set up the e-parliament programme in order to develop those editors
- Resistance to change
  Cat4Trad has been introduced for simple documents and users love it!

Questions?