Control over Digital Technology Free and Open Source CAT Tools



Workshop. Peter Sandrini, University of Innsbruck, Austria. Control over Digital Technology: Free and Open Source CAT Tools. Timisoara, March 26, 2015 1/89

Abstract: In a digitalized and globalized world, translation technology is becoming an inevitable part of translation. It concerns not only translators but also users of translation, trainers of translators, and localizers. Translation technology can boost the efficiency and consistency of translation, but inconsiderate use of software and services may also cause translators to lose control over the translation process and translation data. The workshop outlines the concept of translation technology as well as free and open source software, and presents two compilations of free translation technology tools developed at the University of Innsbruck: USBTrans, a collection of preinstalled packages on a USB stick, and tuxtrans, a tailor-made Linux distribution for translators. 2/89

Contents 3/89
1) Control over digital technology: free software; free translation technology packages (USBTrans and tuxtrans)
2) Typical work tasks: translate a website; create a TM on the basis of existing translations; manage terminology and dictionaries; extract terminology from texts; use machine translation; convert file formats; manage bilingual files; manage pdf files; quality assurance; use text corpora

Dominant technology? It is hard to think of a business process that is not wholly, or partly, dependent on technology. What about language services and translation? Technological turn in translation (Cronin 2010); costs and risks of technology; translation technology. 4/89

Translation technology: any kind of digital information and communication technology (ICT) that supports or performs the translation process with the aim of meeting adequate efficiency and quality requirements. 5/89

Control: choice vs. (being) use(d) (independence); confidentiality; integrity of program code; integrity of data; availability of data; overall management of the IT configuration, installation of software. 6/89

Control: examples 7/89
Choose your software independently: do not let translation agencies dictate your choice.
Do not let financial factors limit your choice.
Do not let licenses limit your freedom: I want to install my software on a desktop and on a notebook computer as well as on my network; I want hassle-free updates.
Be social and share your software with friends: the easiest way to guarantee cooperation and data exchange with colleagues.
Be confident about your data: I have a confidentiality agreement with my clients; my translation data are economic assets.

Win back and maintain control 8/89

Free and Open-Source 9/89
Freedom to:
run the program as you wish, for any purpose (freedom 0).
study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
redistribute copies so you can help your neighbor (freedom 2).
distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
Licenses: GNU GPL; Apache License 2.0; BSD 2/3; (L)GPL; MIT license; Mozilla Public License 2.0; Eclipse Public License; Creative Commons

10/89

Why use free software? It allows a cost-saving start to your career; facilitates cooperation with colleagues; avoids copyright infringements; gives you full control over your own PC; is easy to use without a license or activation code; lets you participate in developing applications through online communities; turns (dependent) consumers into (autonomous) agents. 11/89

Structural differences 12/89
proprietary software: Translation Environment Tools (TEnT), an all-in-one application for translators covering translation memory, terminology management, alignment, search for collocations, analysis and statistics, project management, code protection, batch scripts, spell checking, code page conversion, format conversion...
free software: individual projects with specific functionality, such as translation memory, analysis and statistics, code protection, terminology management, project management, code page conversion, format conversion, spell checking...

TEnTs: Translation Environment Tools (TEnT); an all-encompassing application for translators; generic term for a translation memory system or computer-aided translation (CAT) system. Functions: translation memory, terminology management, alignment, search for collocations, analysis and statistics, project management, code protection, batch scripts, spell checking, code page conversion, format conversion... 13/89

Market reality 14/89
open: Pootle, OpenTMS, FOLT, Heartsome, Gaupol, BiText2TMX, TinyTM, Lokalize, Okapi, KBabel, Sun OLT, Virtaal, OmegaT, Anaphraseus, Jubler, ForeignDesk, Transolution, PO-Edit, Apertium, Translate Toolkit, Rainbow
proprietary: SDL/Trados, DéjàVu, MemoQ, Star Transit, Wordfast, Across, Catalyst

FOSTT compiled 1. USBTrans for Windows http://homepage.uibk.ac.at/~c61302/fsftrans.html 2. tuxtrans Open Translation Desktop System http://www.tuxtrans.org 15/89

USBTrans: a compilation of translation-related free and open source software that can be started from a USB stick without installation (Portable Apps). Download from http://homepage.uibk.ac.at/~c61302/fsftrans.html (some samples are on your USB stick). Unpack the zip archive on your local PC and you are ready to go; you may also copy the unpacked files onto a USB stick (4 GB) and start the programs from there: just plug in the USB stick and start the USBTrans menu. 16/89

tuxtrans: a complete desktop system for translators, based on Linux as a free operating system, with many specific applications for translators; multilingual (Italian, English, Spanish and German by default, with many more languages available online); all open source or free software; website: http://www.tuxtrans.org; Twitter: https://twitter.com/tuxtrans; a live system that can be started from your USB stick without installation. 17/89

Requirements: operating system, standard applications, specific applications = all free and open source 18/89

Requirements: operating system, standard applications, specific applications = all free and open source. Why? To be able to perform all tasks related to translation; to be able to use and install it ad libitum; to be able to distribute it to translators and students. 19/89

tuxtrans may be used as 1) Live-DVD (without installation, slow performance) 2) Live-USB (without installation, rather slow performance) 3) virtual machine (VirtualBox, VMware) 4) second OS (with installation, fast performance, easy to customize and adapt) 5) main operating system 20/89

Prepare to migrate: use cross-platform applications and standard formats (odt, pdf, tmx, tbx, xliff...). 21/89

Standard apps 22/89
                      MS-Windows     Linux
word processors       LO-Writer      LO-Writer
spreadsheets          LO-Calc        LO-Calc
presentations         LO-Impress     LO-Impress
databases             Access         MySQL
DTP                   Xpress         Scribus
image manipulation    Gimp           Gimp
office suite          LibreOffice    LibreOffice
browser               Firefox        Firefox
e-mail                Thunderbird    Thunderbird

Translate with FOSS: common tasks of a freelance translator, with free and open source translation technology, on a free digital infrastructure, with tuxtrans, the Linux desktop for translators. 23/89

Translate a Word document with TM support. What you need: a word processor; a TM system; translation memories; terminology lists. 24/89

sample text 25/89

OmegaT: supported formats 26/89
text formats: plain text (any encoding supported by Java, including Unicode); StarOffice, OpenOffice.org, LibreOffice and OpenDocument; Open XML (Microsoft 2007/2010/2013); (X)HTML (including complete website tree structure); Help & Manual; HTML Help Compiler; LaTeX; DokuWiki; CopyFlow Gold for QuarkXPress; DocBook; Typo3 LocManager; Iceni Infix (PDF); XLIFF (source = target); Wordfast TXML (source = target)
localization formats: Android resources; Java .properties; key-value files; Mozilla DTD; Windows resources (RC); WiX localisation; ResX; Flash XML export; Camtasia for Windows; Magento CE localisation; PO (Portable Object) files (reading existing translations); SubRip subtitles (SRT); SVG images

start OmegaT 27/89

OmegaT: organisation (input) 28/89
terminology/glossaries: conference.txt, congress.tbx...
project-specific configuration files: filters.conf, segmentation.conf
source text: 9th PCTS Conference - Reg. Form EN-1.docx
translation memories: regform.tmx...

OmegaT: statistics 29/89

OmegaT: editor 30/89

OmegaT: organisation (output) 31/89
new translation memories: test01.tmx...
new terminology: glossary.txt
spell checking files: learned_words.txt, ignored_words.txt
project-specific configuration files: filters.conf, segmentation.conf
target text: 9th PCTS Conference - Reg. Form EN-1.docx

Part 1 32/89

Overview: part II, typical tasks of a translator: translate a website; create a TM on the basis of existing translations; manage terminology and dictionaries; extract terminology from texts; use machine translation; convert file formats; manage bilingual files; manage pdf files; quality assurance; use text corpora 33/89

Translate websites. What you need: a website with a few HTML documents; a local copy of the website; a translation memory system; existing translation memories; existing terminology lists; an HTML editor. 34/89

Save the website locally: HTTrack website copier 35/89

OmegaT: tuxtrans.org 36/89

OmegaT: tuxtrans.org. Source text = complete website directory structure with files; target text = identical website directory structure with files. 37/89

Edit HTML: Geany 38/89

Edit HTML: Bluefish 39/89

Align source and target text: how can I create a translation memory from an existing translation? What you need: source text and target text; an alignment tool; time and patience. 41/89

Alignment: BiText2TMX: no longer under development; text format only; TM as TMX 42/89

Alignment: LF-Aligner. Formats: txt (UTF-8!), rtf, doc, docx, odt, pdf, html. Sample of the TMX output (truncated): <tu creationdate="20140916T141042Z" creationid="LF Aligner 3.11"><prop 43/89
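To make the result of an alignment step more concrete, here is a minimal sketch of how already aligned segment pairs could be written as a TMX file using only the Python standard library. It is not how LF-Aligner or BiText2TMX work internally; the function name `build_tmx`, the tool name "example-script" and the sample segments are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xml:lang attribute used on <tuv> elements.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="de"):
    """Return a TMX 1.4 document (as a string) for aligned (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX; the values here are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")       # one translation unit per pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    aligned = [("Registration form", "Anmeldeformular"),
               ("Please fill in all fields.", "Bitte füllen Sie alle Felder aus.")]
    print(build_tmx(aligned))
```

The hard part of alignment is deciding which source sentence belongs to which target sentence; the sketch assumes that work has been done and only shows the file format that tools such as OmegaT then read.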

TMX management: Heartsome TMX Editor: edit, filter; convert from TMX to..., to TMX from...; QA... 44/89

TMX management: Virtaal: edit TMX; QA 45/89

TMX management: Okapi Olifant: edit; QA; alpha version 46/89

Manage your terminology: ForeignDesk Termbase: concept-oriented; import/export of html, csv, Multiterm; OmegaT integration with csv export + edit 48/89

Search for terminology: GoldenDict: several dictionary formats; same formats as dictionaries in OmegaT 49/89

Integrate dictionaries in OmegaT: dictionaries vs. glossaries: dictionaries allow no interaction, just reading of entries, with no editing or adding; any dictionary in StarDict format available on the web 50/89

dictionaries in OmegaT monolingual StarDict dictionary integrated in OmegaT: Oxford Advanced Learner dictionary 51/89

dictionaries in OmegaT bilingual StarDict dictionaries in OmegaT: en-de 52/89

Glossaries in OmegaT: create, edit and use glossary lists in OmegaT. Glossaries use a simple 3-column format: source term (Tab) target term (Tab) note; always UTF-8 encoded; file names: *.txt, *.tab, *.utf8, *.tbx. Terminology recognition and autocompleter in OmegaT: https://www.youtube.com/watch?v=lszi ap22qhq 53/89
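The 3-column glossary layout described on this slide (source term, tab, target term, tab, note) is simple enough to produce with a few lines of Python. The following is a minimal sketch; the function names `format_glossary` and `parse_glossary` and the sample entries are assumptions for illustration, not part of OmegaT.

```python
def format_glossary(entries):
    """Turn (source, target, note) tuples into tab-separated glossary text."""
    lines = []
    for entry in entries:
        # A tab or line break inside a field would break the 3-column layout,
        # so replace them with spaces before joining.
        fields = [f.replace("\t", " ").replace("\n", " ").strip() for f in entry]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


def parse_glossary(text):
    """Read tab-separated glossary text back into (source, target, note) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip empty lines
        parts = line.split("\t")
        parts += [""] * (3 - len(parts))  # the note column may be missing
        entries.append(tuple(parts[:3]))
    return entries


if __name__ == "__main__":
    glossary = format_glossary([
        ("translation memory", "Übersetzungsspeicher", "abbr. TM"),
        ("glossary", "Glossar", ""),
    ])
    # Write with an explicit UTF-8 encoding, as the slide requires.
    with open("glossary.txt", "w", encoding="utf-8") as handle:
        handle.write(glossary)
```

Saving the result as `glossary.txt` with UTF-8 encoding matches the file naming and encoding given above; where the file has to be placed inside a project is not covered by this sketch.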

Glossaries in OmegaT: create, edit and use glossary lists in OmegaT. Add terminology: [CTRL Shift G] or Edit / Create glossary entry; saved in Project / Properties: 54/89

Term extraction: simple monolingual term extraction with Rainbow 56/89

Term extraction: simple monolingual term extraction with Phrase extractor 57/89
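The idea behind simple monolingual term extraction can be sketched as counting word sequences that recur in a text. This is an illustration of the general frequency-based technique, not the algorithm used by Rainbow or Phrase extractor; the stop word list, the function name `candidate_terms` and the thresholds are assumptions.

```python
import re
from collections import Counter

# A tiny English stop word list; a real list would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "on",
             "for", "with", "is", "are", "as", "by", "from"}


def candidate_terms(text, max_len=3, min_freq=2):
    """Return (term, frequency) pairs for word sequences of 1..max_len words."""
    # Words made of letters, optionally joined by hyphens (e.g. "open-source").
    tokens = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Candidates should not start or end with a stop word.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


if __name__ == "__main__":
    sample = ("A translation memory stores segments. "
              "The translation memory is free. "
              "Import the translation memory into the project.")
    for term, freq in candidate_terms(sample):
        print(freq, term)
```

The output is only a list of candidates: sequences are counted across sentence boundaries and no linguistic filtering is applied, so a translator still has to review the list before adding entries to a glossary.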

Machine translation. What you need: an open source MT system; a free online MT system; an interface to your TM system. 59/89

Apertium: open source MT system. Apertium is a rule-based open source MT system; mostly regional language pairs from Spain, with English. 60/89

Apertium: OS MT system Apertium offline can be integrated into OmegaT 61/89

Online MT systems: you need to register to use online MT within OmegaT: MyMemory (e-mail only); Microsoft Translator API (ID + secret key); Google Translate (API key). Configuration file: OmegaT.l4J.ini 62/89

Online MT systems 63/89
available for free on the web: Microsoft Translator, Google Translate
within OmegaT:
Apertium: online + offline, free
MyMemory: free
Microsoft Translator: free, but you need to register
Google Translate: registration + costs

Online MT systems: interface in OmegaT 64/89

Convert file formats: text formats, from doc to docx or odt, with LibreOffice 66/89

Convert file formats 67/89
CSV, TAB, TXT glossary lists to TBX with Heartsome Translation Studio
CSV, TXT parallel texts to TMX TMs with Heartsome
Okapi Rainbow to PO, TMX, CSV
SDL/Trados TM and TB to TMX with TradosStudio resources converter
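Conversions between TMX and CSV are mostly a matter of reading one structure and writing the other. As a rough sketch of what such a converter does (not the implementation used by Heartsome or Okapi Rainbow), the following reads a TMX document and writes the segments of two languages as CSV. The function name `tmx_to_csv` and the sample data are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_csv(tmx_text, src_lang, tgt_lang):
    """Return CSV text with one row per translation unit that has both languages."""
    root = ET.fromstring(tmx_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([src_lang, tgt_lang])          # header row
    for tu in root.iterfind("./body/tu"):
        segments = {}
        for tuv in tu.iterfind("tuv"):
            # Older TMX files use a plain "lang" attribute instead of xml:lang.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also collects text nested inside inline tags.
                segments[lang.lower()] = "".join(seg.itertext())
        # Language codes are compared exactly (case-insensitively), so "en"
        # does not match "en-US" in this sketch.
        if src_lang.lower() in segments and tgt_lang.lower() in segments:
            writer.writerow([segments[src_lang.lower()], segments[tgt_lang.lower()]])
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        '<tmx version="1.4"><header creationtool="example" creationtoolversion="1" '
        'segtype="sentence" o-tmf="plain" adminlang="en" srclang="en" datatype="plaintext"/>'
        '<body><tu><tuv xml:lang="en"><seg>Hello</seg></tuv>'
        '<tuv xml:lang="de"><seg>Hallo</seg></tuv></tu></body></tmx>'
    )
    print(tmx_to_csv(sample, "en", "de"))
```

The `csv` module takes care of quoting segments that contain commas or quotation marks, which is the usual source of errors when parallel texts are exchanged as CSV.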

Bilingual file types: text files which include source text segments aligned to target text segments, i.e. files which contain translations for all segments or just for some of them. XLIFF, sdlxliff and ttx files cannot be translated with OmegaT directly, or rather only when source text segment = target text segment; existing translations can thus not be leveraged. Example: a partly translated file as shown in Virtaal. 69/89
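To see how much of a bilingual file is already translated, the source and target segments can be read directly from the XML. The following is a minimal sketch for XLIFF 1.2 files only (sdlxliff and ttx use additional or different structures); the function name `untranslated_units` and the sample file content are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def untranslated_units(xliff_text):
    """Return (id, source text) for every trans-unit without a non-empty target."""
    root = ET.fromstring(xliff_text)
    missing = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        # itertext() flattens inline markup inside <source> and <target>.
        source_text = "".join(source.itertext()) if source is not None else ""
        target_text = "".join(target.itertext()).strip() if target is not None else ""
        if not target_text:
            missing.append((unit.get("id"), source_text))
    return missing


if __name__ == "__main__":
    sample = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="index.html" source-language="en" target-language="de" datatype="html">
    <body>
      <trans-unit id="1"><source>Welcome</source><target>Willkommen</target></trans-unit>
      <trans-unit id="2"><source>Download</source><target></target></trans-unit>
    </body>
  </file>
</xliff>"""
    for unit_id, text in untranslated_units(sample):
        print(unit_id, text)
```

Such a check only reports the state of the file; it does not change how OmegaT handles existing targets.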

Bilingual file types: existing translations can be re-used in OmegaT by using Okapi Rainbow and by creating ready-to-go OmegaT translation projects. Workflow for XLIFF, sdlxliff and ttx files: extract source text; create TM from existing translations; translate project in OmegaT; post-process translation in Rainbow; get translated bilingual files. 70/89

Okapi Rainbow translation kit creation 71/89

Okapi Rainbow translation kit post processing 72/89

PDF files 74/89
translate PDF files with OmegaT (only text-based PDFs; the target text will be a txt file)
extract text from PDF files with the gpdftext ebook editor (Linux)
split and merge PDF files with PDF-Sam
annotate PDF files with Xournal (Linux)
compare PDF files with DiffPDF

PDF and OmegaT: translate text-based PDFs directly in OmegaT 75/89

Extract text from PDF: the gpdftext ebook editor extracts text from text-based PDFs without line breaks 76/89

Split and merge PDFs: PDF-Sam 77/89

Annotate PDF files: comment, underline, highlight, etc. with Xournal 78/89

Quality assurance 1) OmegaT QA scripts 2) Okapi Checkmate 3) TMX-Validator 4) XLIFF Checker 80/89

QA with OmegaT OmegaT QA scripts 81/89

QA with Checkmate 82/89

Formal validation: TMX-Validator, XLIFF-Checker 83/89

Create a reference corpus. Why? To search for terms, collocations, idioms... What you need: a representative quantity of LSP texts in the subject domain you want to translate; a free concordance tool. 85/89

TextSTAT: file formats: doc, odt, html, txt; concordance search; frequency lists 86/89

AntConc: fewer file formats, more linguistic analysis 87/89
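The two core functions named here, frequency lists and concordance search, can be illustrated in a few lines of Python. This is a sketch of the general technique (a keyword-in-context display), not of how TextSTAT works internally; the function names `frequency_list` and `kwic` and the sample text are assumptions.

```python
import re
from collections import Counter


def frequency_list(text, top=10):
    """Return the `top` most frequent words as (word, count) pairs."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common(top)


def kwic(text, keyword, width=30):
    """Return keyword-in-context lines with `width` characters on each side."""
    flat = " ".join(text.split())          # collapse line breaks and extra spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so that all keywords line up in one column.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines


if __name__ == "__main__":
    corpus = ("The translation memory stores segments. "
              "A memory can be shared with colleagues. "
              "Memory size depends on the project.")
    print(frequency_list(corpus, top=3))
    for line in kwic(corpus, "memory", width=20):
        print(line)
```

In practice the corpus would be read from the collected LSP texts; aligning the keyword in one column is what makes recurring collocations to its left and right easy to spot.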

Overview: part II, typical tasks of a translator: translate a website; create a TM on the basis of existing translations; manage terminology and dictionaries; extract terminology from texts; use machine translation; convert file formats; manage bilingual files; manage pdf files; quality assurance; use text corpora. +... translation of subtitles; editing and translating PO files; creating mind maps for terminology; creating a web corpus; localizing software; evaluating MT output... 88/89

What next? Simply try it out! Thank you for your attention! http://www.petersandrini.net http://uibk.academia.edu/petersandrini 89/89