Multilingual Term Extraction as a Service from Acrolinx. Ben Gottesman Michael Klemme Acrolinx CHAT2013
2 Definitions
term extraction: automatically identifying potential terms in a document (corpus)
multilingual term extraction: automatically identifying potential terms and their translations in a document and its translation (parallel corpus / translation memory)
The wizard begins creating the bootable image.
Der Assistent beginnt mit der Erstellung des bootfähigen Image.
(or, if the source-language terminology already exists, just identify translations)
3 Synonyms
Identify same-language synonyms via translations in common
German:
Die Spannungsversorgung für die Elektronik wird vom Speisegerät G526 sichergestellt.
Spannungsversorgung für interne Speisung (X3e)
Unterspannung in der Stromversorgung
English:
The voltage supply for the electronics is maintained by the power supply unit G526.
Power supply for internal supply (X3e)
Undervoltage in the power supply
Spannungsversorgung, Stromversorgung
voltage supply, power supply
4 Outline
What is multilingual term extraction?
What is the workflow from customer perspective?
  customer use case examples
  show extraction results, demonstrate human validation
How does the extraction work?
  how we identify candidates
    source-language candidates
    translation candidates
  how we filter translation candidates
  how we identify source-language synonyms
What is Acrolinx and how does MTE fit in?
6 Workflow: Customer perspective
1. Customer provides translated documents
2. Acrolinx provides extracted multilingual term candidates to customer
3. Customer validates candidates
4. Validated results become (or are added to) customer's term bank
7 Customer use cases, past examples
Use case 1
  de-<en,fr,es,it,pt> (mostly de-en)
  ~142,000 bilingual segments; ~2,685,000 tokens (total)
Use case 2
  de-<en,fr> (all data trilingual)
  ~132,000 bilingual segments; ~1,259,000 tokens
  data document-aligned, not segment-aligned, so extra step required
Use case 3
  en-de
  ~942,000 bilingual segments; ~25,000,000 tokens
  extract translations of a given list of keywords
  determine which keywords don't occur in data
8 Results: human validation in Excel
Baugruppe has been translated inconsistently into English in the past.
Mark respective translations as preferred/deprecated to guide translators in the future.
9 Results
Stromversorgung and Einspeisung have translations in common.
They are automatically identified as possible synonyms, so they share the same Cluster ID.
To validate the synonym link, edit the Subcluster IDs to be the same.
Mark respective variants as preferred/deprecated to guide authors.
11 How does the extraction work?
Extract source-language term candidates from source-language text (unless source-language terminology exists)
The wizard begins creating the bootable image.
linguistics-based, especially part-of-speech patterns
same functionality built into the core Acrolinx product
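The part-of-speech pattern idea on this slide can be illustrated with a minimal sketch. It is not the Acrolinx implementation: the tag set, the hand-assigned tags, and the pattern "any run of adjectives and nouns that ends in a noun" are all assumptions chosen for illustration.

```python
# Hand-assigned tags for the slide's example sentence (an assumption:
# a real system would obtain these from a part-of-speech tagger).
TAGGED = [
    ("The", "DET"), ("wizard", "NOUN"), ("begins", "VERB"),
    ("creating", "VERB"), ("the", "DET"), ("bootable", "ADJ"),
    ("image", "NOUN"), (".", "PUNCT"),
]


def extract_candidates(tagged):
    """Return maximal runs of ADJ/NOUN tokens, trimmed to end in a NOUN."""
    candidates = []
    run = []
    # The sentinel entry flushes a run that reaches the end of the sentence.
    for word, tag in tagged + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((word, tag))
            continue
        # The run has ended: drop trailing adjectives so it ends in a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            candidates.append(" ".join(w for w, _ in run))
        run = []
    return candidates


print(extract_candidates(TAGGED))  # ['wizard', 'bootable image']
```

A single pattern like this over-generates on real text, which is one reason the workflow keeps a human validation step.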
12 How does the extraction work?
Extract translation candidates of each source-language term candidate from target-language text
The wizard begins creating the bootable image.
Der Assistent beginnt mit der Erstellung des bootfähigen Image.
use statistical phrase-alignment technology, the same as used in statistical machine translation
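To give a feel for how translation probabilities can be learned from aligned text, here is a deliberately simplified sketch. It uses word-level IBM Model 1 trained with expectation-maximisation, which is a stand-in for the phrase-alignment technology named on the slide, not a description of it. The three-sentence corpus is invented for illustration.

```python
from collections import defaultdict

# Invented toy parallel corpus (source tokens, target tokens).
PAIRS = [
    (["the", "wizard"], ["der", "Assistent"]),
    (["the", "step"], ["der", "Schritt"]),
    (["a", "step"], ["ein", "Schritt"]),
]


def train_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(target word f | source word e) with EM."""
    src_vocab = {e for src, _ in pairs for e in src}
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    # Start from a uniform distribution over target words.
    t = {(f, e): 1.0 / len(tgt_vocab) for e in src_vocab for f in tgt_vocab}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: spread each target word over the source words in its pair.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts per source word.
        t = {(f, e): count[(f, e)] / total[e] for (f, e) in t}
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    targets = [f for (f, e) in t if e == source_word]
    return max(targets, key=lambda f: t[(f, source_word)])


model = train_model1(PAIRS)
for word in ["the", "wizard", "step", "a"]:
    print(word, "->", best_translation(model, word))
```

Because "der" also co-occurs with "the" in the second pair, the model gradually assigns "der" to "the" and leaves "Assistent" as the most probable translation of "wizard". Probabilities of this kind are what a confidence score for a translation candidate could be computed from.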
13 How does the extraction work?
Filter translation candidates
[Figure: translation candidates for Eingangsspannung (pink = filtered out)]
Filtering is based on:
  confidence score calculated from translation probabilities (can adjust threshold to favour precision or recall)
  surface characteristics (closed-class words, punctuation)
  term-candidacy of translation (if possible for language)
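The first two filter criteria can be sketched as follows. The candidate strings and scores are invented (the figure's actual values are not in the transcription), and the closed-class word list, the punctuation set, and the rule "reject candidates that start or end with a closed-class word" are assumptions rather than the documented Acrolinx behaviour.

```python
# Hypothetical translation candidates for "Eingangsspannung" with
# made-up confidence scores.
CANDIDATES = [
    ("input voltage", 0.62),
    ("the input voltage", 0.21),
    ("voltage", 0.08),
    ("input voltage ,", 0.15),
    ("supply voltage", 0.04),
]

# Small illustrative lists; a real system would use fuller resources.
CLOSED_CLASS = {"the", "a", "an", "of", "for", "in", "and", "is", "to"}
PUNCTUATION = set(",.;:!?()")


def keep_candidate(text, score, threshold):
    """Apply the confidence threshold and the surface checks."""
    tokens = text.lower().split()
    if not tokens or score < threshold:
        return False
    if any(ch in PUNCTUATION for ch in text):
        return False
    # Reject candidates that begin or end with a closed-class word.
    if tokens[0] in CLOSED_CLASS or tokens[-1] in CLOSED_CLASS:
        return False
    return True


def filter_candidates(candidates, threshold):
    return [text for text, score in candidates
            if keep_candidate(text, score, threshold)]


print(filter_candidates(CANDIDATES, threshold=0.10))  # ['input voltage']
print(filter_candidates(CANDIDATES, threshold=0.05))  # ['input voltage', 'voltage']
```

Lowering the threshold lets more candidates through (better recall, more validation work); raising it keeps only the strongest ones (better precision).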
14 How does the extraction work?
Identify synonyms (cluster candidates)
[Figure: cluster around Stromwandler (minimum link confidence threshold = 0.01)]
link confidence based on the degree to which translations are shared
can adjust threshold to favour precision or recall of links
15 How does the extraction work?
Identify synonyms (cluster candidates)
[Figure: cluster around Stromwandler (minimum link confidence threshold = 0.03)]
link confidence based on the degree to which translations are shared
can adjust threshold to favour precision or recall of links
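A minimal sketch of threshold-based synonym clustering follows. The slides only say that link confidence reflects the degree to which translations are shared; using Jaccard overlap of the translation sets as that measure, and connected components as the clusters, are assumptions. The translation sets for Spannungsversorgung and Stromversorgung follow the earlier Synonyms slide; those for Einspeisung and Baugruppe are invented.

```python
from itertools import combinations

# Source-language term -> set of extracted translations (partly invented).
TRANSLATIONS = {
    "Spannungsversorgung": {"voltage supply", "power supply"},
    "Stromversorgung": {"power supply"},
    "Einspeisung": {"power supply", "infeed"},
    "Baugruppe": {"assembly", "module"},
}


def link_confidence(a, b):
    """Jaccard overlap of two translation sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(translations, min_confidence):
    """Group terms connected by links at or above min_confidence."""
    parent = {term: term for term in translations}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(translations, 2):
        if link_confidence(translations[a], translations[b]) >= min_confidence:
            parent[find(a)] = find(b)

    groups = {}
    for term in translations:
        groups.setdefault(find(term), set()).add(term)
    return sorted(sorted(group) for group in groups.values())


print(cluster(TRANSLATIONS, min_confidence=0.4))
# [['Baugruppe'], ['Einspeisung', 'Spannungsversorgung', 'Stromversorgung']]
print(cluster(TRANSLATIONS, min_confidence=0.6))
# [['Baugruppe'], ['Einspeisung'], ['Spannungsversorgung'], ['Stromversorgung']]
```

As on the two slides above, raising the minimum link confidence removes weaker links and shrinks clusters, trading recall of synonym links for precision.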
17 What is Acrolinx?
Acrolinx is Content Optimization Software. It helps authors make their text more correct, more consistent, and more readable.
18 What is Acrolinx?
Acrolinx is Content Optimization Software. It helps authors make their text more correct, more consistent, and more readable.
Consistent use of terminology is an important factor in the readability of text.
Acrolinx provides:
  term extraction (monolingual, aka term harvesting)
  terminology management
  term checking
Multilingual Term Extraction as a Service is a natural complement to these terminology functions.
19 tekom Visit Acrolinx at tekom! Hall 3, Stand 310
21 Questions?