Towards automatic terminology extraction for Norwegian based on parallel corpora


Gisle Andersen
LSP Conference, Vienna, 8 July 2015

Background and contents
- NHH is developing a national infrastructure that integrates terminological language resources: Termportalen (WP7 in the CLARINO project, NFR)
- Many specialist fields lack systematic terminology
- Case: Sjøfartsdirektoratet (Norwegian Maritime Authority)
Contents: introduction and aim; data and methods; pattern matching; conclusion

Aim of work
- Purpose: to provide aid to field experts where systematic terminology work is lacking
- A generic system meant to enhance terminology for various domains
- Maximising the value of existing tools and language resources
- Setting up an infrastructure: a production line for term extraction (TE)
- Using a variety of techniques for parallel corpus-based TE

The necessary disclaimer
- What computational methods extract is always a set of term candidates
- The candidates need to be checked manually by field experts afterwards
- Additional information about the concepts must be supplied (definitions, structure, term variation); cf. e.g. Heylen & De Hertog (2015)

DATA AND METHODS

The corpus (1/2)
- Sjøfartsdirektoratet (NMA)
- Parallel corpus of translated texts (EN-NO)
- Policies and legislation relating to shipping: navigation, communication, safety, etc.
- Current version: translated regulations from the International Maritime Organization (IMO); small, currently 9 items
- To be extended to include:
  - Skipssikkerhetsloven / The Ship Safety and Security Act
  - The NMA's own regulations
  - etc.

The corpus (2/2)
Title of regulation (English) | Title of regulation (Norwegian) | TCA2 file (EN) | TCA2 file (NO)
Regulations of 1 July 2014 No. … on the construction of ships | Forskrift 01. juli 2014 om bygging av skip | RCS_E | RCS_N
Regulations of 1 July 2014 No. 944 on dangerous goods on Norwegian ships | Forskrift 1. juli 2014 om farlig last på norske skip | RDG_E | RDG_N
Regulations of 1 July 2014 No. … on fire protection on ships | Forskrift 1. juli 2014 om brannsikring på skip | RFP_E | RFP_N
Regulations of 1 July 2014 on life-saving appliances on ships | Forskrift 1. juli 2014 om redningsredskaper på skip | RLS_E | RLS_N
Regulations of 5 June 2014 No. 805 on medical examination of employees on Norwegian ships and mobile offshore units | Forskrift nr. xxxx om helseundersøkelse av arbeidstakere på norske skip og flyttbare innretninger | RME_E | RME_N
Regulations of 5 September 2014 No. … on navigation and navigational aids for ships and mobile offshore units | Forskrift om navigasjon og navigasjonshjelpemidler for skip og flyttbare innretninger | RNN_E | RNN_N
Regulations of 1 July 2014 No. 955 concerning radiocommunication equipment for Norwegian ships and mobile offshore units | Forskrift 1. juli 2014 om radiokommunikasjonsutstyr for norske skip og flyttbare innretninger | RRR_E | RRE_N
Regulations of 5 January 2014 No. … on a safety management system for Norwegian ships and mobile offshore units | Forskrift om sikkerhetsstyringssystem for norske skip, og flyttbare innretninger | RSM_E | RSM_N
IMO standard marine communication phrases (SMCPs) | IMOs standarduttrykk for maritim kommunikasjon | SMCP_E | SMCP_N

System architecture for TE

Step 1: Text conversion: doc → html → xml
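The slides name the conversion chain but not the tools used. As a minimal sketch, assuming the .doc files have already been saved as HTML, the HTML-to-XML step could collect paragraph and list-item text with Python's standard `html.parser` and wrap each sentence in an `<s>` element, the markup seen in the Step 3A examples. The element names `text` and `p`, and the naive sentence splitter, are assumptions rather than the project's actual format.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class BlockTextCollector(HTMLParser):
    """Collect the text of <p> and <li> elements, with whitespace normalised."""

    BLOCK_TAGS = {"p", "li"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.blocks = []
        self._buffer = None  # None while we are outside a block element

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS and self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.blocks.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def split_sentences(text):
    # Naive rule (assumption): a sentence ends at . ! or ? followed by
    # whitespace and an uppercase letter (including Æ, Ø and Å).
    return re.split(r"(?<=[.!?])\s+(?=[A-ZÆØÅ])", text)


def html_to_xml(html):
    """Turn HTML into <text><p><s>...</s></p>...</text> as a Unicode string."""
    collector = BlockTextCollector()
    collector.feed(html)
    collector.close()
    root = ET.Element("text")
    for block in collector.blocks:
        p_el = ET.SubElement(root, "p")
        for sentence in split_sentences(block):
            ET.SubElement(p_el, "s").text = sentence
    return ET.tostring(root, encoding="unicode")
```

The splitter will wrongly break after an abbreviation followed by a capitalised word (e.g. "cf. Andersen"), so a production pipeline would need a proper tokeniser.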

Step 2: Alignment of parallel corpus texts
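The corpus table points to TCA2 for this step, and the slides do not describe its algorithm. The sketch below substitutes a much simpler length-based dynamic-programming aligner in the spirit of the classic approach of Gale and Church. It chooses the sequence of 1-1, 1-0, 0-1, 2-1 and 1-2 sentence groupings ("beads") that minimises a cost built from relative character-length differences. The cost function and both penalty values are illustrative assumptions.

```python
def align_sentences(src, tgt, skip_penalty=10.0, merge_penalty=2.0):
    """Simplified length-based sentence alignment (not TCA2's algorithm).

    Returns a list of (source_indices, target_indices) beads in text order.
    """
    beads = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]  # allowed grouping shapes
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    def bead_cost(src_part, tgt_part):
        if not src_part or not tgt_part:
            return skip_penalty  # a sentence left unaligned
        ls = sum(len(s) for s in src_part)
        lt = sum(len(t) for t in tgt_part)
        diff = abs(ls - lt) / max(ls, lt, 1)  # 0.0 when lengths are equal
        extra = merge_penalty if len(src_part) + len(tgt_part) > 2 else 0.0
        return diff + extra

    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(src[i:ni], tgt[j:nj])
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # Trace back from the bottom-right cell to recover the chosen beads.
    result = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        result.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    result.reverse()
    return result
```

Because the merge penalty exceeds the maximum length difference of 1.0, this toy cost always prefers 1-1 beads when sentence counts match. A real aligner would also use lexical anchors.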

Step 3A: Pattern matching
- Premise: recognisable patterns in sentence and paragraph structure, punctuation, etc. suggest termhood
- Extraction based on regular expressions (Perl)
English | Norwegian
<s>b) barges;</s> | <s>d) lektere</s>
<s>the spooling device shall:</s> | <s>spoleapparatet skal:</s>
<s>a) initial certification upon changes in use;</s> | <s>a) førstegangssertifisering ved endret bruk</s>
<s>e) handrails, corridors and passageways, doorways, doors, lifts, vehicle decks, passenger lounges, accommodation and washrooms shall be | <s>e) Håndlister, korridorer og ganger, døråpninger, dører, heiser, bildekk, passasjersalonger, innredning og toaletter skal være
Wire/chain stoppers shall be dimensioned for a safe working load | En wire- og kjettingstopper skal være dimensjonert for en sikker arbeidsbelastning
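The project's Perl regular expressions are not reproduced on the slides. A rough Python equivalent for two of the illustrated patterns might look as follows: lettered list items, and coordinated subjects before "shall be"/"skal være". Both expressions are assumptions written to fit the examples in this step.

```python
import re

# Lettered/numbered list item such as "<s>b) barges;</s>"; captures "barges".
ITEM_RE = re.compile(r"<s>\s*[a-z0-9]{1,2}\)\s+([^<;:]+?)\s*[;.]?\s*</s>", re.IGNORECASE)

# Subject of "shall be" / "skal være", optionally after "<s>" and a list label.
COORD_RE = re.compile(
    r"^(?:<s>)?\s*(?:[a-z]\)\s+)?(.+?)\s+(?:shall|skal)\s+(?:be|være)\b",
    re.IGNORECASE,
)


def coordinated_items(sentence):
    """Split a coordinated subject into its conjuncts (commas, 'and', 'og')."""
    m = COORD_RE.match(sentence)
    if not m:
        return []
    parts = re.split(r",\s*|\s+(?:and|og)\s+", m.group(1))
    return [p.strip().lower() for p in parts if p.strip()]


def extract_candidates(sentences):
    """Apply the list-item pattern first, then the coordination pattern."""
    candidates = []
    for s in sentences:
        m = ITEM_RE.search(s)
        if m:
            candidates.append(m.group(1).strip().lower())
        else:
            candidates.extend(coordinated_items(s))
    return candidates
```

Splitting on every "and"/"og" also separates pairs such as "corridors and passageways" that a human might keep together. This is acceptable for candidate generation, since the output is checked manually anyway.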

Step 3B: Check of terminological inventory
- Premise: if a word or sequence of words is already registered as a term in another component of Termportalen, it has high termhood (it is likely to constitute a term in the current context as well)
- Question 1: same or different translation relation?
- Question 2: same or different domain?
- Methodological issue: inflected forms in the texts vs. base forms in the term base
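One way to handle the base-form issue and the two questions is to lemmatise a candidate pair before lookup. The lookup then reports whether the term base holds the same translation and the same domain. In this sketch, the lemma table, the `TermEntry` record and the head-only lemmatisation rule are hypothetical stand-ins for a real lemmatiser and the Termportalen data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermEntry:
    """Hypothetical term-base record: base forms plus a domain label."""
    en: str
    no: str
    domain: str


# Hypothetical full-form -> base-form table; a real system would use a lexicon.
LEMMAS = {"barges": "barge", "lektere": "lekter", "spoleapparatet": "spoleapparat"}


def lemmatise(phrase):
    # Simplification: only the last word (usually the head) is lemmatised.
    words = phrase.lower().split()
    if not words:
        return ""
    words[-1] = LEMMAS.get(words[-1], words[-1])
    return " ".join(words)


def check_inventory(en, no, domain, term_base):
    """Return None if neither side is registered, else answers to Questions 1 and 2."""
    en_l, no_l = lemmatise(en), lemmatise(no)
    hits = [e for e in term_base if e.en == en_l or e.no == no_l]
    if not hits:
        return None
    return {
        "same_translation": any(e.en == en_l and e.no == no_l for e in hits),
        "same_domain": any(e.domain == domain for e in hits),
    }
```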

Step 3C: Neology detection
- Premise: if a word or sequence of words can be shown to be a neologism (domain-specific vocabulary), it has high termhood (it is likely to be a term)
- Check against the inventory of words in a large general language corpus (GLC): Norsk aviskorpus (Norwegian Newspaper Corpus, NNC; cf. Andersen 2012; Andersen & Hofland 2012)
- Check among the neologisms registered in the NNC's neology database
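Assume a GLC word-frequency list and the set of neology-database entries are both available as Python collections. Under that assumption, the neology check reduces to two lookups. The threshold `max_freq` and the toy frequencies in the tests are invented for illustration.

```python
def neologism_candidates(candidates, glc_freq, neology_db, max_freq=0):
    """Flag candidates listed in the neology database or rare/absent in the GLC.

    glc_freq: mapping from lower-cased word (or word sequence) to GLC frequency.
    neology_db: set of lower-cased entries from the neology database.
    """
    flagged = {}
    for c in candidates:
        key = c.lower()
        if key in neology_db:
            flagged[c] = "neology database"
        elif glc_freq.get(key, 0) <= max_freq:
            flagged[c] = "rare/absent in GLC"
    return flagged
```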

Step 3D: Monolingual/bilingual lexicon lookup
- Premise: if a word or sequence of words is found in the lexical inventory of a monolingual or bilingual technical or specialised dictionary, it has high termhood
- Agreement with Kunnskapsforlaget to reuse some of their manuscripts

Step 3E: Association measures (AMs)
- Premise: terms are often constituted as collocations, i.e. words with a strong tendency to co-occur, so strong collocations may be seen as indicators of termhood
- Association measures: statistical measures of unithood/termhood (Heylen & De Hertog 2015)
- It is important to select an adequate AM for TE, e.g. Pointwise Mutual Information or chi-square (cf. Lyse & Andersen 2012)
- Collocation patterns should be compared with GLC data (NNC)
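Consider a bigram (w1, w2) with observed frequency O11 among N bigrams, where R1 bigrams start with w1 and C1 bigrams end with w2. PMI is log2(O11 * N / (R1 * C1)). The 2x2 chi-square is N * (O11 * O22 - O12 * O21)^2 / (R1 * R2 * C1 * C2), where R2 = N - R1 and C2 = N - C1. The sketch below computes both from a token list. It follows the textbook formulation and is not necessarily the exact variant used by Lyse & Andersen (2012).

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Return {(w1, w2): (pmi, chi2)} for every adjacent word pair in tokens."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # how often each word opens a bigram
    second = Counter(tokens[1:])   # how often each word closes a bigram
    n = len(tokens) - 1            # total number of bigrams
    scores = {}
    for (w1, w2), o11 in bigrams.items():
        r1, c1 = first[w1], second[w2]
        o12 = r1 - o11             # w1 followed by another word
        o21 = c1 - o11             # another word followed by w2
        o22 = n - o11 - o12 - o21  # neither w1 first nor w2 second
        pmi = math.log2(o11 * n / (r1 * c1))
        denom = r1 * (n - r1) * c1 * (n - c1)
        chi2 = n * (o11 * o22 - o12 * o21) ** 2 / denom if denom else 0.0
        scores[(w1, w2)] = (pmi, chi2)
    return scores
```

PMI is known to favour rare pairs: a pair seen once, made of words seen once, gets the highest possible score. This is one reason the choice of AM matters. Computing the same scores on GLC counts would show whether a strong collocation is specific to the domain.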

Step 3F: Parsing techniques
- Premise: terminological units are typically constituted as (complex) noun phrases, so output from syntactic parsing may give good guidance towards terminological units
- Parsers for Norwegian and English: the INESS project (UiB; cf. Rosén 2012)
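The INESS parsers are not shown here. As a lightweight stand-in (a part-of-speech pattern, not a syntactic parser), candidate noun phrases can be approximated by matching ADJ* NOUN+, optionally followed by "of"/"av" and another ADJ* NOUN+, over tagged tokens. The sketch assumes Universal POS tags as input. It encodes each tag as one character so that an ordinary regular expression can do the matching.

```python
import re


def _code(word, tag):
    """Map a (word, UPOS tag) pair to a one-letter code (assumed tag set)."""
    if tag == "ADJ":
        return "A"
    if tag in ("NOUN", "PROPN"):
        return "N"
    if tag == "ADP" and word.lower() in ("of", "av"):
        return "P"
    return "x"


# ADJ* NOUN+ with an optional "of"/"av" + ADJ* NOUN+ complement.
NP_PATTERN = re.compile(r"A*N+(?:PA*N+)?")


def noun_phrase_candidates(tagged):
    """Return maximal pattern matches from a list of (word, UPOS) pairs."""
    codes = "".join(_code(w, t) for w, t in tagged)
    return [
        " ".join(w for w, _ in tagged[m.start():m.end()])
        for m in NP_PATTERN.finditer(codes)
    ]
```

A full parser would also capture structures this pattern misses, such as "moment to change trim", which the pattern splits into "moment" and "trim".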

A CLOSER LOOK AT PATTERN MATCHING

Term extraction based on pattern matching
English | Norwegian
final loading conditions | Endelige lastetilstander
Hydrostatics containing the following parameters as a function of the draught with a specified reference point | Hydrostatikk som inneholder følgende parametere som funksjon av dypgang med spesifisert referansepunkt
displacement | deplasement
KB centre of buoyancy | KB oppdriftssenter
If warranted by the ferry's size or type | Når fergens størrelse eller type tilsier det
the Norwegian Maritime Authority may require the mooring arrangement to be dimensioned for a mooring force higher than 30 tonnes | kan Sjøfartsdirektoratet kreve at fortøyningsarrangementet blir dimensjonert for høyere fortøyningskraft enn 30 vekttonn
KM transverse metacentre above the baseline | KM tverrskips metasenter over basis
AwT waterline area | AwT vannlinjeareal
TP1 tonnes per unit submersion | TP1 enhets neddykking
MT1 moment to change trim | MT1 enhets trimmoment
LCF longitudinal centre of flotation | LCF langskips flotasjonssenter
LCB longitudinal centre of buoyancy | LCB langskips oppdriftssenter
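The hydrostatics lists suggest one easily exploited pattern: a symbol such as KM or LCF followed by its expansion, with the same symbol in both language versions. The minimal sketch below assumes each symbol-expansion pair sits on its own line. It pairs the English and Norwegian expansions through the shared symbol. The symbol regex (uppercase first character, ending in an uppercase letter or digit, at most five characters) is an assumption fitted to the examples above.

```python
import re

# Symbol such as KB, KM, AwT, TP1, MT1, LCF, followed by its expansion.
# Requiring a final uppercase letter or digit rejects words like "If" or "Når".
LINE_RE = re.compile(r"^([A-Z][A-Za-z0-9]{0,3}[A-Z0-9])\s+(\S.*)$")


def symbol_table(lines):
    """Map each symbol to its expansion; lines without a symbol are ignored."""
    table = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            table[m.group(1)] = m.group(2)
    return table


def align_by_symbol(en_lines, no_lines):
    """Return (symbol, English expansion, Norwegian expansion) triples."""
    en, no = symbol_table(en_lines), symbol_table(no_lines)
    return [(sym, en[sym], no[sym]) for sym in en if sym in no]
```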

Output of procedure: term database file (tsv)
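The fields of the tsv file are not listed on the slide. Writing such a file with Python's `csv` module could look like this; the three column names are hypothetical.

```python
import csv
import io

# Hypothetical column layout; the actual fields of the project's file are not given.
FIELDS = ["term_en", "term_no", "module"]


def term_tsv(rows):
    """Serialise dict rows (keys as in FIELDS) to tab-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The `csv` module quotes any value that itself contains a tab or newline, so the file stays machine-readable when imported for manual editing.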

The next stage: manual editing in Termportalen

CONCLUSION

Other remaining tasks
- The output of each processing procedure is a bilingual list of term candidates
- Precision and recall need to be checked against a gold standard, which will be developed through manual term extraction performed by field experts or a research assistant
- The performance and contribution of each module will be checked separately
- The degree of overlap between the modules also needs to be checked
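Once the gold standard exists, precision, recall and overlap are simple set computations. The sketch assumes that candidates and gold entries are (English, Norwegian) pairs compared by exact match. It uses the Jaccard coefficient as one possible measure of overlap between two modules' outputs. Both choices are assumptions.

```python
def evaluate(candidates, gold):
    """Precision, recall and F1 of candidate pairs against gold pairs (exact match)."""
    cand, gold = set(candidates), set(gold)
    tp = len(cand & gold)
    precision = tp / len(cand) if cand else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def overlap(a, b):
    """Jaccard overlap between the outputs of two modules."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Exact matching is strict: an inflected or slightly longer candidate counts as a miss. Lemmatising both sides first, as in Step 3B, would give a more lenient score.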

Summary
- A hybrid approach, using a combination of linguistic and statistical approaches to (bilingual) TE
- Combining the strengths of both approaches
- At the same time attempting to utilise and maximise the value of existing language resources
- Although based on data drawn specifically from the maritime sector, the production line and infrastructure proposed here are meant to be generic and applicable to (all) other domains

References
Andersen, Gisle, ed. 2012. Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian. Amsterdam: John Benjamins.
Andersen, Gisle, and Knut Hofland. 2012. Building a large monitor corpus based on newspapers on the web. In Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian, edited by G. Andersen. Amsterdam: John Benjamins.
Heylen, Kris, and Dirk De Hertog. 2015. Automatic term extraction. In Handbook of Terminology, edited by H. J. Kockaert and F. Steurs. Amsterdam: John Benjamins.
Lyse, Gunn Inger, and Gisle Andersen. 2012. Collocations and statistical analysis of n-grams. In Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian, edited by G. Andersen. Amsterdam: John Benjamins.
Rosén, Victoria. 2012. Exploring corpora through syntactic annotation. In Exploring Newspaper Language: Using the web to create and investigate a large corpus of modern Norwegian, edited by G. Andersen. Amsterdam: John Benjamins.


Extraction and Visualization of Protein-Protein Interactions from PubMed Extraction and Visualization of Protein-Protein Interactions from PubMed Ulf Leser Knowledge Management in Bioinformatics Humboldt-Universität Berlin Finding Relevant Knowledge Find information about Much

More information

11-792 Software Engineering EMR Project Report

11-792 Software Engineering EMR Project Report 11-792 Software Engineering EMR Project Report Team Members Phani Gadde Anika Gupta Ting-Hao (Kenneth) Huang Chetan Thayur Suyoun Kim Vision Our aim is to build an intelligent system which is capable of

More information

Teaching terms: a corpus-based approach to terminology in ESP classes

Teaching terms: a corpus-based approach to terminology in ESP classes Teaching terms: a corpus-based approach to terminology in ESP classes Maria João Cotter Lisbon School of Accountancy and Administration (ISCAL) (Portugal) Abstract This paper will build up on corpus linguistic

More information

TechWatch. Technology and Market Observation powered by SMILA

TechWatch. Technology and Market Observation powered by SMILA TechWatch Technology and Market Observation powered by SMILA PD Dr. Günter Neumann DFKI, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Juni 2011 Goal - Observation of Innovations and Trends»

More information

Combining Ontological Knowledge and Wrapper Induction techniques into an e-retail System 1

Combining Ontological Knowledge and Wrapper Induction techniques into an e-retail System 1 Combining Ontological Knowledge and Wrapper Induction techniques into an e-retail System 1 Maria Teresa Pazienza, Armando Stellato and Michele Vindigni Department of Computer Science, Systems and Management,

More information

Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services

Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services speakers: Kai Zimmer and Jörg Didakowski Clarin Workshop WP2 February 2009 BBAW/DWDS The BBAW and its 40 longterm projects

More information

DEPARTMENT OF MARINE SERVICES AND MERCHANT SHIPPING (ADOMS) Boatmaster s Licenses

DEPARTMENT OF MARINE SERVICES AND MERCHANT SHIPPING (ADOMS) Boatmaster s Licenses CIRCULAR Local 2013-001 DEPARTMENT OF MARINE SERVICES AND MERCHANT SHIPPING (ADOMS) Boatmaster s Licenses Ref SCV Code. Companies operating SCV certificated vessels under the flag of Antigua and Barbuda.

More information

An Online Service for SUbtitling by MAchine Translation

An Online Service for SUbtitling by MAchine Translation SUMAT CIP-ICT-PSP-270919 An Online Service for SUbtitling by MAchine Translation Annual Public Report 2011 Editor(s): Contributor(s): Reviewer(s): Status-Version: Volha Petukhova, Arantza del Pozo Mirjam

More information

Niels Hjørnet Yacht Design Yacht Design. Niels Hjørnet Yacht Design

Niels Hjørnet Yacht Design Yacht Design. Niels Hjørnet Yacht Design Niels Hjørnet Yacht Design Øko-Ø færge Røde tal på bundlinjen 011 Egholm færgen: Fursund Færgeri: Thyborøn-Agger færgen: Mors-Thy færgefart: Venø færgen: Hals-Egense færgen: Hvalpsund-Sundsøre færgen:

More information

REPORT ON THE WORKBENCH FOR DEVELOPERS

REPORT ON THE WORKBENCH FOR DEVELOPERS REPORT ON THE WORKBENCH FOR DEVELOPERS for developers DELIVERABLE D3.2 VERSION 1.3 2015 JUNE 15 QTLeap Machine translation is a computational procedure that seeks to provide the translation of utterances

More information

Getting Off to a Good Start: Best Practices for Terminology

Getting Off to a Good Start: Best Practices for Terminology Getting Off to a Good Start: Best Practices for Terminology Technologies for term bases, term extraction and term checks Angelika Zerfass, zerfass@zaac.de Tools in the Terminology Life Cycle Extraction

More information

Collaborative Machine Translation Service for Scientific texts

Collaborative Machine Translation Service for Scientific texts Collaborative Machine Translation Service for Scientific texts Patrik Lambert patrik.lambert@lium.univ-lemans.fr Jean Senellart Systran SA senellart@systran.fr Laurent Romary Humboldt Universität Berlin

More information

Language and Computation

Language and Computation Language and Computation week 13, Thursday, April 24 Tamás Biró Yale University tamas.biro@yale.edu http://www.birot.hu/courses/2014-lc/ Tamás Biró, Yale U., Language and Computation p. 1 Practical matters

More information

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction. Günter Neumann, DFKI, 2012 Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

More information

RRSS - Rating Reviews Support System purpose built for movies recommendation

RRSS - Rating Reviews Support System purpose built for movies recommendation RRSS - Rating Reviews Support System purpose built for movies recommendation Grzegorz Dziczkowski 1,2 and Katarzyna Wegrzyn-Wolska 1 1 Ecole Superieur d Ingenieurs en Informatique et Genie des Telecommunicatiom

More information

Introduction to IE with GATE

Introduction to IE with GATE Introduction to IE with GATE based on Material from Hamish Cunningham, Kalina Bontcheva (University of Sheffield) Melikka Khosh Niat 8. Dezember 2010 1 What is IE? 2 GATE 3 ANNIE 4 Annotation and Evaluation

More information

Natural Language to Relational Query by Using Parsing Compiler

Natural Language to Relational Query by Using Parsing Compiler Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

Release: 1. CPP20307 Certificate II in Technical Security

Release: 1. CPP20307 Certificate II in Technical Security Release: 1 CPP20307 Certificate II in Technical Security CPP20307 Certificate II in Technical Security Modification History Description Pathways Information Licensing/Regulatory Information Entry Requirements

More information

A Mixed Trigrams Approach for Context Sensitive Spell Checking

A Mixed Trigrams Approach for Context Sensitive Spell Checking A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA dfossa1@uic.edu, bdieugen@cs.uic.edu

More information

Comprendium Translator System Overview

Comprendium Translator System Overview Comprendium System Overview May 2004 Table of Contents 1. INTRODUCTION...3 2. WHAT IS MACHINE TRANSLATION?...3 3. THE COMPRENDIUM MACHINE TRANSLATION TECHNOLOGY...4 3.1 THE BEST MT TECHNOLOGY IN THE MARKET...4

More information

GUIDE to completion of the Excel spreadsheet

GUIDE to completion of the Excel spreadsheet GUIDE to completion of the Excel spreadsheet 1 S i d e 1 TABLE OF CONTENTS 1 TABLE OF CONTENTS... 2 1.1 Data Entry Generalities... 3 1.2 Prepare data... 5 1.3 Simple Data Entry (horizontal direction)...

More information

THE knowledge needed by software developers

THE knowledge needed by software developers SUBMITTED TO IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 1 Extracting Development Tasks to Navigate Software Documentation Christoph Treude, Martin P. Robillard and Barthélémy Dagenais Abstract Knowledge

More information

Word Completion and Prediction in Hebrew

Word Completion and Prediction in Hebrew Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology

More information

ON GETTING THE MOST OUT OF INTERNET RESOURCES TO RAISE TRANSLATION QUALITY OF PROFESSIONAL DOCUMENTATION

ON GETTING THE MOST OUT OF INTERNET RESOURCES TO RAISE TRANSLATION QUALITY OF PROFESSIONAL DOCUMENTATION General and Professional Education 3/2013 pp. 21-27 ISSN 2084-1469 ON GETTING THE MOST OUT OF INTERNET RESOURCES TO RAISE TRANSLATION QUALITY OF PROFESSIONAL DOCUMENTATION Svetlana Sheremetyeva Department

More information

Glossary of translation tool types

Glossary of translation tool types Glossary of translation tool types Tool type Description French equivalent Active terminology recognition tools Bilingual concordancers Active terminology recognition (ATR) tools automatically analyze

More information

Car Passenger Ferry Portugal

Car Passenger Ferry Portugal Car Passenger Ferry Portugal Type Ref. ID Living Area Total Area Pris Car & Passenger Vessels HQA-709503 0 sq. m 0 sq. m Be om pris Lugarer Senger Etasjer Furnished Annonsert Dato 52 112 0 Nei July 7,

More information

Real-Time Identification of MWE Candidates in Databases from the BNC and the Web

Real-Time Identification of MWE Candidates in Databases from the BNC and the Web Real-Time Identification of MWE Candidates in Databases from the BNC and the Web Identifying and Researching Multi-Word Units British Association for Applied Linguistics Corpus Linguistics SIG Oxford Text

More information

Pontifícia Universidade Católica do Rio Grande do Sul Faculdade de Informática. Building Domain Specific Corpora in Portuguese Language

Pontifícia Universidade Católica do Rio Grande do Sul Faculdade de Informática. Building Domain Specific Corpora in Portuguese Language Pontifícia Universidade Católica do Rio Grande do Sul Faculdade de Informática Programa de Pós-Graduação em Ciência da Computação Building Domain Specific Corpora in Portuguese Language Lucelene Lopes,

More information

Using the BNC to create and develop educational materials and a website for learners of English

Using the BNC to create and develop educational materials and a website for learners of English Using the BNC to create and develop educational materials and a website for learners of English Danny Minn a, Hiroshi Sano b, Marie Ino b and Takahiro Nakamura c a Kitakyushu University b Tokyo University

More information

Term extraction for user profiling: evaluation by the user

Term extraction for user profiling: evaluation by the user Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,

More information

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati

More information

Parsing Software Requirements with an Ontology-based Semantic Role Labeler

Parsing Software Requirements with an Ontology-based Semantic Role Labeler Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh mroth@inf.ed.ac.uk Ewan Klein University of Edinburgh ewan@inf.ed.ac.uk Abstract Software

More information

Regulations regarding health requirements for persons working on installations in petroleum activities offshore

Regulations regarding health requirements for persons working on installations in petroleum activities offshore Unauthorized translation of the Norwegian FOR 2010-12-20 nr 1780: Forskrift om helsekrav for personer I arbeid på innretninger I petroleumsvirksomheten til havs Regulations regarding health requirements

More information

Norwegian hospital planning tools

Norwegian hospital planning tools Norwegian hospital planning tools Knut Bergsland SINTEF Health Research Espoo, Dec.1.2006 1 SINTEF Health Research Research for improved health and a better quality of life 2 SINTEF Health Research Organisational

More information