Software = hard for national termbanks?




Henrik Nilsson & Sandra Cuadrado í Camps
Terminologicentrum TNC & Termcat
IITF Colloquium, Vienna, Austria, 9 July 2015

Outline
- National termbank
  - The concept and some examples
  - Rikstermbanken
  - Cercaterm
- State-of-the-art (TERMINTRA)
- Aspects and related technical challenges
  - getting (and presenting) content
  - harmonizing content
  - users
  - digital age
  - reuse
  - getting funding

National could imply
- a government responsibility and financing
- a link to a national terminology centre
- a basis in the national conceptual world
- a certain language choice (monolingual, only national languages, ...)
- a certain quality
- a certain accessibility (free of charge, adapted)
- a certain scope (e.g. cover all terminology in the nation, nothing foreign etc.)
- a certain status (affecting usage)
- a marketing gimmick

National should imply
- a certain coverage (as to contents)
- a certain status (acknowledged by professionals and a language or terminology institution)
- accessibility (open and freed of ownership claims)
[Termintra, Oslo, 2012]

national terminology database
"database containing mono- or multilingual terminological data [...] established at country level" [Guidelines for Terminology Policies, Unesco]

"the national termbank, which attempts to serve a general purpose role in coordinating the creation and use of terminologies within a country, and hence is theoretically multifunctional, multilingual and exploited by widely differing kinds of users" [McNaught, 1987]

Why a national term bank?
"I have been a manager [...] within the U.S. Federal Government for over 30 years. In that time, I have observed that the dominant case of ineffectiveness, inefficiency, and unresponsiveness in operations is the inconsistent terms used across the various boundaries of government, their contractors, industry, non-profits, and citizens. There are terminology boundaries between locations, organizations, offices within the organizations, work functions, processes, resources (e.g. people, intelligence, funds, skills, materiel, facilities, services), and capability requirements (e.g. missions, information systems)." [Roebuck, 2009]

"Next, the vocabulary of these functions would be automatically collected, organized, and placed into a National Terminology database to enable integration, interoperability, unification, and federation of operations" [Roebuck, 2009]
technical challenges!?

European national termbanks
- Stofnun Árna Magnússonar í íslenskum fræðum, Iceland: Orðabanki
- Foras na Gaeilge, Ireland: Téarma.ie
- Norway: Termportalen, Snorre
- NL-Term, Netherlands: Nedterm
- TNC, Sweden: Rikstermbanken
- TSK, Finland: Vetenskapstermbanken, TEPA, Valter
- Eter, Estonia: ESTERM
- Latvia: EuroTermBank
- LKI, Lithuania: Terminų bankas
- Wales: National Terminology Portal
- Société française de terminologie, France: FranceTerme
- Confédération suisse: Termdat
- Slovenia: Evroterm
- Croatia: National Terminology Portal (incl. Struna)
- UZEI, Basque country: Euskalterm
- Termcat: Cercaterm
- Denmark: (DTB)
- Türk Dil Kurumu, Turkey: Bilim ve Sanat Terimleri

Struna (HR)

FranceTerme (FR)

Terminų Bankas (LT)

BFT (FI)

National Terminology Portal (Wales)

Risten (Sápmi)

Orðabanki (ISL)

Téarma.ie (IRL)

Slovenská terminologicka databáza (SK)

AkadTerm (LV)

Euskalterm (Basque Country)

Türk Dil Kurumu (TR)

Terminoloģijas portāls (LV)

Nedterm (NL)

Other termbanks
- EuroTermBank
- National Termbank (RSA)
- IATE
- ISO Online Browsing Platform
- UNTERM
- EAA Glossary
- Electropedia
- METEOTERM
- ILOTERM
- FAOTERM

EuroTermBank

IATE

www.rikstermbanken.se

Background
"The fast development of society requires constant work on creating and making accessible agreed-upon terminologies, within more and more subject fields. An easy access to terms via the Internet in a national termbank [rikstermbank] endorses such a development."
- TISS, 2002-2004
- Nordterm-Net, 1999; Brussels Declaration, 2002 et al.
- IT-propositionen (Prop. 2004/05:175), 2005
- Bästa språket (Prop. 2005/06:2), 2005
- Grant from Ministry of Industry, Employment and Communications: 2005: 1 500 000 SEK; 2007: 750 000 SEK; 2009: 0; 2011: discussion about semantic resource!
- IATE, EU; evaluation 2004
"the establishment of a national central term bank, a rikstermbank, is a prerequisite for an easy access to, and quality assurance of, Swedish terms in all domains."
- Terminų Bankas, Lithuania & EuroTermBank

Rikstermbanken as a tool
- for storage
- for search and retrieval
- for terminology work, research

"Rikstermbanken should mainly reflect concepts of the Swedish society; however, this does not mean that the termbank would comprise only Swedish terms. In order to make it function in the way it is planned, the termbank should also contain term equivalents in foreign languages, and not only in English but also in various immigrant languages and in the official minority languages of Sweden." [IT-propositionen, Prop. 2004/05:175]

Current contents
- no limitations as to domains!
- Swedish conceptual world = starting point
- complete glossaries, but also parts of documents and excerpts
- some digitised material
- quality control by terminologists (and at times the supplier)
- presentation phase
- consolidation phase
- overview
- harmonisation

Rikstermbanken in numbers
- 106 000 term records
- 300 000 terms (incl. look-up terms, synonyms, equivalents)
- 28 languages
- 71 % definitions (in Swedish)
- ca 1500 unique sources
- ca 500 suppliers

Contents priorities
- selection, types
- preparation (enhancing, record making & breaking)
- harmonization (doublettes, ...)
- updating
- addition of new material
- quality vs. quantity?

Preparation of the material
- termbank adaptation (reformatting according to NTRF-RTB, exclusion of remaining book-related aspects)
- selection
- changes for consistency
- linguistic and content-related adjustments (incl. removal of target group adaptations)
- discussion with suppliers
- illustrations
- semi-automatic three-step import
- control tool

Technology
- experience from Termdok development and Nordterm-Net (MLIS-project)
- comparisons to existing TMS-software and standards (ISO, LISA et al)
- IATE evaluation
- co-operation with IATE, EuroTermBank
- proper software
- open source: Lucene, MySQL, Tomcat, Java

Technical development Rikstermbanken
- Oracle replaced by open source:
  - MySQL (database management)
  - Tomcat (web server)
  - Lucene (indexing)
  - Java applications
- Iterative process
- Documentation via internal wiki
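
To make the "indexing" role in this stack concrete, here is a minimal sketch of an inverted index over term records. It is not Lucene and not Rikstermbanken's code; the record layout (`term`, `definition`) and the function names are assumptions made for illustration only.

```python
from collections import defaultdict


def tokenise(text):
    """Lower-case the text and split it on anything that is not a letter or digit."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def build_index(records):
    """Map every token to the ids of the records whose term or definition contains it.

    `records` is a hypothetical dict: record id -> {"term": ..., "definition": ...}.
    """
    index = defaultdict(set)
    for record_id, record in records.items():
        for field in ("term", "definition"):
            for token in tokenise(record.get(field, "")):
                index[token].add(record_id)
    return index


def search(index, query):
    """Return the ids of records that contain every token of the query."""
    tokens = tokenise(query)
    if not tokens:
        return set()
    return set.intersection(*(index.get(token, set()) for token in tokens))


records = {
    1: {"term": "offset", "definition": "lithographic printing method"},
    2: {"term": "lithography", "definition": "printing method"},
}
index = build_index(records)
hits = search(index, "printing method")
```

A dedicated search library adds ranking, stemming and language-specific analysis on top of this basic structure, which is one reason a termbank would use one rather than query the database directly.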

Cercaterm (CAT)

Cercaterm
- online platform designed, supported and updated by Termcat (since 2000)
- development of terminological products, terminology standardisation, terminology consulting service
- updates to Cercaterm: Termcat's terminology production, standardized terminology, queries resolved + other material
- 230 000 files (more than 925 000 denominations)
- new functions in 2010 (based on user survey): search, sources
- 3 million visits in 2014
- also other information


TERMINTRA
- Forum for discussion on national termbanks
- The concept of national termbank
- Aspects: General, Contents, Users, Funding, Organization, Technology
- First seminar in Oslo 2012, second in Zagreb 2013
- Participants from Catalonia, Croatia, Denmark, Finland, France, Ireland, Iceland, Latvia, Norway, Sápmi, Sweden, Switzerland, Wales

TERMINTRA: Technology
- What technical solutions are in use today, and are some more appropriate than others?
- Should a national term bank be based on a distributed solution or not? Or, rather, constitute a kind of portal? Pros and cons?
- What standards should be the basis for national terminology databases (storage and exchange formats, etc.)?
- Are the current terminology management systems suitable for the demands which could be made on a national term bank?
- To what extent are today's national terminology banks based on proprietary software (use of open source or not)?

The current situation is that most of the bigger existing term banks use purpose-built software, although there are cases where general purpose information retrieval software is used. Although computerized term banks have been in existence for a number of years, there seems to be little agreement as to how they should operate, and if the present situation persists, their use will continue to be low. If term banks are to become widely used certain changes in practice will be necessary; changes which in turn have implications for the software that must be used for term bank operation. [Negus, 1979]

the longer established term banks tend to use purpose built software, partly because nothing generally available at the time was found to be suitable, and partly because each is aimed at providing a range of services not found elsewhere, using terminological records and searching methods which are more or less unique. [...] all systems should attempt to maintain the greatest flexibility in their approach. However, this is difficult to achieve where specially created software is concerned; there is an inevitable tendency to provide what is definitely required at the time of program specification, perhaps giving little thought to what services might be required, or facilities demanded, at some indeterminate time in the future. [Negus, 1979]

As to the technological aspects of national termbanks, it became clear during the presentations and discussions that most of the represented termbanks had developed their own technical solution (which, however, in many cases relied on international standards). The exception was the Finnish termbank using Wiki-technology and open source software. [Proceedings, TERMINTRA I, 2013]

Perspective \ Aspect   Contents   Technology   Organisation
Manager                X          X            X
Users                  X          X            (X)
Suppliers              X          X            (X)
Financing bodies       (X)        X            (X)

Challenge: getting content
- term extraction as part of software (or separate)?
- automatic record breaking into data categories (definition indicators etc.)? And record making?
- automatically fill in the gaps? (automatic classification)

Various sources [Heid (1991) in Martin & van der Vliet, 2003]

Import process (of glossaries)
1. inventory (weekly) & preliminary assessment
2. formal inquiry
3. collection
4. formatting
5. review
6. (feedback)
7. first import
8. adjustments
9. second import
10. updating

Term bank contents: challenges
- Selection: all or nothing or a little?
- Interpretation of contents, decontextualisation
- Term choice (variants, synonyms etc.)
- Definition vs. explanation
- Updating vs archiving
- consistency changes?
- Decustomization (= depersonalisation)
- Record breaking & record making
- Document types: legal documents

Record breaking (1)
Before / After
svte offset
svdf litografisk plantryckmetod där tryckplåten är preparerad så att färggivande ytor gjorts färgmottagliga och vattenbortstötande och icke färggivande partier gjorts vattenmottagliga och färgbortstötande
svrete litografi, djuptryck, direktlito
svan Överföringen av tryckbilden från offsetplåten sker indirekt via en gummiduk till papperet.

Record breaking (2)
Before / After
svte incidens HONR 1
svfk Antalet fall av en viss sjukdom som uppträder i en befolkning under viss tid; anges t.ex. som antalet diagnoser per 1 000 invånare per år.

svte incidens HONR 2
svupte incidenskvot
svfk Antalet av en viss studerad händelse i en klinisk prövning eller kohortundersökning, dividerat med antalet deltagare i gruppen. Graden av skillnad mellan två gruppers incidenstal kan uttryckas genom att det ena divideras med det andra till en incidenskvot.
svrete händelse
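
The two "record breaking" slides can be read as splitting flat, field-coded lines into separate term records. The following is only a sketch of that idea, not the NTRF-RTB import tool: it assumes one field per line, a two-letter language prefix followed by a category code, and that every term field starts a new record. The English category names are my own glosses, and the meaning of `upte` is not given on the slides, so it is passed through unchanged.

```python
import re

# Category codes as they appear on the slides; the English names are assumed glosses.
CATEGORIES = {
    "te": "term",
    "df": "definition",
    "fk": "explanation",
    "an": "note",
    "rete": "related_term",
    "upte": "upte",  # meaning not stated on the slides, kept as-is
}

FIELD = re.compile(r"^([a-z]{2})([a-z]+)\s+(.+)$")
HOMONYM = re.compile(r"\s+HONR\s+(\d+)$")


def parse_records(lines):
    """Break flat field lines into records; each term field opens a new record."""
    records = []
    for line in lines:
        match = FIELD.match(line.strip())
        if match is None or match.group(2) not in CATEGORIES:
            continue  # skip blank or unrecognised lines
        lang, code, value = match.groups()
        if code == "te" or not records:
            records.append({})
        record = records[-1]
        if code == "te":
            homonym = HOMONYM.search(value)
            if homonym:
                # Store the homonym number separately and strip it from the term.
                record["homonym_no"] = int(homonym.group(1))
                value = value[: homonym.start()]
        record.setdefault(CATEGORIES[code], []).append((lang, value))
    return records


sample = [
    "svte offset",
    "svdf litografisk plantryckmetod",
    "svrete litografi, djuptryck, direktlito",
    "svte incidens HONR 1",
    "svfk Antalet fall av en viss sjukdom",
]
parsed = parse_records(sample)
```

A real import would also have to handle synonyms within one record, multi-line field values and the reverse step (record making), which this sketch deliberately leaves out.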

Challenge: getting content
- term extraction as part of software (or separate)?
- automatic record breaking into data categories (definition indicators etc.)? And record making?
- automatically fill in the gaps? (automatic classification)
- mirroring (QA?) or double storage (updating)?

Distributed or not?
All terms in one place
+ consistency
+ control
+ not many other termbanks around
+ pragmatic: simpler at the time, traditional
- double storage
- updating needs
- administration of contributors
- higher technology demands on contributors

Challenge: presenting content
- automatic compounding of term records
- visualization (ontologies etc.)

[Figure: concept system diagram of Danish terms for baked goods (bagværk), with nodes such as brød, kage, tærte, lagkage, småkage and their subtypes and components]

Challenge: harmonizing content
- signal various statuses ("primaries")
- automatic handling of doublettes
- automatic calculation of definition similarity?
- version management
- automatic updating of content
- automatic notification of updating (to users, of existing links etc.)
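
One way to approach the "automatic calculation of definition similarity" question is a plain string-similarity ratio. The sketch below uses Python's standard `difflib`; the slides do not say which measure, if any, Rikstermbanken uses, and the 0.9 threshold and the function names are arbitrary assumptions.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def definition_similarity(first, second):
    """Return a ratio between 0.0 and 1.0 for two definition strings."""
    matcher = SequenceMatcher(None, normalise(first), normalise(second), autojunk=False)
    return matcher.ratio()


def flag_candidates(definitions, threshold=0.9):
    """Return (id, id, score) for every pair of definitions at or above the threshold.

    `definitions` is a hypothetical list of (record_id, definition_text) tuples.
    """
    pairs = []
    for i, (first_id, first_text) in enumerate(definitions):
        for second_id, second_text in definitions[i + 1:]:
            score = definition_similarity(first_text, second_text)
            if score >= threshold:
                pairs.append((first_id, second_id, score))
    return pairs


definitions = [
    ("r1", "Number of cases of a disease in a population during a given period."),
    ("r2", "number of cases of a disease in a population during a given period"),
    ("r3", "Lithographic printing method using a rubber blanket."),
]
candidates = flag_candidates(definitions)
```

A character-based ratio only catches near-identical wording; definitions that describe the same concept in different words would still need a terminologist's judgement.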

From presentation to consolidation
[Figure: amount of content plotted against time; annotation: need one accepted definition of a concept]

User survey
16. If your search for a particular term generated several hits, what do you think about that?

Answer       Share     Responses
Good         84.3 %    172
Bad           2.0 %      4
No opinion   13.7 %     28

27 skipped question; 17 comments

Resource harmonisation on a national level: Rikstermbanken
- background & perspectives & user survey
- content revision
  - harmonisation within a source
    - definition
    - explanation
  - harmonisation between sources (i.e. within the termbank as a whole)
    - doublettes
    - problems and solutions
- content presentation
- content updating

Harmonisation: problems
- Within and between sources
- Definition vs explanations: choice?
- Certitude of domain?
- Breaking of conceptual whole, break in macro and micro structures
- Role of publication date
- Homonyms, synonyms
- Degree (%) of similarity between definitions?
- Handling of diverging interests (be shown, disappear, etc.)
- Different sources for different data categories
- indication of doublettes or problem?

Harmonisation: within a source
- often semasiological presentation
- redundancy (e.g. synonyms in separate records)
- choice of definition or explanation
- with respect to macrostructure (cross-references etc.)
- homonyms

Harmonisation: between sources
- (automatic) removal of absolute doublettes (but other information, other languages etc.?)
- limit (%) of definition similarity calculation?
- combination of several sources in one record instead?
- several organizations using the same definition is in itself an interesting piece of information
- special marking in hit list?
- source respect? issues?
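
The alternative raised above, combining several sources in one record instead of deleting absolute doublettes, could look roughly like this. It is a sketch under assumptions: "absolute doublette" is taken to mean identical term and definition after case and whitespace normalisation, and the record layout (`term`, `definition`, `source`) is hypothetical.

```python
from collections import defaultdict


def normalise(text):
    """Lower-case and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def merge_absolute_doublettes(records):
    """Group records with identical term and definition and keep all their sources.

    `records` is a hypothetical list of dicts with "term", "definition" and "source".
    """
    groups = defaultdict(list)
    for record in records:
        key = (normalise(record["term"]), normalise(record["definition"]))
        groups[key].append(record)

    merged = []
    for group in groups.values():
        first = group[0]
        merged.append({
            "term": first["term"],
            "definition": first["definition"],
            # Keeping every source preserves the information that several
            # organisations use the same definition.
            "sources": sorted({record["source"] for record in group}),
        })
    return merged


records = [
    {"term": "incidens", "definition": "Antalet fall av en viss sjukdom", "source": "B"},
    {"term": "Incidens", "definition": "antalet fall  av en viss sjukdom", "source": "A"},
    {"term": "offset", "definition": "litografisk plantryckmetod", "source": "A"},
]
merged = merge_absolute_doublettes(records)
```

The list of sources per merged record could then drive a special marking in the hit list, while other data attached to the individual records (other languages, notes) would still need a merging policy of its own.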

National term bank
"[...] a large, general term bank to serve an entire nation. Such a bank would satisfy the needs of users with a variety of tasks, of prior knowledge, of organisational adherence, or of requirements for a specific product." [Åström, 1987]

Challenge: users
- satisfy all user groups?
- measures of usability?

"for a successful operation of a term bank, today's imperative is reaching out for the user and delivering the required content, wherever it may reside, with the method and in the format required by the user. The area of user participation and interaction is identified [...] as yet to be successfully integrated in the design of terminology portals." [Vasiljevs, Rirdance and Gornostay, 2010]

User adaptation!?
- = Important for terminology products!
- But: sometimes over-estimated, esp. concerning human users and layout of term banks?
- Demand, frequency of usage vs development costs?

Challenge: digital age
- crowdsourcing
- nichesourcing
- wiki-technology
- voting procedures
- moderating functionalities
- access rights, roles and responsibilities etc.
- new administrator interfaces etc.
- usage on new devices (tablets, phones etc.)
- app

Critics
"Crowdsourcing killed indie rock 'cause crowds have terrible taste." [Weingarten in Keats, 2011]

"government needs smart-sourcing, not crowdsourcing." [Peterson in Keats, 2011]

"Collectively based lexicography is often regarded with scepticism by professional lexicographers since anyone can contribute anything and there's no possibility to keep the quality level of the contributions under control. This way of working has even been described as a potential danger to all serious lexicography since these dictionaries risk disturbing the trust in the two qualities that users generally associate with professionally produced dictionaries: quality and reliability." [Doherty in Svensén, 2004]

Challenge: reuse
- linked open data etc.
- APIs, URIs
- web tracking
- version management?
- thematic portals
- integration, plug-ins (CAT, Word etc.)
- federations (issues)
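
As a hedged illustration of the "linked open data" and "URIs" points, a term record could be given a stable URI and serialised as JSON-LD using the W3C SKOS vocabulary. The base URI, the record layout and the function names below are hypothetical; the slides do not specify any particular linked-data model for Rikstermbanken or Cercaterm.

```python
import json
from urllib.parse import quote

# Hypothetical base URI; a real termbank would mint URIs under its own domain.
BASE_URI = "https://termbank.example.org/concept/"


def concept_uri(record_id):
    """Build a stable, URL-safe identifier for one term record."""
    return BASE_URI + quote(str(record_id), safe="")


def to_jsonld(record):
    """Serialise a record as a SKOS concept in JSON-LD (sketch only).

    `record` is a hypothetical dict with "id", "term", "lang", "definition"
    and an optional list of "synonyms".
    """
    lang = record["lang"]
    return {
        "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
        "@id": concept_uri(record["id"]),
        "@type": "skos:Concept",
        "skos:prefLabel": {"@value": record["term"], "@language": lang},
        "skos:altLabel": [
            {"@value": synonym, "@language": lang}
            for synonym in record.get("synonyms", [])
        ],
        "skos:definition": {"@value": record["definition"], "@language": lang},
    }


record = {
    "id": "rec 42",
    "term": "offset",
    "lang": "sv",
    "definition": "litografisk plantryckmetod",
    "synonyms": ["offsettryck"],
}
document = to_jsonld(record)
serialised = json.dumps(document, ensure_ascii=False, indent=2)
```

Stable URIs of this kind are also what makes the version management and notification questions tractable, since external links and plug-ins then have something fixed to point at.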

semantic resource
"Semantic Resource [...] refers to all ontology-similar entities, such as taxonomies, dictionaries, thesauri, etc." (Lima et al, 2010?)
Fackverket 3.0: linked open data banisters
- TNC, Wikimedia, Bobitek
- funded by Swedish Agency for Innovation Systems
- aims: enhance use of linked open terminologies by co-ordinating and further developing existing resources and tools

Challenge: getting funding
- few existing national termbanks use OTS (off-the-shelf) software
- not good enough? (evaluation criteria?, new demands?)
- easier to obtain funding if you develop your own software?

What will be the needs of linguistic data bank users in the future? These can of course vary to a large extent, but I believe that the ones we should pay attention to are the simple, down-to-earth requests, which can be summed up under the following keywords: simplicity, quality and service. [Åström, 1982]

Links
henrik.nilsson@tnc.se
TNC: www.tnc.se
Rikstermbanken: www.rikstermbanken.se
scuadrado@termcat.cat
Termcat: http://www.termcat.cat/
Cercaterm: http://www.termcat.cat/ca/cercaterm/fitxes/