1 Software = hard for national termbanks?
Henrik Nilsson & Sandra Cuadrado i Camps
Terminologicentrum TNC & Termcat
IITF Colloquium, Vienna, Austria, 9 July 2015
2 Outline
National termbank: the concept and some examples (Rikstermbanken, Cercaterm)
State of the art (TERMINTRA)
Aspects and related technical challenges: getting (and presenting) content, harmonizing content, users, the digital age, reuse, getting funding
3 "National" could imply
a government responsibility and financing
a link to a national terminology centre
a basis in the national conceptual world
a certain language choice (monolingual, only national languages)
a certain quality
a certain accessibility (free of charge, adapted)
a certain scope (e.g. cover all terminology in the nation, nothing foreign, etc.)
a certain status (affecting usage)
a marketing gimmick
4 "National" should imply
a certain coverage (as to contents)
a certain status (acknowledged by professionals and by a language or terminology institution)
accessibility (open and freed of ownership claims)
[Termintra, Oslo, 2012]
5 national terminology database: database containing mono- or multilingual terminological data […] established at country level [Guidelines for Terminology Policies, UNESCO]
the national termbank, which attempts to serve a general purpose role in coordinating the creation and use of terminologies within a country, and hence is theoretically multifunctional, multilingual and exploited by widely differing kinds of users [McNaught, 1987]
6 Why a national term bank?
I have been a manager […] within the U.S. Federal Government for over 30 years. In that time, I have observed that the dominant case of ineffectiveness, inefficiency, and unresponsiveness in operations is the inconsistent terms used across the various boundaries of government, their contractors, industry, non-profits, and citizens. There are terminology boundaries between locations, organizations, offices within the organizations, work functions, processes, resources (e.g. people, intelligence, funds, skills, materiel, facilities, services), and capability requirements (e.g. missions, information systems). [Roebuck, 2009]
7 Next, the vocabulary of these functions would be automatically collected, organized, and placed into a National Terminology database to enable integration, interoperability, unification, and federation of operations. [Roebuck, 2009]
Technical challenges!?
8 European national termbanks
Stofnun Árna Magnússonar í íslenskum fræðum, Iceland: Orðabanki
Foras na Gaeilge, Ireland: Téarma.ie
Norway: Termportalen, Snorre
NL-Term, Netherlands: Nedterm
TNC, Sweden: Rikstermbanken
TSK, Finland: Vetenskapstermbanken, TEPA, Valter
Eter, Estonia: ESTERM
Latvia: EuroTermBank
LKI, Lithuania: Terminų bankas
Wales: National Terminology Portal
Société française de terminologie, France: FranceTerme
Confédération suisse: Termdat
Slovenia: Evroterm
Croatia: National Terminology Portal (incl. Struna)
UZEI, Basque Country: Euskalterm
Termcat: Cercaterm
Denmark: DTB
Türk Dil Kurumu, Turkey: Bilim ve Sanat Terimleri
9 Struna (CR)
10 FranceTerme (FR)
11 Terminų Bankas (LT)
12 BFT (FI)
13 National Terminology Portal (Wales)
14 Risten (Sápmi)
15 Orðabanki (ISL)
16 Téarma.ie (IRL)
17 Slovenská terminologická databáza (SK)
18 AkadTerm (LV)
20 Euskalterm (Basque Country)
21 Türk Dil Kurumu (TR)
22 Terminoloģijas portāls (LV)
23 Nedterm (NL)
24 Other termbanks
EuroTermBank, National Termbank (RSA), IATE, ISO Online Browsing Platform, UNTERM, EAA Glossary, Electropedia, METEOTERM, ILOTERM, FAOTERM
28 Background
The fast development of society requires constant work on creating and making accessible agreed-upon terminologies, within more and more subject fields. Easy access to terms via the Internet in a national termbank [rikstermbank] supports such a development.
TISS, Nordterm-Net, 1999; Brussels Declaration, 2002 et al.
IT-propositionen (Prop. 2004/05:175), 2005
Bästa språket (Prop. 2005/06:2), 2005
Grant from the Ministry of Industry, Employment and Communications: 2005: SEK […]; 2007: SEK […]; 2009: 0; 2011: discussion about a semantic resource!
IATE, EU; evaluation 2004
the establishment of a national central term bank, a rikstermbank, is a prerequisite for easy access to, and quality assurance of, Swedish terms in all domains.
Terminų Bankas, Lithuania & EuroTermBank
29 Rikstermbanken as a tool
for storage
for search and retrieval
for terminology work and research
30 Rikstermbanken should mainly reflect concepts of the Swedish society; however, this does not mean that the termbank would comprise only Swedish terms. In order to make it function in the way it is planned, the termbank should also contain term equivalents in foreign languages, and not only in English but also in various immigrant languages and in the official minority languages of Sweden. [IT-propositionen, Prop. 2004/05:175]
31 Current contents
no limitations as to domains!
Swedish conceptual world = starting point
complete glossaries, but also parts of documents and excerpts
some digitized material
quality control by terminologists (and at times by the supplier)
presentation phase
consolidation phase: overview, harmonisation
32 Rikstermbanken in numbers
[…] term records
[…] terms (incl. look-up terms, synonyms, equivalents)
28 languages
71 % definitions (in Swedish)
ca. 1,500 unique sources
ca. 500 suppliers
33 Contents priorities
selection, types
preparation (enhancing, record making & breaking)
harmonization (doublettes)
updating
addition of new material
quality vs. quantity?
34 Preparation of the material
termbank adaptation (reformatting according to NTRF-RTB, exclusion of remaining book-related aspects)
selection
changes for consistency
linguistic and content-related adjustments (incl. removal of target-group adaptations)
discussion with suppliers
illustrations
semi-automatic three-step import control tool
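The slide names a semi-automatic three-step import control tool but does not say what the three steps are. The sketch below is purely hypothetical: it assumes a simplified `TermRecord` type (not the NTRF-RTB format), an illustrative set of allowed language codes, and three invented checks (mandatory fields, language code, comparison with what is already in the termbank). Only the last check hands records to a human, which is one way an import tool can be "semi-automatic".

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """Hypothetical, much simplified term record (not the NTRF-RTB layout)."""
    term: str
    lang: str
    definition: str = ""
    source: str = ""


@dataclass
class ImportReport:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)    # (record, reason) pairs
    for_review: list = field(default_factory=list)  # (record, reason) pairs


# Assumption: an illustrative subset of ISO 639-1 codes, not Rikstermbanken's list.
ALLOWED_LANGS = {"sv", "en", "fi", "de", "fr"}


def import_batch(batch, existing_keys):
    """Run three illustrative checks on a batch of TermRecord objects.

    existing_keys is a set of (casefolded term, language) pairs already in
    the termbank. Steps 1 and 2 reject records; step 3 only flags them for
    a terminologist.
    """
    report = ImportReport()
    for rec in batch:
        # Step 1 (structure): mandatory fields must be present.
        if not rec.term.strip() or not rec.source.strip():
            report.rejected.append((rec, "missing term or source"))
            continue
        # Step 2 (content): the language code must be a known one.
        if rec.lang not in ALLOWED_LANGS:
            report.rejected.append((rec, f"unknown language code {rec.lang!r}"))
            continue
        # Step 3 (termbank comparison): a possible doublette goes to review.
        if (rec.term.casefold(), rec.lang) in existing_keys:
            report.for_review.append((rec, "term already in termbank"))
        else:
            report.accepted.append(rec)
    return report
```

In such a setup, `for_review` would be the terminologist's work queue, and rejected records could go back to the supplier together with the stated reason.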
35 Technology
experience from Termdok development and Nordterm-Net (MLIS project)
comparisons to existing TMS software and standards (ISO, LISA et al.)
IATE evaluation
co-operation with IATE, EuroTermBank
proper software
open source: Lucene, MySQL, Tomcat, Java
36 Technical development of Rikstermbanken
Oracle replaced by open source:
MySQL (database management)
Tomcat (web server)
Lucene (indexing)
Java applications
Iterative process
Documentation via internal wiki
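Lucene, a Java library, is built around an inverted index: a map from each normalised token to the records that contain it. The toy Python sketch below shows only that idea and does not reproduce Lucene's API. It has no stemming, ranking or phrase search, and the indexed record contents are illustrative.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Casefold and split on non-word characters (Unicode-aware, so å/ä/ö survive)."""
    return [t for t in re.split(r"\W+", text.casefold()) if t]


class InvertedIndex:
    """Toy inverted index: token -> set of record ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, record_id, *fields):
        # Index every token of every field under the same record id.
        for text in fields:
            for token in tokenize(text):
                self.postings[token].add(record_id)

    def search(self, query):
        """Return the ids of records that contain every query token (AND query)."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        hits = set(self.postings.get(tokens[0], ()))
        for token in tokens[1:]:
            hits &= self.postings.get(token, set())
        return hits


idx = InvertedIndex()
idx.add(1, "offset", "litografisk plantryckmetod")
idx.add(2, "incidens", "antalet fall av en viss sjukdom")
print(idx.search("Litografisk"))     # {1}
print(idx.search("offset sjukdom"))  # set()
```

Looking up a term then costs a few dictionary lookups instead of a scan over every record, which is why search engines index terms this way.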
37 Cercaterm (CAT)
38 Cercaterm
online platform designed, supported and updated by Termcat (since 2000)
development of terminological products, terminology standardisation, terminology consulting service
updates to Cercaterm: Termcat's terminology production, standardized terminology, queries resolved + other material
files (more than denominations)
new functions in 2010 (based on a user survey): search, sources
3 million visits in 2014
also other information
42 TERMINTRA
Forum for discussion on national termbanks
The concept of national termbank
Aspects: general, contents, users, funding, organization, technology
First seminar in Oslo 2012, second in Zagreb 2013
Participants from Catalonia, Croatia, Denmark, Finland, France, Ireland, Iceland, Latvia, Norway, Sápmi, Sweden, Switzerland, Wales
43 TERMINTRA: Technology
What technical solutions are in use today, and are some more appropriate than others?
Should a national term bank be based on a distributed solution or not? Or, rather, constitute a kind of portal? Pros and cons?
What standards should be the basis for national terminology databases (storage and exchange formats, etc.)?
Are the current terminology management systems suitable for the demands which could be made on a national term bank?
To what extent are today's national terminology banks based on proprietary software (use of open source or not)?
44 The current situation is that most of the bigger existing term banks use purpose-built software, although there are cases where general purpose information retrieval software is used. Although computerized term banks have been in existence for a number of years, there seems to be little agreement as to how they should operate, and if the present situation persists, their use will continue to be low. If term banks are to become widely used certain changes in practice will be necessary; changes which in turn have implications for the software that must be used for term bank operation. [Negus, 1979]
45 the longer established term banks tend to use purpose built software, partly because nothing generally available at the time was found to be suitable, and partly because each is aimed at providing a range of services not found elsewhere, using terminological records and searching methods which are more or less unique. […] all systems should attempt to maintain the greatest flexibility in their approach. However, this is difficult to achieve where specially created software is concerned; there is an inevitable tendency to provide what is definitely required at the time of program specification, perhaps giving little thought to what services might be required, or facilities demanded, at some indeterminate time in the future. [Negus, 1979]
46 As to the technological aspects of national termbanks, it became clear during the presentations and discussions that most of the represented termbanks had developed their own technical solution (which, however, in many cases relied on international standards). The exception was the Finnish termbank using Wiki-technology and open source software. [Proceedings, TERMINTRA I, 2013]
47 Perspective / Aspect   Contents   Technology   Organisation
Manager                   X          X            X
Users                     X          X            (X)
Suppliers                 X          X            (X)
Financing bodies          (X)        X            (X)
48 Challenge: getting content
term extraction as part of the software (or separate)?
automatic record breaking into data categories (definition indicators etc.)? And record making?
automatically filling in the gaps? (automatic classification)
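One way to approach record breaking with definition indicators is to look for lexical patterns that typically introduce a definition and to treat the remaining sentences as notes. The sketch below assumes a tiny, hypothetical list of English indicator patterns and a naive sentence splitter. A production tool would need language-specific patterns (for example for Swedish) and human review of the output.

```python
import re

# Hypothetical English definition indicators; real lists are language-specific
# and much longer.
DEFINITION_INDICATORS = [
    r"\bis an?\b",
    r"\brefers to\b",
    r"\bmeans\b",
    r"\bis defined as\b",
]
_INDICATOR_RE = re.compile("|".join(DEFINITION_INDICATORS), re.IGNORECASE)


def split_sentences(text):
    # Naive split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def break_record(text):
    """Return (definition_candidate, note) extracted from one free-text field.

    The first sentence containing an indicator becomes the definition
    candidate; all other sentences are joined into the note.
    """
    definition, notes = None, []
    for sentence in split_sentences(text):
        if definition is None and _INDICATOR_RE.search(sentence):
            definition = sentence
        else:
            notes.append(sentence)
    return definition, " ".join(notes)


print(break_record("Offset is a lithographic printing method. "
                   "The image is transferred via a rubber blanket."))
```

If no sentence matches, the definition slot stays `None`. This is exactly the kind of gap that, as the slide asks, might or might not be filled automatically.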
49 Various sources [Heid (1991) in Martin & van der Vliet, 2003]
51 Term bank contents: challenges
Selection: all, nothing or a little?
Interpretation of contents, decontextualisation
Term choice (variants, synonyms etc.)
Definition vs. explanation
Updating vs. archiving; consistency changes?
Decustomization (= depersonalisation)
Record breaking & record making
Document types: legal documents
52 Record breaking (1): before and after
svte offset
svdf litografisk plantryckmetod där tryckplåten är preparerad så att färggivande ytor gjorts färgmottagliga och vattenbortstötande och icke färggivande partier gjorts vattenmottagliga och färgbortstötande
svrete litografi, djuptryck, direktlito
svan Överföringen av tryckbilden från offsetplåten sker indirekt via en gummiduk till papperet.
53 Record breaking (2): before and after
svte incidens
HONR 1
svfk Antalet fall av en viss sjukdom som uppträder i en befolkning under viss tid; anges t.ex. som antalet diagnoser per invånare per år.
svte incidens
HONR 2
svupte incidenskvot
svfk Antalet av en viss studerad händelse i en klinisk prövning eller kohortundersökning, dividerat med antalet deltagare i gruppen. Graden av skillnad mellan två gruppers incidenstal kan uttryckas genom att det ena divideras med det andra till en incidenskvot.
svrete händelse
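The records on these slides use field tags such as `svte`, `svdf`, `svfk`, `svan`, `svrete`, `svupte` and `HONR`. This sketch reads them as a language code (`sv`) followed by a data category, and reads `HONR` as a homonym number. That reading is an assumption made for illustration only. Under it, one mechanical part of record breaking, turning a flat tagged listing into one record per term entry, could look like this:

```python
from collections import defaultdict

# Assumed reading of the tags: "sv" = Swedish, followed by a data category.
# "te" is taken to be the term. "rete" and "upte" also end in "te" but are
# treated as other categories (related term, look-up term), and "HONR" as a
# homonym number. These meanings are illustrative guesses.
SECONDARY_TERM_TAGS = ("rete", "upte")


def is_term_tag(tag):
    return tag.endswith("te") and not tag.endswith(SECONDARY_TERM_TAGS)


def parse_records(lines):
    """Break a flat list of "TAG value" lines into one dict per term record.

    A new record starts at every term line, so two homonyms listed one after
    the other end up as two separate records.
    """
    records = []
    for line in lines:
        tag, _, value = line.strip().partition(" ")
        if not tag:
            continue  # skip blank lines
        if is_term_tag(tag):
            records.append(defaultdict(list))
        if not records:
            raise ValueError(f"field before the first term line: {line!r}")
        records[-1][tag].append(value)
    return [dict(r) for r in records]


lines = [
    "svte incidens", "HONR 1", "svfk Antalet fall av en viss sjukdom",
    "svte incidens", "HONR 2", "svupte incidenskvot",
    "svfk Antalet av en viss studerad händelse", "svrete händelse",
]
for record in parse_records(lines):
    print(record)
```

Deciding *that* one entry covers two concepts is still intellectual work. The code only carries out the split once the entry has been marked up.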
54 Challenge: getting content
term extraction as part of the software (or separate)?
automatic record breaking into data categories (definition indicators etc.)? And record making?
automatically filling in the gaps? (automatic classification)
mirroring (QA?) or double storage (updating)?
55 Distributed or not?
All terms in one place: + consistency; + control; + not many other termbanks around; + pragmatic: simpler at the time, traditional
double storage; updating
needs administration of contributors; higher technology demands on contributors
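With double storage, the central termbank keeps a copy of each supplier's data, so every update starts by finding out what has changed. The minimal sketch below assumes that the supplier can deliver a full snapshot keyed by stable record ids. That arrangement is hypothetical. The sketch compares content hashes of the stored snapshot and the incoming one:

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record's content (keys sorted, so key order is irrelevant)."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_exports(stored, incoming):
    """Compare two {record_id: record} snapshots of one supplier's data.

    Returns the ids that are new, changed, or no longer delivered.
    """
    stored_fp = {rid: fingerprint(r) for rid, r in stored.items()}
    incoming_fp = {rid: fingerprint(r) for rid, r in incoming.items()}
    new = sorted(incoming_fp.keys() - stored_fp.keys())
    removed = sorted(stored_fp.keys() - incoming_fp.keys())
    changed = sorted(rid for rid in incoming_fp.keys() & stored_fp.keys()
                     if incoming_fp[rid] != stored_fp[rid])
    return {"new": new, "changed": changed, "removed": removed}


stored = {"a": {"te": "offset"}, "b": {"te": "incidens", "df": "old"}, "c": {"te": "x"}}
incoming = {"a": {"te": "offset"}, "b": {"te": "incidens", "df": "new"}, "d": {"te": "y"}}
print(diff_exports(stored, incoming))
```

Only the `new` and `changed` ids then need re-import and review. The `removed` ids raise the updating vs. archiving question: should the record be deleted, or kept as history?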
56 Challenge: presenting content
automatic compounding of term records
visualization (ontologies etc.)
57 [Concept diagram (Danish, baked goods) with the concepts: bagværk, konfekt?, tærte, brød, kage, mørdejstærte, butterdejstærte, kage for 1 person, kage for > 1 person, gærkage, flødekage/flødeskumskage, lagkage?, skærekage, kaffebrød?, sandkage, tørkage, fin kage, småkage, bagt kage, creme, frugt, gulerodskage, kiksbaseret bund, genoisebund, bavarois, vandbakkelse, marengsbund, lagkagebund, vaniljecreme]
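Concept systems like this one are often drawn with general-purpose graph tools. As a sketch of automatic visualisation, the function below turns generic (narrower, broader) concept relations into Graphviz DOT text, which the `dot` program can render. The relations in the usage example are illustrative guesses based on the Danish compounds. They are not the actual structure of the slide's diagram.

```python
def _quote(label):
    # DOT string IDs are double-quoted; escape backslashes and quotes inside them.
    return '"' + label.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_dot(relations, graph_name="concepts"):
    """Render generic (narrower, broader) relations as Graphviz DOT text.

    Edges run from the broader to the narrower concept, so `dot` lays the
    concept system out top-down.
    """
    lines = [f"digraph {graph_name} {{", "  node [shape=box];"]
    for narrower, broader in relations:
        lines.append(f"  {_quote(broader)} -> {_quote(narrower)};")
    lines.append("}")
    return "\n".join(lines)


# Illustrative relations only (assumed from the compounds, not from the slide).
relations = [
    ("tærte", "bagværk"),
    ("kage", "bagværk"),
    ("mørdejstærte", "tærte"),
    ("butterdejstærte", "tærte"),
]
print(to_dot(relations))
```

If the output is saved to a file, for example `concepts.dot`, the command `dot -Tsvg concepts.dot -o concepts.svg` would render it. Keeping the relations as data rather than as a drawing means the diagram can be regenerated whenever the records change.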
59 Challenge: harmonizing content
signalling various statuses ("primaries")
automatic handling of doublettes
automatic calculation of definition similarity?
version management
automatic updating of content
automatic notification of updating (to users, of existing links etc.)
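Automatic calculation of definition similarity can start from simple lexical overlap. The sketch below uses the Jaccard coefficient on word sets (shared words divided by all distinct words), expressed as a percentage. This is a swapped-in example technique, not a method the slides prescribe. It ignores word order, inflection and meaning, so a high score only says that two definitions use many of the same words.

```python
import re


def word_set(text):
    """Casefolded set of word tokens in a definition."""
    return set(re.findall(r"\w+", text.casefold()))


def definition_similarity(a, b):
    """Jaccard similarity of the word sets of two definitions, in percent."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 100.0  # two empty definitions are trivially identical
    return 100.0 * len(wa & wb) / len(wa | wb)


a = "number of new cases of a disease in a population during a period"
b = "number of new cases of a disease in a given population per year"
print(round(definition_similarity(a, b), 1))  # 61.5 (8 shared of 13 distinct words)
```

Where to put the cut-off between "same concept, reworded" and "different concept" is the open question the slide raises. Any threshold would have to be tested against pairs that terminologists have already judged.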
60 From presentation to consolidation
[Figure: amount of content over time; goal: one accepted definition of a concept]
61 User survey
Question 16: If your search for a particular term generated several hits, what do you think about that?
Good: 84.3 % (172)
Bad: 2.0 % (4)
No opinion: 13.7 % (28)
27 skipped the question; 17 comments
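The percentages are shares of the 204 respondents who answered the question, so the 27 who skipped it are excluded. A quick check:

```latex
\[
172 + 4 + 28 = 204, \qquad
\frac{172}{204} \approx 84.3\,\%, \qquad
\frac{4}{204} \approx 2.0\,\%, \qquad
\frac{28}{204} \approx 13.7\,\%
\]
```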
62 Resource harmonisation on a national level: Rikstermbanken
background & perspectives & user survey
content revision
harmonisation within a source: definition, explanation
harmonisation between sources (i.e. within the termbank as a whole): doublettes, problems and solutions
content presentation
content updating
63 Harmonisation: problems
Within and between sources
Definition vs. explanations: choice?
Certainty of domain?
Breaking of the conceptual whole; break in macro- and microstructures
Role of publication date
Homonyms, synonyms
Degree (%) of similarity between definitions?
Handling of diverging interests ("be shown", "disappear" etc.)
Different sources for different data categories
Indication of doublettes, or a problem?
64 Harmonisation: within a source
often semasiological presentation
redundancy (e.g. synonyms in separate records)
choice of definition or explanation
with respect to macrostructure (cross-references etc.)
homonyms
65 Harmonisation: between sources
(automatic) removal of absolute doublettes (but other information, other languages etc.?)
limit (%) for definition similarity calculation?
combination of several sources in one record instead?
several organizations using the same definition is in itself an interesting piece of information
special marking in the hit list?
source respect? issues?
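The sketch below shows one possible policy for absolute doublettes. Instead of deleting copies, it merges records with the same term and the same definition, ignoring case, spacing and punctuation, into one record that lists all their sources. The record layout (`term`, `definition`, `source`) is a hypothetical simplification. Real records carry other languages and data categories that would need their own merge rules.

```python
import re
from collections import defaultdict


def normalize(text):
    # Ignore case, spacing and punctuation so that trivial formatting
    # differences do not hide an absolute doublette.
    return " ".join(re.findall(r"\w+", text.casefold()))


def merge_doublettes(records):
    """Group records with the same term and normalized definition.

    Each group becomes one record that keeps every source, since several
    organizations using the same definition is itself useful information.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["term"].casefold(), normalize(rec["definition"]))
        groups[key].append(rec)
    merged = []
    for recs in groups.values():
        first = recs[0]
        merged.append({
            "term": first["term"],
            "definition": first["definition"],
            "sources": sorted({r["source"] for r in recs}),
        })
    return merged


records = [
    {"term": "offset", "definition": "Lithographic printing method.", "source": "A"},
    {"term": "Offset", "definition": "lithographic  printing method", "source": "B"},
    {"term": "offset", "definition": "Other method.", "source": "C"},
]
for rec in merge_doublettes(records):
    print(rec)
```

A merged record with several sources could also be given the special marking in the hit list that the slide asks about.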
66 National term bank […] a large, general term bank to serve an entire nation. Such a bank would satisfy the needs of users with a variety of tasks, of prior knowledge, of organisational adherence, or of requirements for a specific product. [Åström, 1987]
67 Challenge: users
satisfy all user groups?
measures of usability?
68 for a successful operation of a term bank, today's imperative is reaching out for the user and delivering the required content, wherever it may reside, with the method and in the format required by the user. The area of user participation and interaction is identified […] as yet to be successfully integrated in the design of terminology portals. [Vasiljevs, Rirdance and Gornostay, 2010]
69 User adaptation!? = Important for terminology products! But: sometimes over-estimated, esp. concerning human users and layout of term banks? Demand, frequency of usage vs development costs?
70 Challenge: digital age
crowdsourcing, nichesourcing
wiki technology
voting procedures
moderating functionalities
access rights, roles and responsibilities etc.
new administrator interfaces etc.
usage on new devices (tablets, phones etc.)
app
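Access rights, roles and voting can be combined so that crowd input stays advisory and a moderator makes the final decision. The sketch below is hypothetical throughout. The three roles, the permission table and the approval threshold (`min_score`) are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    VISITOR = 1
    CONTRIBUTOR = 2
    MODERATOR = 3


# Hypothetical permission table: which role may perform which action.
PERMISSIONS = {
    Role.VISITOR: {"read"},
    Role.CONTRIBUTOR: {"read", "suggest", "vote"},
    Role.MODERATOR: {"read", "suggest", "vote", "approve"},
}


@dataclass
class Suggestion:
    term: str
    proposed_by: str
    votes: dict = field(default_factory=dict)  # user -> +1 or -1
    approved: bool = False

    def score(self):
        return sum(self.votes.values())


def check(role, action):
    if action not in PERMISSIONS[role]:
        raise PermissionError(f"{role.name} may not {action}")


def vote(suggestion, user, role, value):
    check(role, "vote")
    if value not in (1, -1):
        raise ValueError("vote must be +1 or -1")
    suggestion.votes[user] = value  # one vote per user; voting again overwrites


def approve(suggestion, role, min_score=3):
    """Votes only signal support; a moderator makes the final decision."""
    check(role, "approve")
    if suggestion.score() < min_score:
        return False
    suggestion.approved = True
    return True
```

A design like this answers part of the criticism of crowdsourcing: anyone with the contributor role can propose and vote, but nothing enters the termbank without a moderator.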
71 Critics
Crowdsourcing killed indie rock 'cause crowds have terrible taste. [Weingarten in Keats, 2011]
government needs smart-sourcing, not crowdsourcing. [Peterson in Keats, 2011]
Collectively based lexicography is often regarded with scepticism by professional lexicographers since anyone can contribute anything and there's no possibility to keep the quality level of the contributions under control. This way of working has even been described as a potential danger to all serious lexicography since these dictionaries risk disturbing the trust in the two qualities that users generally associate with professionally produced dictionaries: quality and reliability. [Doherty in Svensén, 2004]
72 Challenge: reuse
linked open data etc.
APIs, URIs
web tracking
version management?
thematic portals
integration, plug-ins: CAT, Word etc.
federations (issues)
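For linked open data, a common choice is to publish concepts with stable URIs using the W3C SKOS vocabulary. The sketch below serialises one concept as N-Triples using only the standard library. The base URI is a hypothetical placeholder. Mapping term-record fields to `skos:prefLabel`, `skos:altLabel` and `skos:definition` is an assumption, not a Rikstermbanken or Cercaterm format.

```python
BASE = "https://termbank.example.org/concept/"  # hypothetical URI base
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text, lang):
    """N-Triples language-tagged literal with the required escapes."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def concept_to_ntriples(concept_id, pref_labels, alt_labels=(), definitions=()):
    """Serialize one concept as SKOS N-Triples.

    pref_labels: {lang: term}; alt_labels and definitions: (lang, text) pairs.
    """
    s = f"<{BASE}{concept_id}>"
    triples = [f"{s} <{RDF_TYPE}> <{SKOS}Concept> ."]
    for lang, term in pref_labels.items():
        triples.append(f"{s} <{SKOS}prefLabel> {literal(term, lang)} .")
    for lang, term in alt_labels:
        triples.append(f"{s} <{SKOS}altLabel> {literal(term, lang)} .")
    for lang, text in definitions:
        triples.append(f"{s} <{SKOS}definition> {literal(text, lang)} .")
    return "\n".join(triples)


print(concept_to_ntriples(
    "c1",
    {"sv": "offset", "en": "offset printing"},
    definitions=[("sv", "litografisk plantryckmetod")],
))
```

Because each concept gets one URI, thematic portals, plug-ins and other termbanks can link to it instead of copying it. That is one answer to the double-storage problem.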
73 Semantic resource
Semantic Resource […] refers to all ontology-similar entities, such as taxonomies, dictionaries, thesauri, etc. (Lima et al., 2010?)
Fackverket 3.0: linked open data banisters
TNC, Wikimedia, Bobitek
funded by the Swedish Agency for Innovation Systems
aims: enhance the use of linked open terminologies by co-ordinating and further developing existing resources and tools
74 Challenge: getting funding
few existing national termbanks use OTS (off-the-shelf) software
not good enough? (evaluation criteria? new demands?)
easier to obtain funding if you develop your own software?
75 What will be the needs of linguistic data bank users in the future? These can of course vary to a large extent, but I believe that the ones we should pay attention to are the simple, down-to-earth requests, which can be summed up under the following keywords: simplicity, quality and service. [Åström, 1982]