Curation Report KEMPENSCH TAALEIGEN




Curation Report
KEMPENSCH TAALEIGEN
BERGEIJKS DIALECTWOORDENBOEK

CLARIN NL Data Curation Service
Version 1, 8 October 2013
Henk van den Heuvel
CLST, Radboud University Nijmegen

1. Introduction

There are various small local dialect dictionaries for the province of Noord Brabant in the Netherlands. One of these dictionaries is: Panken, P.N. (1850) Kempensch taaleigen. Bergeijk: Johan Biemans [red. 2010]. This dictionary contains dialect entries for the village of Bergeijk and its surroundings in Noord Brabant, and it is added to the curated version as a PDF file. This report describes the curation of the dictionary.

The dictionary was offered for curation by Prof. Dr. Jos Swanenberg. The entries were manually enriched with a Dutch keyword before they were provided to the DCS. Each record contains the following information:

Field name                 English
dialectwoord               Dialect word
trefwoord                  Dutch keyword
begrip                     Sense
grammaticale informatie    Grammatical information
voorbeeldzinnen            Example sentences

Further information is known and added to the curated version:

Kloeke = K279p
Area = Noord Brabant / Dutch Brabant
Place = Bergeijk
Sourcebook = Panken, P.N. (1850) Kempensch taaleigen. Bergeijk: Johan Biemans [red. 2010]

2. Data

The dictionary was provided as a text dump of an SQL database. The fields mentioned above were split and converted into a tab-separated CSV file in UTF-8 encoding.

3. Metadata

Parts of the Limburgian and Brabant dialect dictionaries (WLD and WBD) were digitized in the CLARIN NL COAVA project [1]. In the COAVA project a CMDI profile was developed by Folkert de Vriend for WBD and WLD. This profile was extended by the DCS to a more general profile for Dutch dialect dictionaries and published in the component registry at http://catalog.clarin.eu/ds/componentregistry/# as WND (Woordenbank van de Nederlandse Dialecten). This profile was used to generate the CMDI metadata file for this dictionary.

4. Restructuring the database

The tab-separated files were used as the starting point for converting the data into LMF format.

5. Converting formats

The tab-separated files were converted to an LMF format [2]. The LMF model for dialect dictionary data was developed by the DCS in close cooperation with Menzo Windhouwer. During this process dialectologists were consulted about the proper inclusion and naming of lexical features in the model. The model consists of three main classes for a Lexical Entry: Sense, Form, [...]. [...] is a new class in the model. Keyword (trefwoord in Dutch) is the only mandatory feature for a lexical entry in the model.

Next, the data of the dictionary were fitted into the model as shown in the table below.

Kempensch Taaleigen        LMF
trefwoord                  Form Keyword=
dialectwoord               Form Representation Dialectform=
begrip                     Sense Meaning=
grammaticale informatie    Form Representation GrammaticalInfo=
voorbeeldzinnen            Context Example=
boek                       Definition sourcebook=Panken, P.N. (1850) Kempensch taaleigen. Bergeijk: Johan Biemans [red. 2010]
bron                       place=Bergeijk area=Noord Brabant / Dutch Brabant
kloeke                     kloeke=K279p

A corresponding LMF file was created that includes the LMF categories in the table above.

[1] Refer to http://www.clarin.nl/page/about/projects/162#coava
[2] LMF: Lexical Markup Framework: http://www.lexicalmarkupframework.org/

6. Documentation

Documentation is provided in this Curation Report. Relevant information about the dictionary and its design can be found in the book: Panken, P.N. (1850) Kempensch taaleigen. Bergeijk: Johan Biemans [red. 2010] (in Dutch).

7. Persistent identifiers

Persistent identifiers were attributed by the CLARIN Data Centre (Meertens Institute).

8. Transfer data to CLARIN data centre

The curated dictionary, consisting of the LMF file, the dictionary/book as a PDF file, this curation report and a CMDI metadata file, is stored at the Meertens Institute as CLARIN data centre. Metadata harvesting and accessibility are taken care of by Meertens.
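As an illustration of the conversion described in sections 2, 4 and 5, the following is a minimal Python sketch of how one tab-separated record could be fitted into the LMF categories of the mapping table. It is not the conversion script used by the DCS. The column order, the function names, the XML element names and the `feat att/val` layout are all assumptions made for the example. Only the five field names and the five feature names (Keyword, Dialectform, Meaning, GrammaticalInfo, Example) come from this report. The fixed values (sourcebook, place, area, kloeke) are left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ASSUMPTION: the column order of the tab-separated file is not stated in the
# report; only the field names are.
COLUMNS = [
    "dialectwoord",
    "trefwoord",
    "begrip",
    "grammaticale informatie",
    "voorbeeldzinnen",
]


def feat(parent, att, val):
    """Attach a <feat att="..." val="..."/> child (assumed layout) to parent."""
    return ET.SubElement(parent, "feat", {"att": att, "val": val})


def record_to_entry(record):
    """Build one LexicalEntry element from a dict keyed by the Dutch field names."""
    def value(name):
        return (record.get(name) or "").strip()

    keyword = value("trefwoord")
    if not keyword:
        # Keyword is the only mandatory feature of a lexical entry in the model.
        raise ValueError("record has no trefwoord (Keyword)")

    entry = ET.Element("LexicalEntry")

    form = ET.SubElement(entry, "Form")
    feat(form, "Keyword", keyword)
    representation = ET.SubElement(form, "Representation")
    if value("dialectwoord"):
        feat(representation, "Dialectform", value("dialectwoord"))
    if value("grammaticale informatie"):
        feat(representation, "GrammaticalInfo", value("grammaticale informatie"))

    sense = ET.SubElement(entry, "Sense")
    if value("begrip"):
        feat(sense, "Meaning", value("begrip"))
    if value("voorbeeldzinnen"):
        # ASSUMPTION: Context is nested under Sense, as in standard LMF.
        context = ET.SubElement(sense, "Context")
        feat(context, "Example", value("voorbeeldzinnen"))
    return entry


def tsv_to_entries(text):
    """Parse tab-separated UTF-8 text (already decoded) into LexicalEntry elements."""
    reader = csv.DictReader(
        io.StringIO(text),
        fieldnames=COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    return [record_to_entry(row) for row in reader]
```

Calling `tsv_to_entries` on the decoded file contents returns one element per record. Each element can be serialised with `ET.tostring(entry, encoding="unicode")`. Rejecting records without a keyword mirrors the rule that Keyword is the only mandatory feature.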