Your boldest wishes concerning online corpora: OpenSoNaR and you
|
|
|
- Hester Ball
- 10 years ago
- Views:
Transcription
1 1 Your boldest wishes concerning online corpora: OpenSoNaR and you Martin Reynaert TiCC, Tilburg University and CLST, Radboud Universiteit Nijmegen TiCC Colloquium, Tilburg University. October 16th, 2013
2 Outline 1 Section 1: SoNaR Overview 2 Section 2: SoNaR-500 Content Highlights 3 Section 3: Towards OpenSoNaR 4 Section 4: A peek at other online corpora 5 Section 5: Your own role in OpenSoNaR
3 3 Section 1: SoNaR Overview SoNaR: Patron & Consortium Patron: Dutch Language Union (Nederlandse Taalunie) in the STEVIN programme Consortium: Netherlands: Radboud University Nijmegen, Tilburg University, Twente University & Utrecht University Belgium (Flanders): University College Ghent, KU Leuven
4 Section 1: SoNaR Overview 4 STEVIN Nederlandstalig Referentiecorpus (SoNaR) project Directed at the actual construction of a 500 MW reference corpus of contemporary standard written Dutch as encountered in texts originating from the Dutch speaking language area in Flanders and the Netherlands as well as translations published in and targeted at this area Where possible, (re)use of formats, tools, protocols, and annotation schemes previously developed (D-Coi, but also COREA, DPC, Lassy)
5 Section 1: SoNaR Overview MW Reference corpus (SoNaR-500) includes native speaker language and the language of (professional) translators approx. two-thirds of the data originate from the Netherlands and one-third from Flanders only texts included from the year 1954 onwards, in practice: only born-digital texts comprises full texts rather than text samples includes texts from a wide range of 40 text types incl. texts from the new media for (almost) all data IPR has been arranged
6 6 Section 1: SoNaR Overview SoNaR-500: Automatic linguistic annotations automatic sentence splitting and tokenization: ILKTOK automatic POS tagging and lemmatisation and morphological analysis using FROG for all data tagset is essentially the set used in the CGN project automatic NE labeling by means of NERD Named Entity Recognition for Dutch: NE classifier developed on the basis of SoNaR-1
7 Section 2: SoNaR-500 Content Highlights 7 Contents Besides the texts: all available metadata has been recorded Over 30 text types old and new media all contemporary, written Dutch. Essentially all born-digital. everything from books to tweets... Only discuss a few of the highlights here...
8 8 Section 2: SoNaR-500 Content Highlights Old Media: Highlights: Books Largest donation by Dutch publisher Maarten Muntinga 145 titles: travel, thrillers, chick lit, life style, etc. almost 14 MW, almost no IPR-restrictions great news for KNAW project "The riddle of literary quality" 4 versions of the Bible 3 protestant, 1 catholic N-gram frequency lists are a shambles!!
9 9 Section 2: SoNaR-500 Content Highlights Old Media: Highlights: Magazines/Periodicals Largest donation by Flemish publisher Roularta all CD-roms published between 1991 and 2004 weekly delivery of new issues of their several titles no commercial use no language restriction: huge comparable corpus Dutch-French a possibility! Largest Dutch donation: De Groene Amsterdammer 12 years editions
10 10 Section 2: SoNaR-500 Content Highlights Old Media: Highlights: Newspapers Largest Flemish donation: Mediargus 1.3 BW available to the community, no IPR-agreement IPR-restriction 1: Take only what is required have taken Edition 2006 of 4 main Flemish newspapers. IPR-restriction 2: no commercial use Largest Dutch donation: PCM / De Persgroep Selection from Twente News Corpus 60MW
11 11 Section 2: SoNaR-500 Content Highlights Old Media: Highlights: Reports NIOD (Institute for War, Holocaust and Genocide Studies) Srebrenica report over 7,000 pages of text probably best known report in Dutch ever potential for building a parallel corpus: English version also online
12 12 Section 2: SoNaR-500 Content Highlights Old/New Media Highlights: Subtitles Largest Flemish donation: VRT wealth of TV-series, a.o. De Kampioenen, MW even more still in surplus Open Subtitle Corpus (Tiedemann) balanced set of film subtitles, produced by volunteers classified as both Dutch and Flemish huge surplus of over 384MW available...
13 13 Section 2: SoNaR-500 Content Highlights New Media: Highlights: Discussion Lists Politics.be largest internet forum in Flanders. IPR-settled. site delivered over 480MW, over 2 years ago... Incorporated only 25MW... TMF: Flemish youth forum highly dialectical... about 70% of posts classified by language recognizer TextCat as: " I don t know; Perhaps this is a language I haven t seen before? " 20MW were incorporated...
14 14 Section 2: SoNaR-500 Content Highlights New Media: Highlights: Chats Chats chats NL with IPR-agreement and metadata: 0.5MW elicited, mainly collected in schools and within a research team chats NL no IPR-agreement and without metadata: 8 M natural chat VL with IPR-agreement, without metadata: 9 M natural
15 15 Section 2: SoNaR-500 Content Highlights New Media: Highlights: Tweets & SMS Twitter: did not yet exist in 2005, no target in SoNaR MW incorporated token count prior to tokenization... SMS collected in two parallel campaigns in NL and VL a prize was offered to donators a well balanced (in terms of 1/3 VL vs. 2/3 NL) set of over 50K SMS were collected these represent about 620K SMS-words
16 16 Section 3: Towards OpenSoNaR Concluding remarks on the SoNaR corpus SoNaR project represents an excellent return on investment: SoNaR-500 definitely allows to sound most dimensions of contemporary, written Dutch SoNaR-1 has yielded world-class, large gold standards for semantic annotation SoNaR fills major gaps in the Dutch language resources infrastructure BUT: Its 2.1M text files accompanied by 2.1M metadata files present a major practical obstacle to most anyone Which is why we are building OpenSoNaR: a system for Online Personal Exploration and Navigation of SoNaR"
17 Section 3: Towards OpenSoNaR 17 Opening remarks on the OpenSoNaR project We want SoNaR online in as open a fashion as possible: usable for schoolkids and researchers alike You, researchers, are hereby invited to tell us what you expect from this system Please provide us with your wish lists, user cases The Dutch Language Union and the IPR agreements with text providers impose some restrictions: these we will honour The whole corpus is essentially free for non-commercial research purposes
18 Section 3: Towards OpenSoNaR 18 Intentions of OpenSoNaR Address this corpus exploration and exploitation problem Develop and make available an open corpus exploration and exploitation environment Tailor the front-ends to the desiderata of 4 CLARIN-NL priority groups of users
19 Section 3: Towards OpenSoNaR 19 Means employed by OpenSoNaR Intended originally to build on the well-known Corpus Workbench back-end Turned out INL had been developing a new alternative Java system BlackLab, based on Apache Lucene, open source via GitHub OpenSoNaR will build Whitelab front-ends according to the user groups specifications
20 Section 3: Towards OpenSoNaR 20 OpenSoNaR User Groups in more detail CLARIN-NL posited 6 priority user groups. OpenSoNaR addresses priority groups 2 to 5: Literary Sciences: Huygens ING, Karina van Dalen-Oskam, assisted by colleagues at Utrecht University Cultural Sciences: Meertens Institute, Nicoline van der Sijs, assisted by Ewoud Sanders Linguistics: TiCC: Jan Renkema & Ad Backus & colleagues Communication and Media Studies: TiCC: Fons Maes & Emiel Krahmer & colleagues
21 Section 4: A peek at other online corpora 21 Nederlab URL: NWO Groot project Goes far beyond OpenSoNaR 4M euro budget vs. 150K, runs 5 years vs. 1 Diachronic vs. synchronic: all Dutch corpora, since about the year 800 Laboratory for research Exploration of the corpora: through time, space, genres: metadata Exploitation of the corpora: possible to make your own subcorpus, put that in your private or privately shared work space Analysis of the corpora: tools for all kinds of text/topic mining, comparison making, measuring particular features,... Let us take a look at the Clariah Demonstrator...
22 Section 4: A peek at other online corpora 22 Språkbanken (the Swedish Language Bank) corpora to date, 1.4 Billion word tokens Korp (E.: crow): the corpora Karp (E: carp): the dictionaries From bird s eye view of the text to below surface information Mediated by wonderful little icons Below surface information from dictionaries, Wordnet, automatic parsing etc. Based on older technology: Corpus Work Bench Uses Corpus Query Language More elaborate than OpenSoNaR Less so than Nederlab
23 Section 4: A peek at other online corpora 23 Brieven als Buit: Why does he not write?" URL: About 1, th century Dutch letters Subset of the Gekaapte Brieven : mail taken as loot by privateers and confiscated by the High Court of Admiralty during the wars fought between The Netherlands and England Digitized and annotated by Dutch Institute of Lexicology (INL) INL corpus back-end system BlackLab also the basis for OpenSoNaR OpenSoNaR will at first have the functionalities of this site Let s get acquainted!
24 24 Section 5: Your own role in OpenSoNaR OpenSoNaR vs. these three online resources To summarize: Språkbanken Definitely a great source for inspiration and emulation Brieven als Buit Great start that already will provide good basic functionality to further build on Nederlab If the OpenSoNaR project provides, Nederlab will no doubt be enhanced by its accomplishments
25 Section 5: Your own role in OpenSoNaR 25 OpenSoNaR: online soon We are setting up a server for the OpenSoNaR website URL: Will host the rudimentary OpenSoNaR soon, corpus is being indexed at INL We will keep you informed, of course Practical guidelines for submitting wish-lists, case descriptions and for getting feedback via our issue tracker will be made available asap.
26 Section 5: Your own role in OpenSoNaR 26
27 Section 5: Your own role in OpenSoNaR 27
28 28 Thanks!! Section 5: Your own role in OpenSoNaR Thanks for your attention! Papers about SoNaR are available at: Your boldest wishes concerning online corpora: OpenSoNaR and you Martin Reynaert TiCC, Tilburg University and CLST, Radboud Universiteit Nijmegen
Dutch Parallel Corpus
Dutch Parallel Corpus Lieve Macken [email protected] LT 3, Language and Translation Technology Team Faculty of Applied Language Studies University College Ghent November 29th 2011 Lieve Macken (LT
FoLiA: Format for Linguistic Annotation
Maarten van Gompel Radboud University Nijmegen 20-01-2012 Introduction Introduction What is FoLiA? Generalised XML-based format for a wide variety of linguistic annotation Characteristics Generalised paradigm
How To Create A Clarin Metadata Infrastructure
Creating & Testing CLARIN Metadata Components Folkert de Vriend (1), Daan Broeder (2), Griet Depoorter (3), Laura van Eerten (3), Dieter van Uytvanck (2) 1) Meertens Institute Joan Muyskenweg 25, Amsterdam,
The University of Amsterdam s Question Answering System at QA@CLEF 2007
The University of Amsterdam s Question Answering System at QA@CLEF 2007 Valentin Jijkoun, Katja Hofmann, David Ahn, Mahboob Alam Khalid, Joris van Rantwijk, Maarten de Rijke, and Erik Tjong Kim Sang ISLA,
Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED
Doctoral Consortium 2013 Dept. Lenguajes y Sistemas Informáticos UNED 17 19 June 2013 Monday 17 June Salón de Actos, Facultad de Psicología, UNED 15.00-16.30: Invited talk Eneko Agirre (Euskal Herriko
The Syntactic Atlas of the Dutch Dialects
The Syntactic Atlas of the Dutch Dialects A corpus of elicited speech as an on-line Dynamic Atlas Sjef Barbiers & Jan Pieter Kunst Meertens Institute (KNAW) 1 Coordination Hans Bennis (Meertens Institute)
CLARIN-NL Third Call: Closed Call
CLARIN-NL Third Call: Closed Call CLARIN-NL launches in its third call a Closed Call for project proposals. This called is only open for researchers who have been explicitly invited to submit a project
Search and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services
Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services speakers: Kai Zimmer and Jörg Didakowski Clarin Workshop WP2 February 2009 BBAW/DWDS The BBAW and its 40 longterm projects
CLAM: Quickly deploy NLP command-line tools on the web
CLAM: Quickly deploy NLP command-line tools on the web Maarten van Gompel Centre for Language Studies (CLS) Radboud University Nijmegen [email protected] Martin Reynaert CLS, Radboud University Nijmegen
Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION
Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,
Dutch-Flemish Research Programme for Dutch Language and Speech Technology. stevin programme. project results
Dutch-Flemish Research Programme for Dutch Language and Speech Technology stevin programme project results Contents Design and editing: Erica Renckens (www.ericarenckens.nl) Design front cover: www.nieuw-eken.nl
PDF hosted at the Radboud Repository of the Radboud University Nijmegen
PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/60933
From D-Coi to SoNaR: A reference corpus for Dutch
From D-Coi to SoNaR: A reference corpus for Dutch N. Oostdijk, M. Reynaert, P. Monachesi, G. van Noord, R. Ordelman, I. Schuurman, V. Vandeghinste University of Nijmegen, Tilburg University, Utrecht University,
The Knowledge Sharing Infrastructure KSI. Steven Krauwer
The Knowledge Sharing Infrastructure KSI Steven Krauwer 1 Why a KSI? Building or using a complex installation requires specialized skills and expertise. CLARIN is no exception. CLARIN is populated with
aloe-project.de White Paper ALOE White Paper - Martin Memmel
aloe-project.de White Paper Contact: Dr. Martin Memmel German Research Center for Artificial Intelligence DFKI GmbH Trippstadter Straße 122 67663 Kaiserslautern fon fax mail web +49-631-20575-1210 +49-631-20575-1030
Automatic measurement of Social Media Use
Automatic measurement of Social Media Use Iwan Timmer University of Twente P.O. Box 217, 7500AE Enschede The Netherlands [email protected] ABSTRACT Today Social Media is not only used for personal
An Online Service for SUbtitling by MAchine Translation
SUMAT CIP-ICT-PSP-270919 An Online Service for SUbtitling by MAchine Translation Annual Public Report 2011 Editor(s): Contributor(s): Reviewer(s): Status-Version: Volha Petukhova, Arantza del Pozo Mirjam
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University [email protected] Kapil Dalwani Computer Science Department
Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project
Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project Ahmet Suerdem Istanbul Bilgi University; LSE Methodology Dept. Science in the media project is funded
HOME IN ON HOLLAND. Direct Dutch Institute (in-) company courses. Laan van Nieuw Oost-Indië 275 2593 BS Den Haag The Netherlands 0031(0)70 365 46 77
HOME IN ON HOLLAND Direct Dutch Institute (in-) company courses Laan van Nieuw Oost-Indië 275 2593 BS Den Haag The Netherlands 0031(0)70 365 46 77 1 BESTE COLLEGA S Ik wil graag Nederlands leren. Blijf
Theme 6: State Building Participants. Coordinator: Ido de Haan (Utrecht University) Participants In alphabetical order
Theme 6: State Building Participants Coordinator: Ido de Haan (Utrecht University) Participants In alphabetical order 1. Alberto Feenstra 2. Klaas van Gelder 3. Jens van de Maele 4. Karen van Nieuwenhuyze
Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia
Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia Outline I What is CALL? (scott) II Popular language learning sites (stella) Livemocha.com (stacia) III IV Specific sites
Real-Time Identification of MWE Candidates in Databases from the BNC and the Web
Real-Time Identification of MWE Candidates in Databases from the BNC and the Web Identifying and Researching Multi-Word Units British Association for Applied Linguistics Corpus Linguistics SIG Oxford Text
Investment Subsidy NWO Large 2011-2012
Application form Investment Subsidy NWO Large The completed application form including attachments should Please read the brochure Investment Subsidy NWO Large 2011- be submitted electronically via the
The PALAVRAS parser and its Linguateca applications - a mutually productive relationship
The PALAVRAS parser and its Linguateca applications - a mutually productive relationship Eckhard Bick University of Southern Denmark [email protected] Outline Flow chart Linguateca Palavras History
Introduction to Text Mining. Module 2: Information Extraction in GATE
Introduction to Text Mining Module 2: Information Extraction in GATE The University of Sheffield, 1995-2013 This work is licenced under the Creative Commons Attribution-NonCommercial-ShareAlike Licence
Sign language transcription conventions for the ECHO Project
Sign language transcription conventions for the ECHO Project Annika Nonhebel, Onno Crasborn & Els van der Kooij University of Nijmegen Version 9, 20 Jan. 2004 URL: http://www.let.kun.nl/sign-lang/echo/docs/transcr_conv.pdf
Central and South-East European Resources in META-SHARE
Central and South-East European Resources in META-SHARE Tamás VÁRADI 1 Marko TADIĆ 2 (1) RESERCH INSTITUTE FOR LINGUISTICS, MTA, Budapest, Hungary (2) FACULTY OF HUMANITIES AND SOCIAL SCIENCES, ZAGREB
LAMUS & LAT Archiving software
LAMUS & LAT Archiving software Daan Broeder Max-Planck Institute for Psycholinguistics The Language Archive Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands The Language Archive - 2011
Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens
Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens 1 Optique: Improving the competitiveness of European industry For many
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
Building a protocol validator for Business to Business Communications. Abstract
Building a protocol validator for Business to Business Communications Rudi van Drunen, Competa IT B.V. ([email protected]) Rix Groenboom, Parasoft Netherlands ([email protected]) Abstract
1995 (March) Internship at VRT Radio 1 (Belgian national radio) for 'Eenhoorn', a now suspended daily program with cultural news, reviews and agenda
Sandra Fauconnier 16/04/1973 - Geraardsbergen, Belgium Nationality: Belgian Stadhoudersplein 16b 3039 EN Rotterdam The Netherlands M +31 6 46284456 [email protected] http://www.spinster.be/ education 1993
Task 1 Melissa and the computer suite.
Do you sometimes shy away from using computers with your learners? Are you not sure what to do with them? This lesson gives an introductory overview of materials and activities that are central to using
Scalable and sustainable OCR & document image analysis in the cloud
Scalable and sustainable OCR & document image analysis in the cloud New Trends in Humanities Computing Lotte Wilms Koninklijke Bibliotheek, IMPACT Project René van der Ark Koninklijke Bibliotheek, Research
Example-based Translation without Parallel Corpora: First experiments on a prototype
-based Translation without Parallel Corpora: First experiments on a prototype Vincent Vandeghinste, Peter Dirix and Ineke Schuurman Centre for Computational Linguistics Katholieke Universiteit Leuven Maria
Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation
Objectives Distributed Databases and Client/Server Architecture IT354 @ Peter Lo 2005 1 Understand the advantages and disadvantages of distributed databases Know the design issues involved in distributed
Master of Arts in Linguistics Syllabus
Master of Arts in Linguistics Syllabus Applicants shall hold a Bachelor s degree with Honours of this University or another qualification of equivalent standard from this University or from another university
Your Fundraising Campaign: A How-To Guide for Maximizing Success
Your Fundraising Campaign: A How-To Guide for Maximizing Success Hello Fundraisers! We created this guide to help you maximize your fundraising success on Volunteer Forever. First of all, congratulations
Curation Report KEMPENSCH TAALEIGEN
Curation Report KEMPENSCH TAALEIGEN BERGEIJKS DIALECTWOORDENBOEK CLARIN NL Data Curation Service Version 1, 8 oktober 2013 Henk van den Heuvel CLST, Radboud University Nijmegen 1. Introduction There are
CLARIN: Common Language Resources and Technology Infrastructure
CLARIN: Common Language Resources and Technology Infrastructure Tamás Váradi, Peter Wittenburg, Steven Krauwer, Martin Wynne, Kimmo Koskenniemi Hungarian Academy of Sciences (Budapest), MPI for Psycholinguistics
Master in Digital Humanities
Master in Digital Humanities Faculty of Arts Faculty of Psychology and Educational Sciences Faculty of Social Sciences Faculty of Sciences Department of Computer Science KU Leuven. Inspiring the outstanding.
Schema documentation for types1.2.xsd
Generated with oxygen XML Editor Take care of the environment, print only if necessary! 8 february 2011 Table of Contents : ""...........................................................................................................
INPUTLOG 6.0 a research tool for logging and analyzing writing process data. Linguistic analysis. Linguistic analysis
INPUTLOG 6.0 a research tool for logging and analyzing writing process data Linguistic analysis From character level analyses to word level analyses Linguistic analysis 2 Linguistic Analyses The concept
Zeynep Azar. English Teacher, Açı Private Primary School, Istanbul, Turkey Azar, E.Z.
Zeynep Azar Date/Place of birth : 13 November 1988, Bursa, Turkey Nationality : Turkish Address : Bisschop Zwijsenstraat 103-01 Zipcode, Residence : 5021KB, Tilburg, Netherlands Phone number : +31 (0)
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
LEXUS: a web based lexicon tool
LEXUS: a web based lexicon tool Jacquelijn Ringersma Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Content Max Planck Institute Archive of linguistic resources Tool support (archiving
Your guide to using new media
Your guide to using new media A comprehensive guide for the charity and voluntary sector with tips on how to make the most of new, low cost communication tools such as social media and email marketing.
Text Analytics Beginner s Guide. Extracting Meaning from Unstructured Data
Text Analytics Beginner s Guide Extracting Meaning from Unstructured Data Contents Text Analytics 3 Use Cases 7 Terms 9 Trends 14 Scenario 15 Resources 24 2 2013 Angoss Software Corporation. All rights
2. Cyber security research in the Netherlands
2. Cyber security research in the Netherlands Jan Piet Barthel MSc Netherlands Organization for Scientific Research A strong motivation to enforce CS research: Absence or lack of cyber security is listed
A chart generator for the Dutch Alpino grammar
June 10, 2009 Introduction Parsing: determining the grammatical structure of a sentence. Semantics: a parser can build a representation of meaning (semantics) as a side-effect of parsing a sentence. Generation:
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
ODIS 2 A database on intermediary structures in Europe, 19th & 20th centuries
ODIS 2 A database on intermediary structures in Europe, 19th & 20th centuries Joris Colla and Stephan Pauls Launch event ODIS 2 Brussels, Flemish Parliament, 29th November 2013 Contents 1. From ODIS 1
REPORT. Public seminar, 10 November 2010, 1.30-5.00 p.m. Concordia Theatre, The Hague
REPORT Complexity-oriented oriented Planning, Monitoring and Evaluation (PME) From alternative to mainstream? Public seminar, 10 November 2010, 1.30-5.00 p.m. Concordia Theatre, The Hague 1. What was the
PoS-tagging Italian texts with CORISTagger
PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy [email protected] Abstract. This paper presents an evolution of CORISTagger [1], an high-performance
Unlocking The Value of the Deep Web. Harvesting Big Data that Google Doesn t Reach
Unlocking The Value of the Deep Web Harvesting Big Data that Google Doesn t Reach Introduction Every day, untold millions search the web with Google, Bing and other search engines. The volumes truly are
REDCap General Security Overview
REDCap General Security Overview Introduction REDCap is a web application for building and managing online surveys and databases, and thus proper security practices must instituted on the network and server(s)
The challenges of becoming a Trusted Digital Repository
The challenges of becoming a Trusted Digital Repository Annemieke de Jong is Preservation Officer at the Netherlands Institute for Sound and Vision (NISV) in Hilversum. She is responsible for setting out
Towards a Visually Enhanced Medical Search Engine
Towards a Visually Enhanced Medical Search Engine Lavish Lalwani 1,2, Guido Zuccon 1, Mohamed Sharaf 2, Anthony Nguyen 1 1 The Australian e-health Research Centre, Brisbane, Queensland, Australia; 2 The
Special Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
Online Media Kit 2014-FCC_OnlineMediaKit 12/4/2014 8:56 AM Page 1 nline Odvertising A
Online Advertising The Network Forum Communications Company online advertising network includes 46 websites across Minnesota, Wisconsin, North Dakota, and South Dakota. Our entire network reaches 2+ Million
Context Aware Predictive Analytics: Motivation, Potential, Challenges
Context Aware Predictive Analytics: Motivation, Potential, Challenges Mykola Pechenizkiy Seminar 31 October 2011 University of Bournemouth, England http://www.win.tue.nl/~mpechen/projects/capa Outline
Semantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
MODULE GUIDE MASTER S DEGREE PROGRAM NORTH AMERICAN STUDIES: CULTURE AND LITERATURE. for the. at the Friedrich-Alexander-University Erlangen-Nuremberg
MODULE GUIDE for the MASTER S DEGREE PROGRAM NORTH AMERICAN STUDIES: CULTURE AND LITERATURE at the Friedrich-Alexander-University Erlangen-Nuremberg PROGRAM ADVISORS: PROF. DR. ANTJE KLEY PROF. DR. HEIKE
Big Data and Scripting. (lecture, computer science, bachelor/master/phd)
Big Data and Scripting (lecture, computer science, bachelor/master/phd) Big Data and Scripting - abstract/organization abstract introduction to Big Data and involved techniques lecture (2+2) practical
Evaluating OO-CASE tools: OO research meets practice
Evaluating OO-CASE tools: OO research meets practice Danny Greefhorst, Matthijs Maat, Rob Maijers {greefhorst, maat, maijers}@serc.nl Software Engineering Research Centre - SERC PO Box 424 3500 AK Utrecht
A GrAF-compliant Indonesian Speech Recognition Web Service on the Language Grid for Transcription Crowdsourcing
A GrAF-compliant Indonesian Speech Recognition Web Service on the Language Grid for Transcription Crowdsourcing LAW VI JEJU 2012 Bayu Distiawan Trisedya & Ruli Manurung Faculty of Computer Science Universitas
Digital Collections as Big Data. Leslie Johnston, Library of Congress Digital Preservation 2012
Digital Collections as Big Data Leslie Johnston, Library of Congress Digital Preservation 2012 Data is not just generated by satellites, identified during experiments, or collected during surveys. Datasets
IPv6 Preparation and Deployment in Datacenter Infrastructure A Practical Approach
Paper IPv6 Preparation and Deployment in Datacenter Infrastructure A Practical Approach Marco van der Pal Generic Services Network Infrastructure Services, Capgemini Netherlands B.V., Utrecht, The Netherlands
Customizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
