Unifying Diversity: The CLARIN Input Formats CMDI and TCF
- Bonnie Miller
1 Unifying Diversity: The CLARIN Input Formats CMDI and TCF
Susanne Haaf & Bryan Jurish, Deutsches Textarchiv
2 1. The Metadata Format CMDI
3-5 Metadata? Metadata Format? And more ...
CMDI (Component Metadata Infrastructure)
6 CMDI? What's that?
Component Metadata Infrastructure
- Metadata components (e.g. author, title, license, ...) are combined into metadata profiles (e.g. the teiHeader of the DTA base format, DTABf)
- Create new components/profiles or re-use those that already exist
- One basic CMDI structure that all resources have in common
- ISOcat data categories define the semantics of the components
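The component/profile idea can be pictured as a tree of reusable building blocks. The following is a toy model under stated assumptions, not the data model of the CLARIN Component Registry: the class names `Component` and `Profile`, the profile names and the `concept` field are all hypothetical; only the data category identifier DC-2978 ("Person") is taken from slide 13. It shows one `Person` component defined once and re-used in two profiles.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Component:
    """A reusable metadata building block (e.g. author, title, license)."""
    name: str
    # Identifier of the data category that fixes the meaning, if known.
    concept: Optional[str] = None
    children: Tuple["Component", ...] = ()


@dataclass
class Profile:
    """A named combination of components for one kind of resource."""
    name: str
    components: List[Component] = field(default_factory=list)

    def element_names(self) -> List[str]:
        """Return every component name, depth-first, in document order."""
        names: List[str] = []

        def walk(component: Component) -> None:
            names.append(component.name)
            for child in component.children:
                walk(child)

        for component in self.components:
            walk(component)
        return names


# "Person" is defined once, tied to a data category, and re-used in two
# hypothetical profiles instead of being redefined in each of them.
person = Component("Person", concept="DC-2978")
text_profile = Profile("TextProfile", [
    Component("Title"),
    Component("Author", children=(person,)),
    Component("License"),
])
service_profile = Profile("ServiceProfile", [
    Component("Contact", children=(person,)),
])

print(text_profile.element_names())
```

Because both profiles point to the same `Person` object, its semantics are defined in exactly one place, which is the point of re-using components.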
7 Why CMDI?
- CMDI is not a format per se but rather a framework. Hence:
  - I don't really have to decide on one format
  - I define the semantics of my metadata categories myself
- Plus, in CMDI you can describe any resource you like:
  - collections/corpora, single texts
  - historical sources, recent sources
  - sound (spoken, music), film, text, multimedia
  - lexical resources (lexica & dictionaries, treebanks, ...)
  - tools, services, applications
- These descriptions can then be represented as a whole. Hence: get everything there is in CLARIN through one portal
8 CMDI Basic Structure (Example DTA)
Here: DTA-CMDI profile

<?xml version="1.0" encoding="utf-8"?>
<CMD xsi:schemaLocation="[...]"
     xmlns:xsi="[...]"
     xmlns="[...]"
     CMDVersion="1.1">
  <Header> [...] </Header>
  <Resources> [...] </Resources>
  <Components> [...] </Components>
</CMD>

- Namespace information (xmlns, xmlns:xsi)
- Schema specification (xsi:schemaLocation)
- Version information (CMDVersion; N.b.: a new version, CMDI 1.2, is coming up)
9 CMDI Basic Structure (Example DTA)

<?xml version="1.0" encoding="utf-8"?>
<CMD>
  <Header>
    <MdCreator>Deutsches Textarchiv</MdCreator>
    <MdCreationDate>[...]</MdCreationDate>
    <MdSelfLink>[...]</MdSelfLink>
    <MdProfile>clarin.eu:cr1:p_[...]</MdProfile>
    <MdCollectionDisplayName>Deutsches Textarchiv ([...])</MdCollectionDisplayName>
  </Header>
  <Resources>[...]</Resources>
  <Components>[...]</Components>
</CMD>

Header for meta-metadata
10 CMDI Basic Structure (Example DTA)

<?xml version="1.0" encoding="utf-8"?>
<CMD>
  <Header>[...]</Header>
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="dta-altmann_elementarorganismen_1890.landing_page">
        <ResourceType>LandingPage</ResourceType>
        <ResourceRef>[...]</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
    <JournalFileProxyList>[...]</JournalFileProxyList>
    <ResourceRelationList>[...]</ResourceRelationList>
    <IsPartOfList>
      <IsPartOf>[...]</IsPartOf>
    </IsPartOfList>
  </Resources>
  <Components>[...]</Components>
</CMD>

Resources: the resources described, and resources related to them in some way
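The three-part layout of slides 8 to 10 (Header, Resources, Components) can be built programmatically. This is a minimal sketch with Python's standard `xml.etree.ElementTree`, assuming namespace-free element names for readability; a real CMDI record also needs the namespace and schema attributes from slide 8. The profile ID and the landing-page URL are hypothetical placeholders, and the helper names `cmd_skeleton` and `add_landing_page` are illustrative only.

```python
import xml.etree.ElementTree as ET


def cmd_skeleton(creator: str, profile_id: str) -> ET.Element:
    """Build a namespace-free CMD record with Header, Resources, Components."""
    cmd = ET.Element("CMD", {"CMDVersion": "1.1"})

    # Header: meta-metadata (who made this record, which profile it uses).
    header = ET.SubElement(cmd, "Header")
    ET.SubElement(header, "MdCreator").text = creator
    ET.SubElement(header, "MdProfile").text = profile_id

    # Resources: proxies for the described resource and related resources.
    resources = ET.SubElement(cmd, "Resources")
    for name in ("ResourceProxyList", "JournalFileProxyList", "ResourceRelationList"):
        ET.SubElement(resources, name)

    # Components: the actual, profile-specific metadata goes here.
    ET.SubElement(cmd, "Components")
    return cmd


def add_landing_page(cmd: ET.Element, proxy_id: str, url: str) -> ET.Element:
    """Append a LandingPage ResourceProxy to the ResourceProxyList."""
    proxy_list = cmd.find("Resources/ResourceProxyList")
    proxy = ET.SubElement(proxy_list, "ResourceProxy", {"id": proxy_id})
    ET.SubElement(proxy, "ResourceType").text = "LandingPage"
    ET.SubElement(proxy, "ResourceRef").text = url
    return proxy


record = cmd_skeleton("Deutsches Textarchiv", "hypothetical-profile-id")
add_landing_page(record,
                 "dta-altmann_elementarorganismen_1890.landing_page",
                 "https://example.org/landing-page")  # placeholder URL
print(ET.tostring(record, encoding="unicode"))
```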
11 CMDI Components (Example DTA)

<?xml version="1.0" encoding="utf-8"?>
<CMD>
  <Header>[...]</Header>
  <Resources>[...]</Resources>
  <Components>
    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title type="main">Die Elementarorganismen und ihre Beziehungen zu den Zellen</title>
          <author>[...]</author>
          [...]
        </titleStmt>
        <publicationStmt>[including availability]</publicationStmt>
        <sourceDesc>[including the repository of the physical source]</sourceDesc>
      </fileDesc>
      <encodingDesc>[...]</encodingDesc>
      <profileDesc>[including genre]</profileDesc>
    </teiHeader>
  </Components>
</CMD>

Components: the actual metadata of the resource described
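Since the descriptive metadata sits below `<Components>`, a consumer only needs to look there. A small sketch, assuming a namespace-free excerpt of the record above (real records are namespaced, so the paths would need namespace prefixes); `main_title` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace-free excerpt modelled on the slide; elided parts are dropped.
RECORD = """<CMD>
  <Header/>
  <Resources/>
  <Components>
    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title type="main">Die Elementarorganismen und ihre
            Beziehungen zu den Zellen</title>
        </titleStmt>
      </fileDesc>
    </teiHeader>
  </Components>
</CMD>"""


def main_title(cmd_xml: str):
    """Return the main title from the Components part, or None."""
    root = ET.fromstring(cmd_xml)
    # Search only below <Components>: Header and Resources carry
    # meta-metadata and resource pointers, not the descriptive metadata.
    title = root.find("Components//title[@type='main']")
    if title is None or title.text is None:
        return None
    return " ".join(title.text.split())  # collapse line breaks/indentation


print(main_title(RECORD))
```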
12 The world of Components: Components
13 The world of Components: ISOcat
DC-2978
- Data Element Name: Person
- PID: [...]
- Definition: the name of a person
14 The world of Components: Profiles
15 The world of Components
- Think about what you need
- Put together components
- Create your own CMDI profile
- Or: re-use something that already exists
Questions about CMDI?
- Helpdesk (Timm Lehmberg's talk)
- CLARIN Centers
- CLARIN User Guide
16 CMDI Components (Ex. WebLicht Webservices - CAB)

<?xml version="1.0" encoding="utf-8"?>
<CMD>[...]
  <Header>[...]</Header>
  <Resources>[...]</Resources>
  <Components>
    <WebLichtWebService>
      <Service>
        <Name>CAB orthographic canonicalizer</Name>
        <Description>orthographic normalization for historical German</Description>
        <TypeOfWebservice>RESTfull</TypeOfWebservice>
        <url>[...]</url>
        <LifeCycleStatus>production</LifeCycleStatus>
        <PublicationDate>[...]T07:34:20Z</PublicationDate>
        <LastUpdate>[...]T07:34:20Z</LastUpdate>
        <ServiceDescriptionLocation ref="s056"/>
        <Contact>[...]</Contact>
        <Creation>[information about creation and creators]</Creation>

Components: the actual metadata of the resource described
17 CMDI Components (Ex. WebLicht Webservices - CAB)

        <Operations><Operation>
          <Name>Default</Name>
          <Input><ParameterGroup>
            <Name>Input Parameters</Name>
            <Parameters>
              <Parameter>
                <Name>tokens</Name>
                <AllowManualSelectionFallback>false</AllowManualSelectionFallback>
              </Parameter>
              <Parameter>
                <Name>sentences</Name>
                <AllowManualSelectionFallback>false</AllowManualSelectionFallback>
              </Parameter>
              [...]
            </Parameters>
            [...]
          </ParameterGroup></Input>
          <Output><ParameterGroup>
            <Name>Output Parameters</Name>
            <ReplacesInput>false</ReplacesInput>
            <Parameters>
              <Parameter><Name>orthography</Name></Parameter>
            </Parameters>
          </ParameterGroup></Output>
        </Operation></Operations>
      </Service>
    </WebLichtWebService>
  </Components>
</CMD>

Components: the actual metadata of the resource described
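The `Input` and `Output` parameter groups are what make the service description machine-usable: they state which annotation layers CAB expects and which it produces. A minimal sketch that reads them back, assuming a trimmed, namespace-free copy of slides 16 and 17 (the elided `[...]` parts are simply dropped, so the real description may list further parameters); `io_layers` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the service description (elided parts dropped).
SERVICE = """<WebLichtWebService><Service>
  <Name>CAB orthographic canonicalizer</Name>
  <Operations><Operation>
    <Name>Default</Name>
    <Input><ParameterGroup>
      <Name>Input Parameters</Name>
      <Parameters>
        <Parameter><Name>tokens</Name></Parameter>
        <Parameter><Name>sentences</Name></Parameter>
      </Parameters>
    </ParameterGroup></Input>
    <Output><ParameterGroup>
      <Name>Output Parameters</Name>
      <ReplacesInput>false</ReplacesInput>
      <Parameters>
        <Parameter><Name>orthography</Name></Parameter>
      </Parameters>
    </ParameterGroup></Output>
  </Operation></Operations>
</Service></WebLichtWebService>"""


def io_layers(service_xml: str):
    """Return (input layer names, output layer names, replaces_input)."""
    op = ET.fromstring(service_xml).find("Service/Operations/Operation")
    param = "ParameterGroup/Parameters/Parameter/Name"
    inputs = [n.text for n in op.iterfind(f"Input/{param}")]
    outputs = [n.text for n in op.iterfind(f"Output/{param}")]
    replaces = op.findtext("Output/ParameterGroup/ReplacesInput") == "true"
    return inputs, outputs, replaces


print(io_layers(SERVICE))
```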
18 2. The Text Corpus Format TCF
19 TCF: Text Corpus Format
What is it?
- XML stand-off format for linguistic annotations
- Developed for WebLicht in the context of CLARIN-D
- Compatibility: LAF (Linguistic Annotation Format / ISO 24612:2012), GrAF (Graph Annotation Format / Ide & Suderman, 2007)
What is it good for?
- Facilitates annotation-tool interoperability & orchestration
- Lingua franca for web-service execution ("tool chains")
- Explicit specification for concrete annotation tasks
- Incremental processing via annotation layers, e.g. tokens, sentences, PoS tags, lemmata, parse trees, ...
20 TCF + WebLicht: Example Chain
- All tools use the same I/O format (TCF)
- Each tool adds one or more annotation layers
- Existing layers are passed through unchanged, so information from the input document is preserved
Some TCF layers: text, tokens, sentences, POStags, lemmas, parsing, depparsing, morphology, namedEntities, references, matches, orthography, ... and more!
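The chaining rule (each tool needs certain layers, adds new ones, and passes everything else through) can be simulated on layer names alone. This is a toy sketch, not WebLicht's actual chaining logic: the tool names `tokenizer` and `tagger` and their layer signatures are made up; only the CAB entry mirrors the parameters on slide 17.

```python
from typing import Dict, List, Tuple

# Hypothetical tool descriptions: name -> (required layers, added layers).
# The "cab" entry mirrors slide 17; the other two are invented examples.
TOOLS: Dict[str, Tuple[List[str], List[str]]] = {
    "tokenizer": (["text"], ["tokens", "sentences"]),
    "cab": (["tokens", "sentences"], ["orthography"]),
    "tagger": (["tokens"], ["POStags", "lemmas"]),
}


def run_chain(chain: List[str], layers: List[str]) -> List[str]:
    """Simulate a tool chain on layer names and return the final layer list."""
    present = list(layers)
    for name in chain:
        needs, adds = TOOLS[name]
        missing = [layer for layer in needs if layer not in present]
        if missing:
            raise ValueError(f"{name}: missing input layer(s) {missing}")
        # Existing layers pass through unchanged; only new layers are appended.
        new_layers = [layer for layer in adds if layer not in present]
        present.extend(new_layers)
    return present


print(run_chain(["tokenizer", "cab", "tagger"], ["text"]))
```

Calling `run_chain(["cab"], ["text"])` raises a `ValueError`, since CAB needs the tokens and sentences layers that no earlier tool has produced.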
21 TCF Example (1): Input
Input: simple XML text

<text>
  EJn zamer Elephant gilt ohngefähr zweyhundert Thaler.
  Ceterum censeo Carthaginem esse delendam.
</text>

Converter: XML → TCF (text layer)
- XML serialization
- Designed for DTABf
22 TCF Example (2): Text Layer
Output: TCF superstructure and text layer

<D-Spin xmlns=... version="0.4">
  <TextCorpus xmlns=... lang="de">
    <text>
      EJn zamer Elephant gilt ohngefähr zweyhundert Thaler.
      Ceterum censeo Carthaginem esse delendam.
    </text>
  </TextCorpus>
</D-Spin>

- version: TCF version
- lang: document language
- text: raw (serialized) document text
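The conversion step of slides 21 and 22 amounts to serializing the input XML to plain text and wrapping it in the TCF superstructure. A minimal sketch, assuming whitespace is simply collapsed and namespaces are left out (the slide elides them as `xmlns=...`); this is not the actual DTABf converter, and `xml_to_tcf` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def xml_to_tcf(xml_text: str, lang: str = "de") -> ET.Element:
    """Wrap the serialized text of a simple XML document in a TCF text layer.

    Namespaces (the xmlns=... on the slide) are omitted for brevity.
    """
    source = ET.fromstring(xml_text)
    # Serialize: concatenate all character data, collapse whitespace runs.
    raw = " ".join("".join(source.itertext()).split())
    dspin = ET.Element("D-Spin", {"version": "0.4"})
    corpus = ET.SubElement(dspin, "TextCorpus", {"lang": lang})
    ET.SubElement(corpus, "text").text = raw
    return dspin


SOURCE = """<text>
  EJn zamer Elephant gilt ohngefähr zweyhundert Thaler.
  Ceterum censeo Carthaginem esse delendam.
</text>"""

tcf = xml_to_tcf(SOURCE)
print(ET.tostring(tcf, encoding="unicode"))
```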
23 TCF Example (3): Tokenization

<D-Spin ... version="0.4">
  <TextCorpus ... lang="de">
    <text>...</text>
    <tokens>
      <token ID="w1">EJn</token>
      <token ID="w2">zamer</token>
      <token ID="w3">Elephant</token>
      ...
    </tokens>
    <sentences>
      <sentence ID="s1" tokenIDs="w1 w2 w3 w4 w5 w6 w7 w8"/>
      <sentence ID="s2" tokenIDs="w9 wa wb wc wd we"/>
    </sentences>
  </TextCorpus>
</D-Spin>

Tokenization:
- tokens and sentences layers
- unique IDs for inter-layer cross-references
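The token IDs on the slide (w1 ... w9, then wa ... we) appear to be a hexadecimal counter, and the sentences layer refers to tokens only by these IDs. The sketch below reproduces that scheme under two assumptions: the hex numbering is inferred from the example, and the regex tokenizer is a deliberately naive stand-in, not the tokenizer used by the DTA or WebLicht. `split_sentences` and `add_token_layers` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Naive tokenizer (an assumption, not the DTA/WebLicht tokenizer):
# words are runs of word characters; any other non-space char is a token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


def split_sentences(text: str):
    """Return a list of sentences, each a list of token strings."""
    sentences, current = [], []
    for token in TOKEN_RE.findall(text):
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing material without final punctuation
        sentences.append(current)
    return sentences


def add_token_layers(corpus: ET.Element, text: str) -> None:
    """Add <tokens> and <sentences> layers; sentences point to token IDs."""
    tokens = ET.SubElement(corpus, "tokens")
    sentences = ET.SubElement(corpus, "sentences")
    counter = 0
    for s_num, sentence in enumerate(split_sentences(text), start=1):
        ids = []
        for form in sentence:
            counter += 1
            token_id = f"w{counter:x}"  # hex counter: w1 .. w9, wa, wb, ...
            ET.SubElement(tokens, "token", {"ID": token_id}).text = form
            ids.append(token_id)
        ET.SubElement(sentences, "sentence",
                      {"ID": f"s{s_num}", "tokenIDs": " ".join(ids)})


TEXT = ("EJn zamer Elephant gilt ohngefähr zweyhundert Thaler. "
        "Ceterum censeo Carthaginem esse delendam.")
corpus = ET.Element("TextCorpus", {"lang": "de"})
add_token_layers(corpus, TEXT)
for sentence in corpus.iterfind("sentences/sentence"):
    print(sentence.get("ID"), sentence.get("tokenIDs"))
```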
24 TCF Example (4): (Modern) Orthography

<D-Spin ... version="0.4">
  <TextCorpus ... lang="de">
    <tokens>
      <token ID="w1">EJn</token>
      <token ID="w2">zamer</token>
      <token ID="w3">Elephant</token>
      ...
    </tokens>
    ...
    <orthography>
      <correction tokenIDs="w1" operation="replace">ein</correction>
      <correction tokenIDs="w2" ...="replace">zahmer</correction>
      <correction tokenIDs="w3" ...="replace">Elefant</correction>
      ...
    </orthography>
  </TextCorpus>
</D-Spin>

Orthographic normalization: orthography layer
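The orthography layer does not overwrite the tokens layer; it points to tokens by ID and supplies a replacement form. A consumer that wants the normalized text therefore has to combine the two layers. A minimal sketch, assuming a namespace-free excerpt and handling only the simple case shown on the slide (one token, `operation="replace"`); any other correction is ignored here. `normalized_forms` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace-free excerpt modelled on the slide.
TCF = """<TextCorpus lang="de">
  <tokens>
    <token ID="w1">EJn</token>
    <token ID="w2">zamer</token>
    <token ID="w3">Elephant</token>
  </tokens>
  <orthography>
    <correction tokenIDs="w1" operation="replace">ein</correction>
    <correction tokenIDs="w2" operation="replace">zahmer</correction>
    <correction tokenIDs="w3" operation="replace">Elefant</correction>
  </orthography>
</TextCorpus>"""


def normalized_forms(corpus: ET.Element):
    """Return token forms with single-token 'replace' corrections applied."""
    forms = {t.get("ID"): t.text for t in corpus.iterfind("tokens/token")}
    for corr in corpus.iterfind("orthography/correction"):
        ids = corr.get("tokenIDs", "").split()
        # Only the simple case is handled: one token, operation="replace".
        if corr.get("operation") == "replace" and len(ids) == 1:
            forms[ids[0]] = corr.text
    # The tokens layer itself is not modified; we only read through it.
    return [forms[t.get("ID")] for t in corpus.iterfind("tokens/token")]


print(normalized_forms(ET.fromstring(TCF)))
```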
25 TCF Example (5): Part-of-Speech Tags

<D-Spin ... version="0.4">
  <TextCorpus ... lang="de">
    <tokens>
      <token ID="w1">EJn</token>
      <token ID="w2">zamer</token>
      <token ID="w3">Elephant</token>
      ...
    </tokens>
    ...
    <POStags tagset="STTS">
      <tag tokenIDs="w1">ART</tag>
      <tag tokenIDs="w2">ADJA</tag>
      <tag tokenIDs="w3">NN</tag>
      ...
    </POStags>
  </TextCorpus>
</D-Spin>

PoS tagging: POStags layer (+ tagset attribute)
26 TCF Example (6): (Modern) Lemmata

<D-Spin ... version="0.4">
  <TextCorpus ... lang="de">
    <tokens>
      <token ID="w1">EJn</token>
      <token ID="w2">zamer</token>
      <token ID="w3">Elephant</token>
      ...
    </tokens>
    ...
    <lemmas>
      <lemma tokenIDs="w1">eine</lemma>
      <lemma tokenIDs="w2">zahm</lemma>
      <lemma tokenIDs="w3">Elefant</lemma>
      ...
    </lemmas>
  </TextCorpus>
</D-Spin>

Lemmatization: lemmas layer
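Because the POStags and lemmas layers both refer back to the tokens layer through `tokenIDs`, a token-by-token view can be rebuilt by indexing each layer on those IDs and joining. A sketch under the same assumptions as the slides' excerpt (namespaces omitted, only the first three tokens); `layer_index` and `token_table` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

TCF = """<TextCorpus lang="de">
  <tokens>
    <token ID="w1">EJn</token>
    <token ID="w2">zamer</token>
    <token ID="w3">Elephant</token>
  </tokens>
  <POStags tagset="STTS">
    <tag tokenIDs="w1">ART</tag>
    <tag tokenIDs="w2">ADJA</tag>
    <tag tokenIDs="w3">NN</tag>
  </POStags>
  <lemmas>
    <lemma tokenIDs="w1">eine</lemma>
    <lemma tokenIDs="w2">zahm</lemma>
    <lemma tokenIDs="w3">Elefant</lemma>
  </lemmas>
</TextCorpus>"""


def layer_index(corpus, path):
    """Map each token ID to the text of the annotation that references it."""
    index = {}
    for annotation in corpus.iterfind(path):
        for token_id in annotation.get("tokenIDs", "").split():
            index[token_id] = annotation.text
    return index


def token_table(corpus):
    """Join tokens, PoS tags and lemmas via the shared token IDs."""
    tags = layer_index(corpus, "POStags/tag")
    lemmas = layer_index(corpus, "lemmas/lemma")
    return [(tok.text, tags.get(tok.get("ID")), lemmas.get(tok.get("ID")))
            for tok in corpus.iterfind("tokens/token")]


corpus = ET.fromstring(TCF)
print(corpus.find("POStags").get("tagset"))
for row in token_table(corpus):
    print(*row, sep="\t")
```

The same indexing works for any layer that refers to tokens by ID, which is what lets each tool in a chain add its layer without touching the others.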
27 WebLicht
Further processing of TCF data within CLARIN's WebLicht: cf. Thorsten Trippel's talk
