Schema documentation for types1.2.xsd




Generated with oXygen XML Editor, 8 February 2011

Table of Contents
""
Schemas
  Main schema types1.2.xsd
    Complex Types
      Complex Type ns:params
      Complex Type ns:sentencesplitterparams
      Complex Type ns:sentencesplittermainparams
      Complex Type ns:text
      Complex Type ns:tokenizerparams
      Complex Type ns:tokenizermainparams
      Complex Type ns:lemmatizerparams
      Complex Type ns:lemmatizermainparams
      Complex Type ns:postaggerparams
      Complex Type ns:postaggermainparams
      Complex Type ns:shallowparserparams
      Complex Type ns:shallowparsermainparams
      Complex Type ns:parserparams
      Complex Type ns:parsermainparams
      Complex Type ns:textalignerparams
      Complex Type ns:textalignermainparams
      Complex Type ns:cqpindexerparams
      Complex Type ns:cqpindexermainparams
      Complex Type ns:cqpcorpus
      Complex Type ns:cqpquerierparams
      Complex Type ns:cqpqueriermainparams
      Complex Type ns:concordancerparams
      Complex Type ns:concordancermainparams
      Complex Type ns:concordanceroptionalparams
      Complex Type ns:ngramscooccurrencesparams
      Complex Type ns:ngramscooccurrencesmainparams
      Complex Type ns:ngramscooccurrencesoptionalparams
      Complex Type ns:crawlerparams
      Complex Type ns:crawlermainparams
      Complex Type ns:crawleroptionalparams
      Complex Type ns:maxsize
      Complex Type ns:nerecognitionparams
      Complex Type ns:nerecognitionmainparams
      Complex Type ns:nerecognitionoptionalparams
      Complex Type ns:termextractionparams
      Complex Type ns:termextractionmainparams
      Complex Type ns:termextractionoptionalparams
      Complex Type ns:topicidentifierparams
      Complex Type ns:topicidentifiermainparams
      Complex Type ns:topicidentifieroptionalparams
      Complex Type ns:ditionarylookupparams
      Complex Type ns:dictionarylookupmainparams
      Complex Type ns:response
      Complex Type ns:sentencesplitterresponse
      Complex Type ns:tokenizerresponse
      Complex Type ns:lemmatizerresponse
      Complex Type ns:postaggerresponse
      Complex Type ns:shallowparserresponse
      Complex Type ns:parserresponse
      Complex Type ns:textalignerresponse
      Complex Type ns:cqpindexerresponse
      Complex Type ns:cqpquerierresponse
      Complex Type ns:concordancerresponse
      Complex Type ns:ngramscooccurrencesresponse
      Complex Type ns:crawlerresponse
      Complex Type ns:nerecognitionresponse
      Complex Type ns:termextractionresponse
      Complex Type ns:topicidentifierresponse
      Complex Type ns:dictionarylookupresponse
      Complex Type ns:lexicalentrieslist
      Complex Type ns:lexicalentry
      Complex Type ns:sourcetext
      Complex Type ns:targettext
    Simple Types
      Simple Type ns:lang
      Simple Type ns:cqpcorpusstructure
      Simple Type ns:cqpquery
      Simple Type ns:reference
      Simple Type ns:searchquery
      Simple Type ns:windowsize
      Simple Type ns:orderby
      Simple Type ns:number
      Simple Type ns:threshold
      Simple Type ns:associationmeassure
      Simple Type ns:domain
      Simple Type ns:termlist
      Simple Type ns:size
      Simple Type ns:sizeunit
      Simple Type ns:maxtime
      Simple Type ns:timeout
      Simple Type ns:retries
      Simple Type ns:inputformat
      Simple Type ns:format
      Simple Type ns:outputformat
      Simple Type ns:nelist
      Simple Type ns:wordform
      Simple Type ns:lemma
      Simple Type ns:postag
      Simple Type ns:tagset
      Simple Type ns:term
      Simple Type ns:ne
""

Schemas

Main schema types1.2.xsd
Properties: attribute form default: unqualified; element form default: qualified; version: 1.2

Complex Types

Complex Type ns:params
All request messages derive from ns:params, which may contain main params, optional params, or both. Main params are always typed; optional params may be left untyped and have default values.
Model: ns:mainparams{0,1}, ns:optparams{0,1}

Complex Type ns:sentencesplitterparams

Payload message for a sentence splitter service.
Type: restriction of ns:params
Type hierarchy: ns:params → ns:sentencesplitterparams

Complex Type ns:sentencesplittermainparams
Main params in the payload message for a sentence splitter service. Includes language and text.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:text
Anything written. Text can be passed as a string or as a URI. Typically, text is used for input parameters such as 'text to be analyzed' and output parameters such as 'analyzed text'.
Model: ns:string | ns:file
Children: ns:file, ns:string

Complex Type ns:tokenizerparams
Payload message for a tokenizer service.

Type: restriction of ns:params
Type hierarchy: ns:params → ns:tokenizerparams

Complex Type ns:tokenizermainparams
Main params in the payload message for a tokenizer service. Includes language and text.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:lemmatizerparams
Payload message for a lemmatizer service.
Type: restriction of ns:params
Type hierarchy: ns:params → ns:lemmatizerparams

Complex Type ns:lemmatizermainparams

Main params in the payload message for a lemmatizer service. Includes language and text.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:postaggerparams
Payload message for a PoS tagger service.
Type: restriction of ns:params
Type hierarchy: ns:params → ns:postaggerparams

Complex Type ns:postaggermainparams
Main params in the payload message for a PoS tagger service. Includes language and text.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:shallowparserparams
Payload message for a shallow parser (or chunking) service.

Type: restriction of ns:params
Type hierarchy: ns:params → ns:shallowparserparams

Complex Type ns:shallowparsermainparams
Main params in the payload message for a shallow parser (or chunking) service. Includes language and the text to be parsed.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:parserparams
Payload message for a parser service.
Type: restriction of ns:params
Type hierarchy: ns:params → ns:parserparams

Complex Type ns:parsermainparams

Main params in the payload message for a parser service. Includes language and the text to be parsed.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:textalignerparams
Payload message for a text alignment service.
Type: restriction of ns:params
Type hierarchy: ns:params → ns:textalignerparams

Complex Type ns:textalignermainparams
Main params in the payload message for a text aligner service. Includes the language and the corpus for both source and target.
Model: ns:source_language, ns:source_corpus, ns:target_language, ns:target_corpus
Children: ns:source_corpus, ns:source_language, ns:target_corpus, ns:target_language

Complex Type ns:cqpindexerparams
Payload message for a CWB service when indexing a corpus.

Type: restriction of ns:params
Type hierarchy: ns:params → ns:cqpindexerparams

Complex Type ns:cqpindexermainparams
Main params in the payload message for a CWB service when indexing a corpus. Includes the CQP corpus to be indexed and the CQP structure of the corpus.
Model: ns:cqpcorpus, ns:cqpcorpusstructure
Children: ns:cqpcorpus, ns:cqpcorpusstructure

Complex Type ns:cqpcorpus
A CQP corpus (following the standard verticalized CWB input format).
Type: restriction of ns:text
Type hierarchy: ns:text → ns:cqpcorpus
Model: ns:string | ns:file
Children: ns:file, ns:string
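As an illustration of how the request types above nest, the following is a minimal Python sketch that builds a tokenizer-style request (ns:mainparams holding ns:language and ns:input) with the standard library. It assumes a hypothetical target namespace (http://example.org/types1.2), a hypothetical root element name, and that ns:input is of type ns:text, since this documentation lists the complex types but not the global elements that use them. Because the element form default is qualified, every child element is placed in the target namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical target namespace: the real URI is not shown in this documentation.
NS = "http://example.org/types1.2"
ET.register_namespace("ns", NS)


def qname(local: str) -> str:
    """Return an ElementTree qualified name ({uri}local) in the target namespace."""
    return f"{{{NS}}}{local}"


def build_request(root_name: str, language: str, file_uri: str) -> str:
    """Build a request whose ns:mainparams holds ns:language and ns:input.

    root_name is hypothetical: only the complex types (e.g. ns:tokenizerparams)
    are documented, not the names of the elements that use them.
    """
    root = ET.Element(qname(root_name))
    main = ET.SubElement(root, qname("mainparams"))
    # Element order follows the model "ns:language, ns:input".
    ET.SubElement(main, qname("language")).text = language  # ISO 639 code
    # Assuming ns:input is an ns:text, the text is passed by URI via ns:file.
    text = ET.SubElement(main, qname("input"))
    ET.SubElement(text, qname("file")).text = file_uri
    return ET.tostring(root, encoding="unicode")
```

Calling build_request("tokenizerrequest", "en", "http://example.org/sample.txt") yields a single serialized element tree with the ns prefix bound to the assumed namespace; the same shape would apply to the other services whose main params are ns:language and ns:input.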

Complex Type ns:cqpquerierparams
Payload message for a CWB service when querying a corpus.
Type: restriction of ns:params
Type hierarchy: ns:params → ns:cqpquerierparams

Complex Type ns:cqpqueriermainparams
Main params in the payload message for a CWB service when querying a corpus. Includes a reference to the indexed corpus to be queried and the CQP query.
Model: ns:cqpquery, ns:corpusreference
Children: ns:cqpquery, ns:corpusreference

Complex Type ns:concordancerparams
Payload message for a concordancer (or KWIC) service.
Type: restriction of ns:params
Type hierarchy: ns:params → ns:concordancerparams

Complex Type ns:concordancermainparams
Main params in the payload message for a concordancer service. Includes the search query and the window size.
Model: ns:searchquery, ns:windowsize{0,1}
Children: ns:searchquery, ns:windowsize

Complex Type ns:concordanceroptionalparams
Optional parameters in the payload message for a concordancer (or KWIC) service.
Model: ALL(ns:OrderBy{0,1})
Children: ns:orderby

Complex Type ns:ngramscooccurrencesparams
Payload message for an n-gram (or collocation finder) service.
Type: restriction of ns:params
Type hierarchy: ns:params → ns:ngramscooccurrencesparams

Complex Type ns:ngramscooccurrencesmainparams
Main params in the payload message for an n-gram (or collocation finder) service. Includes the corpus to be analyzed and the number of tokens in the n-grams.

Model: ns:corpus, ns:n
Children: ns:corpus, ns:n

Complex Type ns:ngramscooccurrencesoptionalparams
Optional parameters in the payload message for an n-gram (or collocation finder) service.
Model: ALL(ns:WindowSize{0,1} ns:frequencythreshold{0,1} ns:rankthreshold{0,1} ns:scorethreshold{0,1} ns:associationmeassure{0,1})
Children: ns:associationmeassure, ns:frequencythreshold, ns:rankthreshold, ns:scorethreshold, ns:windowsize

Complex Type ns:crawlerparams
Payload message for a crawler service.
Type: restriction of ns:params
Type hierarchy: ns:params → ns:crawlerparams

Complex Type ns:crawlermainparams
Main params in the payload message for a crawler service.

Model: ns:language+, ns:domain, ns:url+, ns:termlist
Children: ns:domain, ns:language, ns:termlist, ns:url

Complex Type ns:crawleroptionalparams
Optional parameters in the payload message for a crawler service. They include maxsize and maxtime, which are criteria for stopping the crawling process, and timeout and retries, which are used for flow control.
Model: ALL(ns:maxSize{0,1} ns:maxtime{0,1} ns:timeout{0,1} ns:retries{0,1})
Children: ns:maxsize, ns:maxtime, ns:retries, ns:timeout

Complex Type ns:maxsize
Maximum amount of data, expressed as a size (the amount, as a number) and a size unit.
Model: ns:size, ns:sizeunit
Children: ns:size, ns:sizeunit

Complex Type ns:nerecognitionparams
Payload message for a named entity recognition service.

Type: restriction of ns:params
Type hierarchy: ns:params → ns:nerecognitionparams

Complex Type ns:nerecognitionmainparams
Main params in the payload message for a named entity recognition service.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:nerecognitionoptionalparams
Optional parameters in the payload message for a named entity recognition service.
Model: ALL(ns:inputFormat{0,1} ns:outputformat{0,1} ns:nes{0,1})
Children: ns:nes, ns:inputformat, ns:outputformat

Complex Type ns:termextractionparams
Payload message for a term extraction service.

Type: restriction of ns:params
Type hierarchy: ns:params → ns:termextractionparams

Complex Type ns:termextractionmainparams
Main params in the payload message for a term extraction service.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:termextractionoptionalparams
Optional parameters in the payload message for a term extraction service.
Model: ALL(ns:inputFormat{0,1} ns:outputformat{0,1} ns:frequencythreshold{0,1} ns:stoplist{0,1})
Children: ns:frequencythreshold, ns:inputformat, ns:outputformat, ns:stoplist

Complex Type ns:topicidentifierparams
Payload message for a topic identifier service.

Type: restriction of ns:params
Type hierarchy: ns:params → ns:topicidentifierparams

Complex Type ns:topicidentifiermainparams
Main params in the payload message for a topic identifier service.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:topicidentifieroptionalparams
Optional parameters in the payload message for a topic identifier service.
Model: ALL(ns:inputFormat{0,1} ns:outputformat{0,1})
Children: ns:inputformat, ns:outputformat

Complex Type ns:ditionarylookupparams
Payload message for a dictionary lookup service.

Type: restriction of ns:params
Type hierarchy: ns:params → ns:ditionarylookupparams

Complex Type ns:dictionarylookupmainparams
Main params in the payload message for a dictionary lookup service.
Model: ns:language, ns:wordform
Children: ns:language, ns:wordform

Complex Type ns:response

Complex Type ns:sentencesplitterresponse

Type: extension of ns:response
Type hierarchy: ns:response → ns:sentencesplitterresponse
Model: ns:sentencesplittedtext
Children: ns:sentencesplittedtext

Complex Type ns:tokenizerresponse
Type: extension of ns:response
Type hierarchy: ns:response → ns:tokenizerresponse
Model: ns:tokenizedtext
Children: ns:tokenizedtext

Complex Type ns:lemmatizerresponse

Type: extension of ns:response
Type hierarchy: ns:response → ns:lemmatizerresponse
Model: ns:lemmatizedtext
Children: ns:lemmatizedtext

Complex Type ns:postaggerresponse
Type: extension of ns:response
Type hierarchy: ns:response → ns:postaggerresponse
Model: ns:posannotatedtext
Children: ns:posannotatedtext

Complex Type ns:shallowparserresponse
Type: extension of ns:response
Type hierarchy: ns:response → ns:shallowparserresponse
Model: ns:shallowparsedtext
Children: ns:shallowparsedtext

Complex Type ns:parserresponse

Type: extension of ns:response
Type hierarchy: ns:response → ns:parserresponse
Model: ns:shallowparsedtext
Children: ns:shallowparsedtext

Complex Type ns:textalignerresponse
Type: extension of ns:response
Type hierarchy: ns:response → ns:textalignerresponse
Model: ns:alignedtext
Children: ns:alignedtext

Complex Type ns:cqpindexerresponse

Type: extension of ns:response
Type hierarchy: ns:response → ns:cqpindexerresponse
Model: ns:reference
Children: ns:reference

Complex Type ns:cqpquerierresponse
Type: extension of ns:response
Type hierarchy: ns:response → ns:cqpquerierresponse
Model: ns:concordances
Children: ns:concordances

Complex Type ns:concordancerresponse
Type: extension of ns:response
Type hierarchy: ns:response → ns:concordancerresponse
Model: ns:concordances
Children: ns:concordances

Complex Type ns:ngramscooccurrencesresponse

Type: extension of ns:response
Type hierarchy: ns:response → ns:ngramscooccurrencesresponse
Model: ns:ngramscoocurrences
Children: ns:ngramscoocurrences

Complex Type ns:crawlerresponse
Type: extension of ns:response
Type hierarchy: ns:response → ns:crawlerresponse
Model: ns:webcorpus
Children: ns:webcorpus

Complex Type ns:nerecognitionresponse

Type: extension of ns:response
Type hierarchy: ns:response → ns:nerecognitionresponse
Model: ns:neannotatedtext
Children: ns:neannotatedtext

Complex Type ns:termextractionresponse
Type: extension of ns:response
Type hierarchy: ns:response → ns:termextractionresponse
Model: ns:termannotatedtext
Children: ns:termannotatedtext

Complex Type ns:topicidentifierresponse
Type: extension of ns:response
Type hierarchy: ns:response → ns:topicidentifierresponse
Model: ns:topicannotatedtext
Children: ns:topicannotatedtext

Complex Type ns:dictionarylookupresponse

Type: extension of ns:response
Type hierarchy: ns:response → ns:dictionarylookupresponse
Model: ns:lexicalentrieslist
Children: ns:lexicalentrieslist

Complex Type ns:lexicalentrieslist
Model: ns:lexicalentry+
Children: ns:lexicalentry

Complex Type ns:lexicalentry
A lexical entry, expressed in terms of a lemma and a PoS tag.
Model: ns:lemma, ns:postag
Children: ns:postag, ns:lemma

Complex Type ns:sourcetext
The source text (for example, in an alignment task).

Type: restriction of ns:text
Type hierarchy: ns:text → ns:sourcetext
Model: ns:string | ns:file
Children: ns:file, ns:string

Complex Type ns:targettext
The target text of a service that requires source and target input files (for example, in an alignment task).
Type: restriction of ns:text
Type hierarchy: ns:text → ns:targettext
Model: ns:string | ns:file
Children: ns:file, ns:string

Simple Types

Simple Type ns:lang
Language code (ISO 639).
Type: restriction of string

Simple Type ns:cqpcorpusstructure
The structure used to encode the verticalized text into CWB binary format with the cwb-encode tool (e.g. "-P pos -P lemma -S s").

Simple Type ns:cqpquery
A CQP query.
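The ns:cqpcorpusstructure value is a cwb-encode attribute specification. The following is a minimal sketch, assuming the value only uses -P (positional attribute) and -S (structural attribute) flags as in the "-P pos -P lemma -S s" example, that parses such a value and writes tokens in the verticalized input cwb-encode reads: one token per line, the word form in the first column, the -P attributes in further tab-separated columns in declaration order, and structural regions as XML-style tags on their own lines. The function names and the token dictionaries are hypothetical.

```python
from __future__ import annotations

import shlex


def parse_structure(spec: str) -> tuple[list[str], list[str]]:
    """Split a cwb-encode attribute spec such as "-P pos -P lemma -S s" into
    positional (-P) and structural (-S) attribute names. Other cwb-encode
    options are rejected: this sketch only handles attribute declarations."""
    tokens = shlex.split(spec)
    if len(tokens) % 2:
        raise ValueError("every flag needs an attribute name")
    positional, structural = [], []
    for flag, name in zip(tokens[0::2], tokens[1::2]):
        if flag == "-P":
            positional.append(name)
        elif flag == "-S":
            structural.append(name)
        else:
            raise ValueError(f"unsupported flag: {flag}")
    return positional, structural


def to_vertical(sentences: list[list[dict[str, str]]], positional: list[str],
                region: str = "s") -> str:
    """Write tokens as verticalized CWB input: one token per line, the word form
    first, then the -P attributes in declaration order, tab-separated; each
    sentence is wrapped in <region> and </region> tags on their own lines."""
    lines = []
    for sentence in sentences:
        lines.append(f"<{region}>")
        for token in sentence:
            columns = [token["word"]] + [token[attr] for attr in positional]
            lines.append("\t".join(columns))
        lines.append(f"</{region}>")
    return "\n".join(lines) + "\n"
```

Keeping the parsed -P order is what ties the two together: cwb-encode assigns columns to positional attributes in the order they are declared, so the writer must emit them in that same order.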

Simple Type ns:reference
The reference that a web service (WS) returns, which can be used in subsequent requests.

Simple Type ns:searchquery
A search query.

Simple Type ns:windowsize
Size of the window, typically for concordancers and collocation analyses.
Type: int

Simple Type ns:orderby
The sorting criterion to be used when displaying results. (More values may need to be added.)
Type: restriction of string

Simple Type ns:number
Type: int

Simple Type ns:threshold
The level or point at which something would happen, would cease to happen, or would take effect, become true, etc.
Type: int
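A concordancer (KWIC) lists each occurrence of the search query together with a window of surrounding context. The following sketch assumes that ns:windowsize counts tokens on each side of the match; the schema only says it is an int and does not fix the unit. The function name and the default of 5 are hypothetical, since ns:windowsize is optional in ns:concordancermainparams and no default is documented.

```python
from __future__ import annotations


def kwic(tokens: list[str], query: str,
         window_size: int = 5) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each token equal to
    query, keeping at most window_size tokens of context on each side."""
    if window_size < 0:
        raise ValueError("window_size must be non-negative")
    hits = []
    for i, token in enumerate(tokens):
        if token == query:
            left = tokens[max(0, i - window_size):i]   # clipped at text start
            right = tokens[i + 1:i + 1 + window_size]  # slicing clips at end
            hits.append((" ".join(left), token, " ".join(right)))
    return hits
```

For example, kwic("the cat sat on the mat".split(), "the", 2) returns two hits: ("", "the", "cat sat") and ("sat on", "the", "mat"). Sorting these triples (for instance by right context) is the kind of choice ns:orderby would control.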

Simple Type ns:associationmeassure
A measure of the association between two variables.
Type: restriction of string

Simple Type ns:domain
http://www.isocat.org/datcat/dc-2467

Simple Type ns:termlist
http://www.isocat.org/datcat/dc-508: "A verbal designation of a general concept in a specific subject field". Typically, terms are used by web crawler services as seed terms.
Type: list of ns:term

Simple Type ns:size
http://www.isocat.org/datcat/dc-2580: "The size of the resource with regard to the SizeUnit measurement in form of a number."

Simple Type ns:sizeunit
http://www.isocat.org/datcat/dc-2583: "Specification of the unit of size that is used when specifying the size."

Simple Type ns:maxtime
Maximum duration.

Simple Type ns:timeout
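ns:crawleroptionalparams uses ns:maxsize (an ns:size plus an ns:sizeunit) and ns:maxtime as criteria for stopping the crawl. The permitted ns:sizeunit values and the unit of ns:maxtime are not given in this documentation, so the following sketch assumes byte-based units with binary multiples (B, KB, MB, GB) and a maxtime in seconds; the unit table and function names are hypothetical.

```python
import time
from typing import Optional

# Hypothetical unit table: the permitted ns:sizeunit values are not listed in
# this documentation. Binary (1024-based) multiples are an assumption.
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def max_bytes(size: int, sizeunit: str) -> int:
    """Convert an ns:maxsize pair (ns:size, ns:sizeunit) into bytes."""
    return size * UNIT_BYTES[sizeunit]


def should_stop(bytes_collected: int, started_at: float, size: int,
                sizeunit: str, maxtime_s: float,
                now: Optional[float] = None) -> bool:
    """Return True once either stop criterion is met: the collected data has
    reached ns:maxsize, or the elapsed time has reached ns:maxtime (assumed to
    be in seconds). Times come from time.monotonic() unless now is given."""
    current = time.monotonic() if now is None else now
    return (bytes_collected >= max_bytes(size, sizeunit)
            or current - started_at >= maxtime_s)
```

A crawler loop would record started_at = time.monotonic() before fetching and check should_stop after each page; ns:timeout and ns:retries would govern the individual fetches rather than the overall stop decision.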

Simple Type ns:retries

Simple Type ns:inputformat
The format required for a given input parameter.
Type: ns:format
Type hierarchy: ns:format → ns:inputformat

Simple Type ns:format
http://www.isocat.org/datcat/dc-2562: the format of a given I/O parameter (e.g. xml, html, verticalised).

Simple Type ns:outputformat
The format of a given output parameter.
Type: ns:format
Type hierarchy: ns:format → ns:outputformat

Simple Type ns:nelist
List of named entities (http://www.isocat.org/datcat/dc-2275).
Type: list of ns:ne
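ns:termlist and ns:nelist are XML Schema list types, whose lexical form is a whitespace-separated sequence of items with whitespace collapsed. The following minimal sketch (function names hypothetical) splits and builds such values, treating only the four XML Schema whitespace characters (space, tab, carriage return, line feed) as separators.

```python
from __future__ import annotations

import re

# XML Schema whitespace characters: space, tab, carriage return, line feed.
_XSD_WS = re.compile(r"[ \t\r\n]+")


def parse_xsd_list(value: str) -> list[str]:
    """Split the lexical form of a list type (e.g. ns:nelist, ns:termlist).
    List types collapse whitespace, so items are separated by runs of
    whitespace and leading or trailing whitespace is ignored."""
    return [item for item in _XSD_WS.split(value) if item]


def serialize_xsd_list(items: list[str]) -> str:
    """Join items with single spaces, rejecting items that could not survive
    a round trip (empty, or containing whitespace)."""
    for item in items:
        if not item or _XSD_WS.search(item):
            raise ValueError(f"list item is empty or contains whitespace: {item!r}")
    return " ".join(items)
```

Because items are whitespace-separated in any XSD list type, a multiword seed term or entity such as "New York" would be read back as two items; the serializer raises instead of silently splitting it.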

Simple Type ns:wordform

Simple Type ns:lemma
http://www.isocat.org/datcat/dc-286: the base form of a word or term that is used as the formal dictionary entry for the term.

Simple Type ns:postag
http://www.isocat.org/datcat/dc-396: a category assigned to a word based on its grammatical and semantic properties.

Simple Type ns:tagset
http://www.isocat.org/datcat/dc-2497: "Specifies the tag set used in the annotation of the resource or a used by the tool/service or it contains a URL that points to the information about the tag set"

Simple Type ns:term
http://www.isocat.org/datcat/dc-508: "A verbal designation of a general concept in a specific subject field". Typically, terms are used by web crawler services as seed terms.

Simple Type ns:ne
http://www.isocat.org/datcat/dc-2275: "segment of text for which one or many rigid designators stands for the referent. usually named entities are located and classified into predefined types such as names of person, organizations, locations, expressions of times etc.".
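A dictionary lookup response carries an ns:lexicalentrieslist of one or more ns:lexicalentry elements, each with an ns:lemma and an ns:postag. The following is a minimal Python sketch that reads those pairs with the standard library. It assumes a hypothetical target namespace (http://example.org/types1.2), because the real URI is not shown here, and searches for entries at any depth, since the name of the response root element is not documented.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical target namespace: the real URI is not shown in this documentation.
NS = {"ns": "http://example.org/types1.2"}


def lexical_entries(response_xml: str) -> list[tuple[str, str]]:
    """Return (lemma, postag) pairs for every ns:lexicalentry in the response,
    in document order. Missing children come back as empty strings."""
    root = ET.fromstring(response_xml)
    return [
        (entry.findtext("ns:lemma", default="", namespaces=NS),
         entry.findtext("ns:postag", default="", namespaces=NS))
        for entry in root.iterfind(".//ns:lexicalentry", NS)
    ]
```

Returning a list of pairs rather than a dictionary keeps entries that share a lemma, such as a word form that is both a verb and a noun.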
