Schema documentation for types1.2.xsd
Generated with oxygen XML Editor, 8 February 2011

Table of Contents

Namespace: ""
  Schemas
    Main schema types1.2.xsd
  Complex types
    ns:params
    ns:sentencesplitterparams
    ns:sentencesplittermainparams
    ns:text
    ns:tokenizerparams
    ns:tokenizermainparams
    ns:lemmatizerparams
    ns:lemmatizermainparams
    ns:postaggerparams
    ns:postaggermainparams
    ns:shallowparserparams
    ns:shallowparsermainparams
    ns:parserparams
    ns:parsermainparams
    ns:textalignerparams
    ns:textalignermainparams
    ns:cqpindexerparams
    ns:cqpindexermainparams
    ns:cqpcorpus
    ns:cqpquerierparams
    ns:cqpqueriermainparams
    ns:concordancerparams
    ns:concordancermainparams
    ns:concordanceroptionalparams
    ns:ngramscooccurrencesparams
    ns:ngramscooccurrencesmainparams
    ns:ngramscooccurrencesoptionalparams
    ns:crawlerparams
    ns:crawlermainparams
    ns:crawleroptionalparams
    ns:maxsize
    ns:nerecognitionparams
    ns:nerecognitionmainparams
    ns:nerecognitionoptionalparams
    ns:termextractionparams
    ns:termextractionmainparams
    ns:termextractionoptionalparams
    ns:topicidentifierparams
    ns:topicidentifiermainparams
    ns:topicidentifieroptionalparams
    ns:ditionarylookupparams
    ns:dictionarylookupmainparams
    ns:response
    ns:sentencesplitterresponse
    ns:tokenizerresponse
    ns:lemmatizerresponse
    ns:postaggerresponse
    ns:shallowparserresponse
    ns:parserresponse
    ns:textalignerresponse
    ns:cqpindexerresponse
    ns:cqpquerierresponse
    ns:concordancerresponse
    ns:ngramscooccurrencesresponse
    ns:crawlerresponse
    ns:nerecognitionresponse
    ns:termextractionresponse
    ns:topicidentifierresponse
    ns:dictionarylookupresponse
    ns:lexicalentrieslist
    ns:lexicalentry
    ns:sourcetext
    ns:targettext
  Simple types
    ns:lang
    ns:cqpcorpusstructure
    ns:cqpquery
    ns:reference
    ns:searchquery
    ns:windowsize
    ns:orderby
    ns:number
    ns:threshold
    ns:associationmeassure
    ns:domain
    ns:termlist
    ns:size
    ns:sizeunit
    ns:maxtime
    ns:timeout
    ns:retries
    ns:inputformat
    ns:format
    ns:outputformat
    ns:nelist
    ns:wordform
    ns:lemma
    ns:postag
    ns:tagset
    ns:term
    ns:ne

Namespace: ""

Schemas

Main schema types1.2.xsd
Properties:
  attribute form default: unqualified
  element form default: qualified
  version: 1.2

Complex types

Complex type ns:params
All Request messages derive from Params. Params may be Main params or Optional params. Main params are necessarily typed. Optional params may be left untyped and have default values.
Model: ns:mainparams{0,1}, ns:optparams{0,1}

Complex type ns:sentencesplitterparams
Payload message for a sentence splitter service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:sentencesplitterparams

Complex type ns:sentencesplittermainparams
Main Params in the payload message for a Sentence Splitter service. Includes language and text.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex type ns:text
Anything written. Text can be passed as a or as a URI. Typically, Text is used for input parameters such as 'text to be analyzed' and output parameters such as 'analyzed text'.
Model: ns: ns:file
Children: ns:file, ns:

Complex type ns:tokenizerparams
Payload message for a tokenizer service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:tokenizerparams

Complex type ns:tokenizermainparams
Main Params in the payload message for a Tokenizer service. Includes language and text.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex type ns:lemmatizerparams
Payload message for a lemmatizer service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:lemmatizerparams

Complex type ns:lemmatizermainparams
Main Params in the payload message for a Lemmatizer service. Includes language and text.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex type ns:postaggerparams
Payload message for a PoS tagger service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:postaggerparams

Complex type ns:postaggermainparams
Main Params in the payload message for a PoS Tagger service. Includes language and text.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex type ns:shallowparserparams
Payload message for a Shallow Parser (or Chunking) service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:shallowparserparams

Complex type ns:shallowparsermainparams
Main Params in the payload message for a Shallow Parser (or Chunking) service. Includes language and text to be parsed.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex type ns:parserparams
Payload message for a Parser service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:parserparams

Complex type ns:parsermainparams
Main Params in the payload message for a Parser (or Chunking) service. Includes language and text to be parsed.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex type ns:textalignerparams
Payload message for a Text Alignment service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:textalignerparams

Complex type ns:textalignermainparams
Main Params in the payload message for a Text Aligner service. Includes language and corpus for both source and target.
Model: ns:source_language, ns:source_corpus, ns:target_language, ns:target_corpus
Children: ns:source_corpus, ns:source_language, ns:target_corpus, ns:target_language

Complex type ns:cqpindexerparams
Payload message for a CWB service when indexing a corpus.
Type: restriction of ns:params
Hierarchy: ns:params > ns:cqpindexerparams

Complex type ns:cqpindexermainparams
Main Params in the payload message for a CWB service when indexing a corpus. Includes the CQP corpus to be indexed and the CQP structure of the corpus.
Model: ns:cqpcorpus, ns:cqpcorpusstructure
Children: ns:cqpcorpus, ns:cqpcorpusstructure

Complex type ns:cqpcorpus
A CQP corpus (following the standard verticalized CWB input format).
Type: restriction of ns:text
Hierarchy: ns:text > ns:cqpcorpus
Model: ns: ns:file
Children: ns:file, ns:

Complex type ns:cqpquerierparams
Payload message for a CWB service when querying a corpus.
Type: restriction of ns:params
Hierarchy: ns:params > ns:cqpquerierparams

Complex type ns:cqpqueriermainparams
Main Params in the payload message for a CWB service when querying a corpus. Includes a reference to the indexed corpus to be queried and the CQP query.
Model: ns:cqpquery, ns:corpusreference
Children: ns:cqpquery, ns:corpusreference

Complex type ns:concordancerparams
Payload message for a concordancer (or KWIC) service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:concordancerparams

Complex type ns:concordancermainparams
Main Params in the payload message for a Concordancer service. Includes the search query and the window size.
Model: ns:searchquery, ns:windowsize{0,1}
Children: ns:searchquery, ns:windowsize

Complex type ns:concordanceroptionalparams
Optional parameters in the payload message for a Concordancer (or KWIC) service.
Model: ALL(ns:OrderBy{0,1})
Children: ns:orderby

Complex type ns:ngramscooccurrencesparams
Payload message for an n-gram (or collocation finder) service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:ngramscooccurrencesparams

Complex type ns:ngramscooccurrencesmainparams
Main Params in the payload message for an n-gram (or collocation finder) service. Includes the corpus to be analyzed and the number of tokens in the n-grams.
Model: ns:corpus, ns:n
Children: ns:corpus, ns:n

Complex type ns:ngramscooccurrencesoptionalparams
Optional parameters in the payload message for an n-gram (or collocation finder) service.
Model: ALL(ns:WindowSize{0,1} ns:frequencythreshold{0,1} ns:rankthreshold{0,1} ns:scorethreshold{0,1} ns:associationmeassure{0,1})
Children: ns:associationmeassure, ns:frequencythreshold, ns:rankthreshold, ns:scorethreshold, ns:windowsize

Complex type ns:crawlerparams
Payload message for a crawler service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:crawlerparams

Complex type ns:crawlermainparams
Main Params in the payload message for a crawler service.
Model: ns:language+, ns:domain, ns:url+, ns:termlist
Children: ns:domain, ns:language, ns:termlist, ns:url

Complex type ns:crawleroptionalparams
Optional parameters in the payload message for a crawler service. It includes maxsize and maxtime as criteria to stop the crawling process, and timeout and retries for flow control purposes.
Model: ALL(ns:maxSize{0,1} ns:maxtime{0,1} ns:timeout{0,1} ns:retries{0,1})
Children: ns:maxsize, ns:maxtime, ns:retries, ns:timeout

Complex type ns:maxsize
Maximum amount of data. It is expressed in terms of size (amount in numbers) and sizeunit (unit).
Model: ns:size, ns:sizeunit
Children: ns:size, ns:sizeunit

Complex type ns:nerecognitionparams
Payload message for a Named Entity recognition service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:nerecognitionparams

Complex type ns:nerecognitionmainparams
Main Params in the payload message for a Named Entity recognition service.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex type ns:nerecognitionoptionalparams
Optional parameters in the payload message for a Named Entity recognition service.
Model: ALL(ns:inputFormat{0,1} ns:outputformat{0,1} ns:nes{0,1})
Children: ns:nes, ns:inputformat, ns:outputformat

Complex type ns:termextractionparams
Payload message for a term extraction service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:termextractionparams

Complex type ns:termextractionmainparams
Main Params in the payload message for a term extraction service.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex type ns:termextractionoptionalparams
Optional parameters in the payload message for a term extraction service.
Model: ALL(ns:inputFormat{0,1} ns:outputformat{0,1} ns:frequencythreshold{0,1} ns:stoplist{0,1})
Children: ns:frequencythreshold, ns:inputformat, ns:outputformat, ns:stoplist

Complex type ns:topicidentifierparams
Payload message for a topic identifier service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:topicidentifierparams

Complex type ns:topicidentifiermainparams
Main Params in the payload message for a topic identifier service.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex type ns:topicidentifieroptionalparams
Optional parameters in the payload message for a topic identifier service.
Model: ALL(ns:inputFormat{0,1} ns:outputformat{0,1})
Children: ns:inputformat, ns:outputformat

Complex type ns:ditionarylookupparams
Payload message for a dictionary lookup service.
Type: restriction of ns:params
Hierarchy: ns:params > ns:ditionarylookupparams

Complex type ns:dictionarylookupmainparams
Main Params in the payload message for a dictionary lookup service.
Model: ns:language, ns:wordform
Children: ns:language, ns:wordform

Complex type ns:response

Complex type ns:sentencesplitterresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:sentencesplitterresponse
Model: ns:sentencesplittedtext
Children: ns:sentencesplittedtext

Complex type ns:tokenizerresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:tokenizerresponse
Model: ns:tokenizedtext
Children: ns:tokenizedtext

Complex type ns:lemmatizerresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:lemmatizerresponse
Model: ns:lemmatizedtext
Children: ns:lemmatizedtext

Complex type ns:postaggerresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:postaggerresponse
Model: ns:posannotatedtext
Children: ns:posannotatedtext

Complex type ns:shallowparserresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:shallowparserresponse
Model: ns:shallowparsedtext
Children: ns:shallowparsedtext

Complex type ns:parserresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:parserresponse
Model: ns:shallowparsedtext
Children: ns:shallowparsedtext

Complex type ns:textalignerresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:textalignerresponse
Model: ns:alignedtext
Children: ns:alignedtext

Complex type ns:cqpindexerresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:cqpindexerresponse
Model: ns:reference
Children: ns:reference

Complex type ns:cqpquerierresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:cqpquerierresponse
Model: ns:concordances
Children: ns:concordances

Complex type ns:concordancerresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:concordancerresponse
Model: ns:concordances
Children: ns:concordances

Complex type ns:ngramscooccurrencesresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:ngramscooccurrencesresponse
Model: ns:ngramscoocurrences
Children: ns:ngramscoocurrences

Complex type ns:crawlerresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:crawlerresponse
Model: ns:webcorpus
Children: ns:webcorpus

Complex type ns:nerecognitionresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:nerecognitionresponse
Model: ns:neannotatedtext
Children: ns:neannotatedtext

Complex type ns:termextractionresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:termextractionresponse
Model: ns:termannotatedtext
Children: ns:termannotatedtext

Complex type ns:topicidentifierresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:topicidentifierresponse
Model: ns:topicannotatedtext
Children: ns:topicannotatedtext

Complex type ns:dictionarylookupresponse
Type: extension of ns:response
Hierarchy: ns:response > ns:dictionarylookupresponse
Model: ns:lexicalentrieslist
Children: ns:lexicalentrieslist

Complex type ns:lexicalentrieslist
Model: ns:lexicalentry+
Children: ns:lexicalentry

Complex type ns:lexicalentry
Expressed in terms of lemma and PoS tag.
Model: ns:lemma, ns:postag
Children: ns:postag, ns:lemma

Complex type ns:sourcetext
The source text (for example in an alignment task).
Type: restriction of ns:text
Hierarchy: ns:text > ns:sourcetext
Model: ns: ns:file
Children: ns:file, ns:

Complex type ns:targettext
The target text of a service that requires both source and target input files (for example in an alignment task).
Type: restriction of ns:text
Hierarchy: ns:text > ns:targettext
Model: ns: ns:file
Children: ns:file, ns:

Simple types

Simple type ns:lang
Language code (ISO 639).
Type: restriction of

Simple type ns:cqpcorpusstructure
The structure used to encode the verticalized text to CWB binary format with the cwb-encode tool (e.g. "-P pos -P lemma -S s").

Simple type ns:cqpquery
A CQP query.

Simple type ns:reference
The reference that a WS returns, which can be used in future events.

Simple type ns:searchquery
A search query.

Simple type ns:windowsize
Size of window, typically for concordancers and collocation analyses.
Type: int

Simple type ns:orderby
The sorting criteria to be used when displaying results. (Possibly more values should be added.)
Type: restriction of

Simple type ns:number
Type: int

Simple type ns:threshold
The level or point at which something would happen, would cease to happen, or would take effect, become true, etc.
Type: int

Simple type ns:associationmeassure
The measure of the link between two variables.
Type: restriction of

Simple type ns:domain

Simple type ns:termlist
"A verbal designation of a general concept in a specific subject field". Typically, terms are used by web crawler services as seed terms.
Type: list of ns:term

Simple type ns:size
"The size of the resource with regard to the SizeUnit measurement in form of a number."

Simple type ns:sizeunit
"Specification of the unit of size that is used when specifying the size."

Simple type ns:maxtime
Maximum duration.

Simple type ns:timeout

Simple type ns:retries

Simple type ns:inputformat
The format required for a given Input parameter.
Type: ns:format
Hierarchy: ns:format > ns:inputformat

Simple type ns:format
The format of a given I/O parameter (e.g. xml, html, verticalised, ...).

Simple type ns:outputformat
The format of a given Output parameter.
Type: ns:format
Hierarchy: ns:format > ns:outputformat

Simple type ns:nelist
List of Named Entities (
Type: list of ns:ne

Simple type ns:wordform

Simple type ns:lemma
The base form of a word or term that is used as the formal dictionary entry for the term.

Simple type ns:postag
A category assigned to a word based on its grammatical and semantic properties.

Simple type ns:tagset
"Specifies the tag set used in the annotation of the resource or a used by the tool/service or it contains a URL that points to the information about the tag set"

Simple type ns:term
"A verbal designation of a general concept in a specific subject field". Typically, terms are used by web crawler services as seed terms.

Simple type ns:ne
"Segment of text for which one or many rigid designators stands for the referent. Usually named entities are located and classified into predefined types such as names of person, organizations, locations, expressions of times etc."
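
The request and response types above can be illustrated with a minimal sketch for the dictionary lookup service. It builds a message shaped like ns:params > ns:mainparams > ns:language, ns:wordform and reads (lemma, PoS tag) pairs out of a message shaped like ns:response > ns:lexicalentrieslist > ns:lexicalentry+. Several things here are assumptions and not taken from the schema: the namespace URI (the documentation does not show one), the root element names "params" and "response" (the schema documents types, not root elements), the helper names q, build_lookup_request and parse_lookup_response, and the sample values.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the documentation does not state the real one.
NS = "http://example.org/types1.2"


def q(name: str) -> str:
    """Return the qualified (Clark notation) name for `name` in NS."""
    return f"{{{NS}}}{name}"


def build_lookup_request(language: str, wordform: str) -> ET.Element:
    """Build a message shaped like params > mainparams > language, wordform."""
    root = ET.Element(q("params"))  # root element name is an assumption
    main = ET.SubElement(root, q("mainparams"))
    ET.SubElement(main, q("language")).text = language
    ET.SubElement(main, q("wordform")).text = wordform
    return root


def parse_lookup_response(xml_text: str) -> list[tuple[str, str]]:
    """Collect (lemma, postag) pairs from every lexicalentry in the message."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter(q("lexicalentry")):
        lemma = entry.findtext(q("lemma"), default="")
        postag = entry.findtext(q("postag"), default="")
        entries.append((lemma, postag))
    return entries


# Illustrative request; "en" and "running" are made-up sample values.
request = build_lookup_request("en", "running")
request_xml = ET.tostring(request, encoding="unicode")

# Illustrative response; the entries and tag names are invented for the sketch.
sample_response = (
    f'<response xmlns="{NS}"><lexicalentrieslist>'
    "<lexicalentry><lemma>run</lemma><postag>VERB</postag></lexicalentry>"
    "<lexicalentry><lemma>running</lemma><postag>NOUN</postag></lexicalentry>"
    "</lexicalentrieslist></response>"
)
entries = parse_lookup_response(sample_response)
```

The sketch only mirrors the nesting described in the type listings; it does not validate against types1.2.xsd, which would need the schema file itself and a validating parser.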
More informationSitemap. Component for Joomla! This manual documents version 3.15.x of the Joomla! extension. http://www.aimy-extensions.com/joomla/sitemap.
Sitemap Component for Joomla! This manual documents version 3.15.x of the Joomla! extension. http://www.aimy-extensions.com/joomla/sitemap.html Contents 1 Introduction 3 2 Sitemap Features 3 3 Technical
More informationAsk your Database: Natural Language Processing using In-Memory Technology
Enterprise Platform and Integration Concepts Master Project Summer Term 2015 Ask your Database: Natural Language Processing using In-Memory Technology Dr. Mariana Neves April 10th, 2015 Question Answering
More informationAbstract 1. INTRODUCTION
A Virtual Database Management System For The Internet Alberto Pan, Lucía Ardao, Manuel Álvarez, Juan Raposo and Ángel Viña University of A Coruña. Spain e-mail: {alberto,lucia,mad,jrs,avc}@gris.des.fi.udc.es
More informationARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH
Journal of Computer Science 9 (7): 922-927, 2013 ISSN: 1549-3636 2013 doi:10.3844/jcssp.2013.922.927 Published Online 9 (7) 2013 (http://www.thescipub.com/jcs.toc) ARABIC PERSON NAMES RECOGNITION BY USING
More informationOnline Multilingual Translation of Technical Service Reports over the World Wide Web
Online Multilingual Translation of Technical Service Reports over the World Wide Web S. Liu, S.C. Hui, S. Foo and P.C. Leong School of Applied Science, Nanyang Technological University Nanyang Avenue,
More informationTimeline (1) Text Mining 2004-2005 Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining?
Text Mining 2004-2005 Master TKI Antal van den Bosch en Walter Daelemans http://ilk.uvt.nl/~antalb/textmining/ Dinsdag, 10.45-12.30, SZ33 Timeline (1) [1 februari 2005] Introductie (WD) [15 februari 2005]
More informationAn Online Service for SUbtitling by MAchine Translation
SUMAT CIP-ICT-PSP-270919 An Online Service for SUbtitling by MAchine Translation Annual Public Report 2011 Editor(s): Contributor(s): Reviewer(s): Status-Version: Volha Petukhova, Arantza del Pozo Mirjam
More informationIntroduction to Text Mining. Module 2: Information Extraction in GATE
Introduction to Text Mining Module 2: Information Extraction in GATE The University of Sheffield, 1995-2013 This work is licenced under the Creative Commons Attribution-NonCommercial-ShareAlike Licence
More informationHow to make Ontologies self-building from Wiki-Texts
How to make Ontologies self-building from Wiki-Texts Bastian HAARMANN, Frederike GOTTSMANN, and Ulrich SCHADE Fraunhofer Institute for Communication, Information Processing & Ergonomics Neuenahrer Str.
More informationJAVA r VOLUME II-ADVANCED FEATURES. e^i v it;
..ui. : ' :>' JAVA r VOLUME II-ADVANCED FEATURES EIGHTH EDITION 'r.", -*U'.- I' -J L."'.!'.;._ ii-.ni CAY S. HORSTMANN GARY CORNELL It.. 1 rlli!>*-
More informationData Deduplication in Slovak Corpora
Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, Slovakia Abstract. Our paper describes our experience in deduplication of a Slovak corpus. Two methods of deduplication a plain
More informationvcloud Air Platform Programmer's Guide
vcloud Air Platform Programmer's Guide vcloud Air OnDemand 5.7 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition.
More informationAN INTEGRATION APPROACH FOR THE STATISTICAL INFORMATION SYSTEM OF ISTAT USING SDMX STANDARDS
Distr. GENERAL Working Paper No.2 26 April 2007 ENGLISH ONLY UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS EUROPEAN COMMISSION STATISTICAL
More informationComputer-aided Document Indexing System
Journal of Computing and Information Technology - CIT 13, 2005, 4, 299-305 299 Computer-aided Document Indexing System Mladen Kolar, Igor Vukmirović, Bojana Dalbelo Bašić and Jan Šnajder,, An enormous
More informationProject 2: Term Clouds (HOF) Implementation Report. Members: Nicole Sparks (project leader), Charlie Greenbacker
CS-889 Spring 2011 Project 2: Term Clouds (HOF) Implementation Report Members: Nicole Sparks (project leader), Charlie Greenbacker Abstract: This report describes the methods used in our implementation
More informationDataDirect XQuery Technical Overview
DataDirect XQuery Technical Overview Table of Contents 1. Feature Overview... 2 2. Relational Database Support... 3 3. Performance and Scalability for Relational Data... 3 4. XML Input and Output... 4
More informationDeposit Identification Utility and Visualization Tool
Deposit Identification Utility and Visualization Tool Colorado School of Mines Field Session Summer 2014 David Alexander Jeremy Kerr Luke McPherson Introduction Newmont Mining Corporation was founded in
More informationMicro blogs Oriented Word Segmentation System
Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,
More informationHireDesk API V1.0 Developer s Guide
HireDesk API V1.0 Developer s Guide Revision 1.4 Talent Technology Corporation Page 1 Audience This document is intended for anyone who wants to understand, and use the Hiredesk API. If you just want to
More informationClustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
More information31 Case Studies: Java Natural Language Tools Available on the Web
31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software
More informationAnotaciones semánticas: unidades de busqueda del futuro?
Anotaciones semánticas: unidades de busqueda del futuro? Hugo Zaragoza, Yahoo! Research, Barcelona Jornadas MAVIR Madrid, Nov.07 Document Understanding Cartoon our work! Complexity of Document Understanding
More informationAutomated Annotation of Events Related to Central Venous Catheterization in Norwegian Clinical Notes
Automated Annotation of Events Related to Central Venous Catheterization in Norwegian Clinical Notes Ingrid Andås Berg Healthcare Informatics Submission date: March 2014 Supervisor: Øystein Nytrø, IDI
More informationEXMARaLDA and the FOLK tools two toolsets for transcribing and annotating spoken language
EXMARaLDA and the FOLK tools two toolsets for transcribing and annotating spoken language Thomas Schmidt Institut für Deutsche Sprache, Mannheim R 5, 6-13 D-68161 Mannheim thomas.schmidt@uni-hamburg.de
More informationTrameur: A Framework for Annotated Text Corpora Exploration
Trameur: A Framework for Annotated Text Corpora Exploration Serge Fleury (Sorbonne Nouvelle Paris 3) serge.fleury@univ-paris3.fr Maria Zimina(Paris Diderot Sorbonne Paris Cité) maria.zimina@eila.univ-paris-diderot.fr
More informationObservation Metadata and its Use in the DWD Weather Data Request Broker
Observation Metadata and its Use in the DWD Weather Data Request Broker Jürgen Seib Deutscher Wetterdienst e-mail: juergen.seib@dwd.de What kind of metadata is needed for the discovery of observation data?
More informationConfiguring DNS. Finding Feature Information
The Domain Name System (DNS) is a distributed database in which you can map hostnames to IP addresses through the DNS protocol from a DNS server. Each unique IP address can have an associated hostname.
More informationDB2 Web Query Interfaces
DB2 Web Query Interfaces There are several different access methods within DB2 Web Query and their related products. Here is a brief summary of the various interface and access methods. Method: DB2 Web
More informationCENG 734 Advanced Topics in Bioinformatics
CENG 734 Advanced Topics in Bioinformatics Week 9 Text Mining for Bioinformatics: BioCreative II.5 Fall 2010-2011 Quiz #7 1. Draw the decompressed graph for the following graph summary 2. Describe the
More informationSemistructured data and XML. Institutt for Informatikk INF3100 09.04.2013 Ahmet Soylu
Semistructured data and XML Institutt for Informatikk 1 Unstructured, Structured and Semistructured data Unstructured data e.g., text documents Structured data: data with a rigid and fixed data format
More informationSense-Tagging Verbs in English and Chinese. Hoa Trang Dang
Sense-Tagging Verbs in English and Chinese Hoa Trang Dang Department of Computer and Information Sciences University of Pennsylvania htd@linc.cis.upenn.edu October 30, 2003 Outline English sense-tagging
More informationSo today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
More informationSpecial Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
More informationEnglish Grammar Checker
International l Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 English Grammar Checker Pratik Ghosalkar 1*, Sarvesh Malagi 2, Vatsal Nagda 3,
More informationInteroperability, Standards and Open Advancement
Interoperability, Standards and Open Eric Nyberg 1 Open Shared resources & annotation schemas Shared component APIs Shared datasets (corpora, test sets) Shared software (open source) Shared configurations
More informationAtigeo at TREC 2012 Medical Records Track: ICD-9 Code Description Injection to Enhance Electronic Medical Record Search Accuracy
Atigeo at TREC 2012 Medical Records Track: ICD-9 Code Description Injection to Enhance Electronic Medical Record Search Accuracy Bryan Tinsley, Alex Thomas, Joseph F. McCarthy, Mike Lazarus Atigeo, LLC
More informationIntroduction to XML Applications
EMC White Paper Introduction to XML Applications Umair Nauman Abstract: This document provides an overview of XML Applications. This is not a comprehensive guide to XML Applications and is intended for
More informationD-SPIN. Volker Boehlke, Gerhard Heyer University of Leipzig. - overview, prototype, roadmap - Institut für Informatik
D-SPIN - overview, prototype, roadmap - Volker Boehlke, Gerhard Heyer University of Leipzig Institut für Informatik LeipzigLinguisticServices - repository: - relational database - used fields are: ID,
More informationEFFECTIVE STORAGE OF XBRL DOCUMENTS
EFFECTIVE STORAGE OF XBRL DOCUMENTS An Oracle & UBmatrix Whitepaper June 2007 Page 1 Introduction Today s business world requires the ability to report, validate, and analyze business information efficiently,
More information