Schema documentation for types1.2.xsd

Generated with oXygen XML Editor, 8 February 2011

Table of Contents

Namespace: ""
Schemas
Main schema types1.2.xsd

Complex Types
ns:params, ns:sentencesplitterparams, ns:sentencesplittermainparams, ns:text, ns:tokenizerparams, ns:tokenizermainparams, ns:lemmatizerparams, ns:lemmatizermainparams, ns:postaggerparams, ns:postaggermainparams, ns:shallowparserparams, ns:shallowparsermainparams, ns:parserparams, ns:parsermainparams, ns:textalignerparams, ns:textalignermainparams, ns:cqpindexerparams, ns:cqpindexermainparams, ns:cqpcorpus, ns:cqpquerierparams, ns:cqpqueriermainparams, ns:concordancerparams, ns:concordancermainparams, ns:concordanceroptionalparams, ns:ngramscooccurrencesparams, ns:ngramscooccurrencesmainparams, ns:ngramscooccurrencesoptionalparams, ns:crawlerparams, ns:crawlermainparams, ns:crawleroptionalparams, ns:maxsize, ns:nerecognitionparams, ns:nerecognitionmainparams, ns:nerecognitionoptionalparams, ns:termextractionparams, ns:termextractionmainparams, ns:termextractionoptionalparams, ns:topicidentifierparams, ns:topicidentifiermainparams, ns:topicidentifieroptionalparams, ns:ditionarylookupparams, ns:dictionarylookupmainparams, ns:response, ns:sentencesplitterresponse, ns:tokenizerresponse, ns:lemmatizerresponse, ns:postaggerresponse, ns:shallowparserresponse, ns:parserresponse, ns:textalignerresponse, ns:cqpindexerresponse, ns:cqpquerierresponse, ns:concordancerresponse, ns:ngramscooccurrencesresponse, ns:crawlerresponse, ns:nerecognitionresponse, ns:termextractionresponse, ns:topicidentifierresponse, ns:dictionarylookupresponse, ns:lexicalentrieslist, ns:lexicalentry, ns:sourcetext, ns:targettext

Simple Types
ns:lang, ns:cqpcorpusstructure, ns:cqpquery, ns:reference, ns:searchquery, ns:windowsize, ns:orderby, ns:number, ns:threshold, ns:associationmeassure, ns:domain, ns:termlist, ns:size, ns:sizeunit, ns:maxtime, ns:timeout, ns:retries, ns:inputformat, ns:format, ns:outputformat, ns:nelist, ns:wordform, ns:lemma, ns:postag, ns:tagset, ns:term, ns:ne

Namespace: ""

Schemas

Main schema types1.2.xsd
Properties
attribute form default: unqualified
element form default: qualified
version: 1.2

Complex Types

Complex Type ns:params
All Request messages derive from Params. Params may be Main params or Optional params. Main params are necessarily typed. Optional params may be left untyped and have default values.
Model: ns:mainparams{0,1}, ns:optparams{0,1}

Complex Type ns:sentencesplitterparams
Payload message for a sentence splitter service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:sentencesplitterparams

Complex Type ns:sentencesplittermainparams
Main Params in the payload message for a Sentence Splitter service. Includes language and text.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:text
Anything written. Text can be passed as a or as a URI. Typically, Text is used for input parameters such as 'text to be analyzed' and output parameters such as 'analyzed text'.
Model: ns: ns:file
Children: ns:file, ns:

Complex Type ns:tokenizerparams
Payload message for a tokenizer service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:tokenizerparams

Complex Type ns:tokenizermainparams
Main Params in the payload message for a Tokenizer service. Includes language and text.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:lemmatizerparams
Payload message for a lemmatizer service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:lemmatizerparams

Complex Type ns:lemmatizermainparams
Main Params in the payload message for a Lemmatizer service. Includes language and text.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:postaggerparams
Payload message for a PoS tagger service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:postaggerparams

Complex Type ns:postaggermainparams
Main Params in the payload message for a PoS Tagger service. Includes language and text.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:shallowparserparams
Payload message for a Shallow Parser (or Chunking) service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:shallowparserparams

Complex Type ns:shallowparsermainparams
Main Params in the payload message for a Shallow Parser (or Chunking) service. Includes language and text to be parsed.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:parserparams
Payload message for a Parser service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:parserparams

Complex Type ns:parsermainparams
Main Params in the payload message for a Parser service. Includes language and text to be parsed.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:textalignerparams
Payload message for a Text Alignment service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:textalignerparams

Complex Type ns:textalignermainparams
Main Params in the payload message for a Text Aligner service. Includes language and corpus for both source and target.
Model: ns:source_language, ns:source_corpus, ns:target_language, ns:target_corpus
Children: ns:source_corpus, ns:source_language, ns:target_corpus, ns:target_language

Complex Type ns:cqpindexerparams
Payload message for a CWB service when indexing a corpus.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:cqpindexerparams

Complex Type ns:cqpindexermainparams
Main Params in the payload message for a CWB service when indexing a corpus. Includes the CQP corpus to be indexed and the CQP structure of the corpus.
Model: ns:cqpcorpus, ns:cqpcorpusstructure
Children: ns:cqpcorpus, ns:cqpcorpusstructure

Complex Type ns:cqpcorpus
A CQP corpus (following the standard verticalized CWB input format).
Type: restriction of ns:text
Type hierarchy: ns:text > ns:cqpcorpus
Model: ns: ns:file
Children: ns:file, ns:
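As an illustration of the main-params pattern that recurs above (a language plus an input text, wrapped in an optional mainparams element), the following Python sketch builds such a payload with the standard library. It is only a sketch under stated assumptions: the root element name, the absence of an XML namespace, and the choice to place the text directly inside the input element are all hypothetical, since the schema source itself is not reproduced in this document.

```python
import xml.etree.ElementTree as ET


def build_main_params_payload(root_tag, language, text):
    """Build a request payload with a mainparams child (sketch).

    Assumptions (not confirmed by the schema documentation):
    - elements are written without a namespace prefix;
    - root_tag is a hypothetical name for the request element;
    - the input text is placed directly inside <input>.
    """
    root = ET.Element(root_tag)
    main = ET.SubElement(root, "mainparams")
    # Order follows the documented model: language first, then input.
    ET.SubElement(main, "language").text = language
    ET.SubElement(main, "input").text = text
    # optparams is documented as {0,1}, so it is simply omitted here.
    return root


payload = build_main_params_payload(
    "sentencesplitterparams", "en", "One sentence. Another one."
)
xml_text = ET.tostring(payload, encoding="unicode")
print(xml_text)
```

Because both mainparams and optparams are documented with the occurrence range {0,1}, a payload without optparams matches the stated model; a validating client would still need the actual XSD to confirm element names and namespaces.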

Complex Type ns:cqpquerierparams
Payload message for a CWB service when querying a corpus.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:cqpquerierparams

Complex Type ns:cqpqueriermainparams
Main Params in the payload message for a CWB service when querying a corpus. Includes a reference to the indexed corpus to be queried and the CQP query.
Model: ns:cqpquery, ns:corpusreference
Children: ns:cqpquery, ns:corpusreference

Complex Type ns:concordancerparams
Payload message for a concordancer (or KWIC) service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:concordancerparams

Complex Type ns:concordancermainparams
Main Params in the payload message for a Concordancer service. Includes the search query and the window size.
Model: ns:searchquery, ns:windowsize{0,1}
Children: ns:searchquery, ns:windowsize

Complex Type ns:concordanceroptionalparams
Optional parameters in the payload message for a Concordancer (or KWIC) service.
Model: ALL(ns:OrderBy{0,1})
Children: ns:orderby

Complex Type ns:ngramscooccurrencesparams
Payload message for an n-gram (or collocation finder) service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:ngramscooccurrencesparams

Complex Type ns:ngramscooccurrencesmainparams
Main Params in the payload message for an n-gram (or collocation finder) service. Includes the corpus to be analyzed and the number of tokens in the n-grams.
Model: ns:corpus, ns:n
Children: ns:corpus, ns:n

Complex Type ns:ngramscooccurrencesoptionalparams
Optional parameters in the payload message for an n-gram (or collocation finder) service.
Model: ALL(ns:WindowSize{0,1} ns:frequencythreshold{0,1} ns:rankthreshold{0,1} ns:scorethreshold{0,1} ns:associationmeassure{0,1})
Children: ns:associationmeassure, ns:frequencythreshold, ns:rankthreshold, ns:scorethreshold, ns:windowsize

Complex Type ns:crawlerparams
Payload message for a crawler service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:crawlerparams

Complex Type ns:crawlermainparams
Main Params in the payload message for a crawler service.
Model: ns:language+, ns:domain, ns:url+, ns:termlist
Children: ns:domain, ns:language, ns:termlist, ns:url

Complex Type ns:crawleroptionalparams
Optional parameters in the payload message for a crawler service. It includes maxsize and maxtime as criteria to stop the crawling process, and timeout and retries for flow control purposes.
Model: ALL(ns:maxSize{0,1} ns:maxtime{0,1} ns:timeout{0,1} ns:retries{0,1})
Children: ns:maxsize, ns:maxtime, ns:retries, ns:timeout

Complex Type ns:maxsize
Maximum amount of data. It is expressed in terms of size (amount in numbers) and sizeunit (unit).
Model: ns:size, ns:sizeunit
Children: ns:size, ns:sizeunit

Complex Type ns:nerecognitionparams
Payload message for a Named Entity recognition service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:nerecognitionparams

Complex Type ns:nerecognitionmainparams
Main Params in the payload message for a Named Entity recognition service.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:nerecognitionoptionalparams
Optional parameters in the payload message for a Named Entity recognition service.
Model: ALL(ns:inputFormat{0,1} ns:outputformat{0,1} ns:nes{0,1})
Children: ns:nes, ns:inputformat, ns:outputformat

Complex Type ns:termextractionparams
Payload message for a term extraction service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:termextractionparams

Complex Type ns:termextractionmainparams
Main Params in the payload message for a term extraction service.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:termextractionoptionalparams
Optional parameters in the payload message for a term extraction service.
Model: ALL(ns:inputFormat{0,1} ns:outputformat{0,1} ns:frequencythreshold{0,1} ns:stoplist{0,1})
Children: ns:frequencythreshold, ns:inputformat, ns:outputformat, ns:stoplist

Complex Type ns:topicidentifierparams
Payload message for a topic identifier service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:topicidentifierparams

Complex Type ns:topicidentifiermainparams
Main Params in the payload message for a topic identifier service.
Model: ns:language, ns:input
Children: ns:input, ns:language

Complex Type ns:topicidentifieroptionalparams
Optional parameters in the payload message for a topic identifier service.
Model: ALL(ns:inputFormat{0,1} ns:outputformat{0,1})
Children: ns:inputformat, ns:outputformat

Complex Type ns:ditionarylookupparams
Payload message for a dictionary lookup service.
Type: restriction of ns:params
Type hierarchy: ns:params > ns:ditionarylookupparams

Complex Type ns:dictionarylookupmainparams
Main Params in the payload message for a dictionary lookup service.
Model: ns:language, ns:wordform
Children: ns:language, ns:wordform

Complex Type ns:response

Complex Type ns:sentencesplitterresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:sentencesplitterresponse
Model: ns:sentencesplittedtext
Children: ns:sentencesplittedtext

Complex Type ns:tokenizerresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:tokenizerresponse
Model: ns:tokenizedtext
Children: ns:tokenizedtext

Complex Type ns:lemmatizerresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:lemmatizerresponse
Model: ns:lemmatizedtext
Children: ns:lemmatizedtext

Complex Type ns:postaggerresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:postaggerresponse
Model: ns:posannotatedtext
Children: ns:posannotatedtext

Complex Type ns:shallowparserresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:shallowparserresponse
Model: ns:shallowparsedtext
Children: ns:shallowparsedtext

Complex Type ns:parserresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:parserresponse
Model: ns:shallowparsedtext
Children: ns:shallowparsedtext

Complex Type ns:textalignerresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:textalignerresponse
Model: ns:alignedtext
Children: ns:alignedtext

Complex Type ns:cqpindexerresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:cqpindexerresponse
Model: ns:reference
Children: ns:reference

Complex Type ns:cqpquerierresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:cqpquerierresponse
Model: ns:concordances
Children: ns:concordances

Complex Type ns:concordancerresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:concordancerresponse
Model: ns:concordances
Children: ns:concordances

Complex Type ns:ngramscooccurrencesresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:ngramscooccurrencesresponse
Model: ns:ngramscoocurrences
Children: ns:ngramscoocurrences

Complex Type ns:crawlerresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:crawlerresponse
Model: ns:webcorpus
Children: ns:webcorpus

Complex Type ns:nerecognitionresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:nerecognitionresponse
Model: ns:neannotatedtext
Children: ns:neannotatedtext

Complex Type ns:termextractionresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:termextractionresponse
Model: ns:termannotatedtext
Children: ns:termannotatedtext

Complex Type ns:topicidentifierresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:topicidentifierresponse
Model: ns:topicannotatedtext
Children: ns:topicannotatedtext

Complex Type ns:dictionarylookupresponse
Type: extension of ns:response
Type hierarchy: ns:response > ns:dictionarylookupresponse
Model: ns:lexicalentrieslist
Children: ns:lexicalentrieslist

Complex Type ns:lexicalentrieslist
Model: ns:lexicalentry+
Children: ns:lexicalentry

Complex Type ns:lexicalentry
Expressed in terms of lemma and PoS tag.
Model: ns:lemma, ns:postag
Children: ns:postag, ns:lemma

Complex Type ns:sourcetext
The source text (for example in an alignment task).
Type: restriction of ns:text
Type hierarchy: ns:text > ns:sourcetext
Model: ns: ns:file
Children: ns:file, ns:

Complex Type ns:targettext
The target text of a service that requires source and target input files (for example in an alignment task).
Type: restriction of ns:text
Type hierarchy: ns:text > ns:targettext
Model: ns: ns:file
Children: ns:file, ns:

Simple Types

Simple Type ns:lang
Language (ISO 639).
Type: restriction of

Simple Type ns:cqpcorpusstructure
The structure used to encode the verticalized text to CWB binary format with the cwb-encode tool (ex. "-P pos -P lemma -S s").

Simple Type ns:cqpquery
A CQP query.
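The example value given for ns:cqpcorpusstructure, "-P pos -P lemma -S s", uses the cwb-encode flags -P (positional attribute) and -S (structural attribute). A minimal Python sketch of how a service might split such a string into its attribute declarations is shown below. The function name is hypothetical, and the sketch deliberately accepts only flag/name pairs with these two flags; cwb-encode itself supports further options that are not handled here.

```python
import shlex


def parse_cqp_structure(structure):
    """Split a cwb-encode style structure string into attribute lists.

    Only "-P <name>" (positional) and "-S <name>" (structural) pairs are
    accepted; this restriction is a simplification made for this sketch.
    """
    tokens = shlex.split(structure)
    if len(tokens) % 2 != 0:
        raise ValueError("expected flag/name pairs: %r" % structure)
    positional, structural = [], []
    for flag, name in zip(tokens[0::2], tokens[1::2]):
        if flag == "-P":
            positional.append(name)
        elif flag == "-S":
            structural.append(name)
        else:
            raise ValueError("unsupported flag: %r" % flag)
    return positional, structural


positional, structural = parse_cqp_structure("-P pos -P lemma -S s")
print(positional, structural)
```

Rejecting unknown flags rather than passing them through is a conservative choice for a web service that forwards the value to a command-line tool.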

Simple Type ns:reference
The reference that a WS returns, which can be used in future events.

Simple Type ns:searchquery
A search query.

Simple Type ns:windowsize
Size of window, typically for concordancers and collocation analyses.
Type: int

Simple Type ns:orderby
The sorting criteria to be used when displaying results. (Possibly more values should be added.)
Type: restriction of

Simple Type ns:number
Type: int

Simple Type ns:threshold
The level or point at which something would happen, would cease to happen, or would take effect, become true, etc.
Type: int
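To make the role of ns:windowsize concrete, here is a small Python sketch of a keyword-in-context (KWIC) lookup in which the window size is the number of tokens kept on each side of a hit. This is an illustration only: the function name and the exact-match search are assumptions, and the actual concordancer service may interpret the window differently (for example in characters rather than tokens).

```python
def kwic(tokens, query, window_size):
    """Return (left context, hit, right context) for each exact match.

    window_size is taken to be a token count per side, which is an
    assumption made for this sketch.
    """
    hits = []
    for i, token in enumerate(tokens):
        if token == query:
            left = tokens[max(0, i - window_size):i]
            right = tokens[i + 1:i + 1 + window_size]
            hits.append((left, token, right))
    return hits


tokens = "the cat sat on the mat near the door".split()
hits = kwic(tokens, "the", 2)
for left, hit, right in hits:
    print(" ".join(left), "[" + hit + "]", " ".join(right))
```

Contexts are truncated at the edges of the text, so the first and last hits may have fewer than window_size tokens on one side.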

Simple Type ns:associationmeassure
The measure of the link between two variables.
Type: restriction of

Simple Type ns:domain

Simple Type ns:termlist
"A verbal designation of a general concept in a specific subject field". Typically, terms are used by web crawler services as seed terms.
Type: list of ns:term

Simple Type ns:size
"The size of the resource with regard to the SizeUnit measurement in form of a number."

Simple Type ns:sizeunit
"Specification of the unit of size that is used when specifying the size."

Simple Type ns:maxtime
Maximum duration.

Simple Type ns:timeout
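The ns:associationmeassure type is described above only as "the measure of the link between two variables", and the permitted values are not listed in this documentation. As one common example of such a measure in collocation analysis, the Python sketch below computes pointwise mutual information (PMI) from raw counts. Whether the service actually offers PMI is an assumption; the function name and the counts are purely illustrative.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information in bits, from raw frequency counts.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
              = log2( pair_count * total / (x_count * y_count) )
    """
    if min(pair_count, x_count, y_count, total) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(pair_count * total / (x_count * y_count))


# Illustrative counts: the pair occurs 10 times in a 1000-token corpus,
# the individual words 20 and 50 times respectively.
score = pmi(10, 20, 50, 1000)
print(round(score, 3))
```

A score threshold such as the optional ns:scorethreshold parameter could then be applied to values of this kind, keeping only pairs whose score reaches the threshold.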

Simple Type ns:retries

Simple Type ns:inputformat
The format required for a given Input parameter.
Type: ns:format
Type hierarchy: ns:format > ns:inputformat

Simple Type ns:format
The format of a given I/O parameter (ex: xml, html, verticalised, ...).

Simple Type ns:outputformat
The format of a given Output parameter.
Type: ns:format
Type hierarchy: ns:format > ns:outputformat

Simple Type ns:nelist
List of Named Entities.
Type: list of ns:ne

Simple Type ns:wordform

Simple Type ns:lemma
The base form of a word or term that is used as the formal dictionary entry for the term.

Simple Type ns:postag
A category assigned to a word based on its grammatical and semantic properties.

Simple Type ns:tagset
"Specifies the tag set used in the annotation of the resource or a used by the tool/service or it contains a URL that points to the information about the tag set"

Simple Type ns:term
"A verbal designation of a general concept in a specific subject field". Typically, terms are used by web crawler services as seed terms.

Simple Type ns:ne
"segment of text for which one or many rigid designators stands for the referent. usually named entities are located and classified into predefined types such as names of person, organizations, locations, expressions of times etc."

29 29

Markus Dickinson. Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013

Markus Dickinson. Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013 Markus Dickinson Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013 1 / 34 Basic text analysis Before any sophisticated analysis, we want ways to get a sense of text data

More information

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or

More information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

More information

Semantic annotation of requirements for automatic UML class diagram generation

Semantic annotation of requirements for automatic UML class diagram generation www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute

More information

Introduction to IE with GATE

Introduction to IE with GATE Introduction to IE with GATE based on Material from Hamish Cunningham, Kalina Bontcheva (University of Sheffield) Melikka Khosh Niat 8. Dezember 2010 1 What is IE? 2 GATE 3 ANNIE 4 Annotation and Evaluation

More information

A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow

A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow A Framework-based Online Question Answering System Oliver Scheuer, Dan Shen, Dietrich Klakow Outline General Structure for Online QA System Problems in General Structure Framework-based Online QA system

More information

Chapter 8. Final Results on Dutch Senseval-2 Test Data

Chapter 8. Final Results on Dutch Senseval-2 Test Data Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised

More information

Natural Language Processing in the EHR Lifecycle

Natural Language Processing in the EHR Lifecycle Insight Driven Health Natural Language Processing in the EHR Lifecycle Cecil O. Lynch, MD, MS cecil.o.lynch@accenture.com Health & Public Service Outline Medical Data Landscape Value Proposition of NLP

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Natural Language to Relational Query by Using Parsing Compiler

Natural Language to Relational Query by Using Parsing Compiler Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

DK-CLARIN WP 2.1 Technical Report Jørg Asmussen, DSL, with input from other WP 2 members Final version of May 5, 2015 1

DK-CLARIN WP 2.1 Technical Report Jørg Asmussen, DSL, with input from other WP 2 members Final version of May 5, 2015 1 Text formatting What an annotated text should look like DK-CLARIN WP 2.1 Technical Report Jørg Asmussen, DSL, with input from other WP 2 members Final version of May 5, 2015 1 Deliverables concerned D2

More information

Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1

Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1 Korpus-Abfrage: Werkzeuge und Sprachen Gastreferat zur Vorlesung Korpuslinguistik mit und für Computerlinguistik Charlotte Merz 3. Dezember 2002 Motivation Lizentiatsarbeit: A Corpus Query Tool for Automatically

More information

Shallow Parsing with Apache UIMA

Shallow Parsing with Apache UIMA Shallow Parsing with Apache UIMA Graham Wilcock University of Helsinki Finland graham.wilcock@helsinki.fi Abstract Apache UIMA (Unstructured Information Management Architecture) is a framework for linguistic

More information

A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks

A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks A Knowledge-Poor Approach to BioCreative V DNER and CID Tasks Firoj Alam 1, Anna Corazza 2, Alberto Lavelli 3, and Roberto Zanoli 3 1 Dept. of Information Eng. and Computer Science, University of Trento,

More information

Natural Language Processing

Natural Language Processing Natural Language Processing 2 Open NLP (http://opennlp.apache.org/) Java library for processing natural language text Based on Machine Learning tools maximum entropy, perceptron Includes pre-built models

More information

Technical Report. The KNIME Text Processing Feature:

Technical Report. The KNIME Text Processing Feature: Technical Report The KNIME Text Processing Feature: An Introduction Dr. Killian Thiel Dr. Michael Berthold Killian.Thiel@uni-konstanz.de Michael.Berthold@uni-konstanz.de Copyright 2012 by KNIME.com AG

More information

FoLiA: Format for Linguistic Annotation

FoLiA: Format for Linguistic Annotation Maarten van Gompel Radboud University Nijmegen 20-01-2012 Introduction Introduction What is FoLiA? Generalised XML-based format for a wide variety of linguistic annotation Characteristics Generalised paradigm

More information

Natural Language Database Interface for the Community Based Monitoring System *

Natural Language Database Interface for the Community Based Monitoring System * Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University

More information

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction. Günter Neumann, DFKI, 2012 Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

More information

An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them

An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them Vangelis Karkaletsis and Constantine D. Spyropoulos NCSR Demokritos, Institute of Informatics & Telecommunications,

More information

Dutch Parallel Corpus

Dutch Parallel Corpus Dutch Parallel Corpus Lieve Macken lieve.macken@hogent.be LT 3, Language and Translation Technology Team Faculty of Applied Language Studies University College Ghent November 29th 2011 Lieve Macken (LT

More information

Interactive Dynamic Information Extraction

Interactive Dynamic Information Extraction Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken

More information

Die Vielfalt vereinen: Die CLARIN-Eingangsformate CMDI und TCF

Die Vielfalt vereinen: Die CLARIN-Eingangsformate CMDI und TCF Die Vielfalt vereinen: Die CLARIN-Eingangsformate CMDI und TCF Susanne Haaf & Bryan Jurish Deutsches Textarchiv 1. The Metadata Format CMDI Metadata? Metadata Format? and more Metadata? Metadata Format?

More information

WebLicht: Web-based LRT services for German

WebLicht: Web-based LRT services for German WebLicht: Web-based LRT services for German Erhard Hinrichs, Marie Hinrichs, Thomas Zastrow Seminar für Sprachwissenschaft, University of Tübingen firstname.lastname@uni-tuebingen.de Abstract This software

More information

UIMA: Unstructured Information Management Architecture for Data Mining Applications and developing an Annotator Component for Sentiment Analysis

UIMA: Unstructured Information Management Architecture for Data Mining Applications and developing an Annotator Component for Sentiment Analysis UIMA: Unstructured Information Management Architecture for Data Mining Applications and developing an Annotator Component for Sentiment Analysis Jan Hajič, jr. Charles University in Prague Faculty of Mathematics

More information

Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project

Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project Ahmet Suerdem Istanbul Bilgi University; LSE Methodology Dept. Science in the media project is funded

More information

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,

More information

Dr. Anuradha et al. / International Journal on Computer Science and Engineering (IJCSE)

Dr. Anuradha et al. / International Journal on Computer Science and Engineering (IJCSE) HIDDEN WEB EXTRACTOR DYNAMIC WAY TO UNCOVER THE DEEP WEB DR. ANURADHA YMCA,CSE, YMCA University Faridabad, Haryana 121006,India anuangra@yahoo.com http://www.ymcaust.ac.in BABITA AHUJA MRCE, IT, MDU University

More information

Customizing an English-Korean Machine Translation System for Patent Translation *

Customizing an English-Korean Machine Translation System for Patent Translation * Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,

More information

Analysis of Web Archives. Vinay Goel Senior Data Engineer

Analysis of Web Archives. Vinay Goel Senior Data Engineer Analysis of Web Archives Vinay Goel Senior Data Engineer Internet Archive Established in 1996 501(c)(3) non profit organization 20+ PB (compressed) of publicly accessible archival material Technology partner

More information

Mobile Storage and Search Engine of Information Oriented to Food Cloud

Mobile Storage and Search Engine of Information Oriented to Food Cloud Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:

More information

Distributed Computing and Big Data: Hadoop and MapReduce

Distributed Computing and Big Data: Hadoop and MapReduce Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

A Multi-document Summarization System for Sociology Dissertation Abstracts: Design, Implementation and Evaluation

A Multi-document Summarization System for Sociology Dissertation Abstracts: Design, Implementation and Evaluation A Multi-document Summarization System for Sociology Dissertation Abstracts: Design, Implementation and Evaluation Shiyan Ou, Christopher S.G. Khoo, and Dion H. Goh Division of Information Studies School

More information

Off-line (and On-line) Text Analysis for Computational Lexicography

Off-line (and On-line) Text Analysis for Computational Lexicography Offline (and Online) Text Analysis for Computational Lexicography Von der PhilosophischHistorischen Fakultät der Universität Stuttgart zur Erlangung der Würde eines Doktors der Philosophie (Dr. phil.)

More information

Automated Extraction of Security Policies from Natural-Language Software Documents

Automated Extraction of Security Policies from Natural-Language Software Documents Automated Extraction of Security Policies from Natural-Language Software Documents Xusheng Xiao 1 Amit Paradkar 2 Suresh Thummalapenta 3 Tao Xie 1 1 Dept. of Computer Science, North Carolina State University,

More information

Using the BNC to create and develop educational materials and a website for learners of English

Using the BNC to create and develop educational materials and a website for learners of English Using the BNC to create and develop educational materials and a website for learners of English Danny Minn a, Hiroshi Sano b, Marie Ino b and Takahiro Nakamura c a Kitakyushu University b Tokyo University

More information

Word Completion and Prediction in Hebrew

Word Completion and Prediction in Hebrew Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology

More information

A prototype infrastructure for D Spin Services based on a flexible multilayer architecture

A prototype infrastructure for D Spin Services based on a flexible multilayer architecture A prototype infrastructure for D Spin Services based on a flexible multilayer architecture Volker Boehlke 1,, 1 NLP Group, Department of Computer Science, University of Leipzig, Johanisgasse 26, 04103

More information

Build Vs. Buy For Text Mining

Build Vs. Buy For Text Mining Build Vs. Buy For Text Mining Why use hand tools when you can get some rockin power tools? Whitepaper April 2015 INTRODUCTION We, at Lexalytics, see a significant number of people who have the same question

More information

GRASP: Grammar- and Syntax-based Pattern-Finder for Collocation and Phrase Learning

GRASP: Grammar- and Syntax-based Pattern-Finder for Collocation and Phrase Learning PACLIC 24 Proceedings 357 GRASP: Grammar- and Syntax-based Pattern-Finder for Collocation and Phrase Learning Mei-hua Chen a, Chung-chi Huang a, Shih-ting Huang b, and Jason S. Chang b a Institute of Information

More information

HL7 and DICOM based integration of radiology departments with healthcare enterprise information systems

HL7 and DICOM based integration of radiology departments with healthcare enterprise information systems international journal of medical informatics 76S (2007) S425 S432 journal homepage: www.intl.elsevierhealth.com/journals/ijmi HL7 and DICOM based integration of radiology departments with healthcare enterprise

More information

Central and South-East European Resources in META-SHARE

Tamás VÁRADI (1), Marko TADIĆ (2); (1) RESEARCH INSTITUTE FOR LINGUISTICS, MTA, Budapest, Hungary (2) FACULTY OF HUMANITIES AND SOCIAL SCIENCES, ZAGREB

Annotated Corpora in the Cloud: Free Storage and Free Delivery

Graham Wilcock University of Helsinki graham.wilcock@helsinki.fi Abstract The paper describes a technical strategy for implementing natural

The PALAVRAS parser and its Linguateca applications - a mutually productive relationship

Eckhard Bick University of Southern Denmark eckhard.bick@mail.dk Outline Flow chart Linguateca Palavras History

Beginning Oracle. Application Express 4. Doug Gault. Timothy St. Hilaire. Karen Cannell. Martin D'Souza. Patrick Cimolini

Beginning Oracle Application Express 4. Doug Gault, Karen Cannell, Patrick Cimolini, Martin D'Souza, Timothy St. Hilaire. Contents at a Glance: About the Authors, Acknowledgments, Chapter 1: An Introduction

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

pp. 290-295, http://dx.doi.org/10.14257/astl.2015.111.55 Irfan

Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services

Speakers: Kai Zimmer and Jörg Didakowski. Clarin Workshop WP2, February 2009. BBAW/DWDS The BBAW and its 40 longterm projects

Combining Ontological Knowledge and Wrapper Induction techniques into an e-retail System 1

Maria Teresa Pazienza, Armando Stellato and Michele Vindigni Department of Computer Science, Systems and Management,

SharePoint Server 2010 Capacity Management: Software Boundaries and Limits

This document is provided as-is. Information and views expressed in this document, including URL and other Internet Web site references,

Computer Aided Document Indexing System

Mladen Kolar, Igor Vukmirović, Bojana Dalbelo Bašić, Jan Šnajder, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia

A Cross-Lingual Pattern Retrieval Framework

Mei-Hua Chen, Chung-Chi Huang, Shih-Ting Huang, Hsien-Chin Liou, and Jason S. Chang Abstract We introduce a method for learning to grammatically categorize and

Standard Recommended Practice extensible Markup Language (XML) for the Interchange of Document Images and Related Metadata

Standard for Information and Image Management. Association for Information and

GrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns

Stamatina Thomaidou 1,2, Konstantinos Leymonis 1,2, Michalis Vazirgiannis 1,2,3 Presented by: Fragkiskos Malliaros 2 1 : Athens

Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy

The Deep Web: Surfacing Hidden Value, Michael K. Bergman. Web-Scale Extraction of Structured Data, Michael J. Cafarella, Jayant Madhavan & Alon Halevy. Presented by Mat Kelly, CS895 Web-based Information Retrieval

Index. AdWords, 182 AJAX Cart, 129 Attribution, 174

Index A AdWords, 182 AJAX Cart, 129 Attribution, 174 B BigQuery, Big Data Analysis create reports, 238 GA-BigQuery integration, 238 GA data, 241 hierarchy structure, 238 query language (see also Data selection,

Introduction to the SIF 3.0 Infrastructure: An Environment for Educational Data Exchange

SIF 3.0 Infrastructure Goals of the Release Environment types & types Registration & Security Basic Architectural

EvilSeed: A Guided Approach to Finding Malicious Web Pages

L. Invernizzi 1 S. Benvenuti 2 M. Cova 3,5 P. Milani Comparetti 4,5 C. Kruegel 1 G. Vigna 1 1 UC Santa Barbara 2 University of Genova 3 University

A comprehensive guide to XML Sitemaps:

semperplugins.com. What are they? Why do I need one? And how do I create one? A little background and history: A sitemap is a way of collecting and displaying the

Terminology Extraction from Log Files

Hassan Saneifar 1,2, Stéphane Bonniol 2, Anne Laurent 1, Pascal Poncelet 1, and Mathieu Roche 1 1 LIRMM - Université Montpellier 2 - CNRS 161 rue Ada, 34392 Montpellier

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features

pp. 273-280, http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Lirong Qiu, School of Information Engineering, Minzu University of

Sitemap. Component for Joomla! This manual documents version 3.15.x of the Joomla! extension. http://www.aimy-extensions.com/joomla/sitemap.

Sitemap Component for Joomla! This manual documents version 3.15.x of the Joomla! extension. http://www.aimy-extensions.com/joomla/sitemap.html Contents: 1 Introduction, 2 Sitemap Features, 3 Technical

Ask your Database: Natural Language Processing using In-Memory Technology

Enterprise Platform and Integration Concepts, Master Project, Summer Term 2015. Dr. Mariana Neves, April 10th, 2015. Question Answering

Abstract 1. INTRODUCTION

A Virtual Database Management System For The Internet Alberto Pan, Lucía Ardao, Manuel Álvarez, Juan Raposo and Ángel Viña University of A Coruña. Spain e-mail: {alberto,lucia,mad,jrs,avc}@gris.des.fi.udc.es

ARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH

Journal of Computer Science 9 (7): 922-927, 2013 ISSN: 1549-3636 2013 doi:10.3844/jcssp.2013.922.927 Published Online 9 (7) 2013 (http://www.thescipub.com/jcs.toc)

Online Multilingual Translation of Technical Service Reports over the World Wide Web

S. Liu, S.C. Hui, S. Foo and P.C. Leong School of Applied Science, Nanyang Technological University Nanyang Avenue,

Timeline (1) Text Mining 2004-2005 Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining?

Text Mining 2004-2005 Master TKI, Antal van den Bosch and Walter Daelemans, http://ilk.uvt.nl/~antalb/textmining/ Tuesday, 10.45-12.30, SZ33. Timeline (1): [1 February 2005] Introduction (WD) [15 February 2005]

An Online Service for SUbtitling by MAchine Translation

SUMAT CIP-ICT-PSP-270919 Annual Public Report 2011 Editor(s): Contributor(s): Reviewer(s): Status-Version: Volha Petukhova, Arantza del Pozo Mirjam

Introduction to Text Mining. Module 2: Information Extraction in GATE

The University of Sheffield, 1995-2013 This work is licenced under the Creative Commons Attribution-NonCommercial-ShareAlike Licence

How to make Ontologies self-building from Wiki-Texts

Bastian HAARMANN, Frederike GOTTSMANN, and Ulrich SCHADE Fraunhofer Institute for Communication, Information Processing & Ergonomics Neuenahrer Str.

Java, Volume II: Advanced Features

Eighth Edition. Cay S. Horstmann, Gary Cornell

Data Deduplication in Slovak Corpora

Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, Slovakia Abstract. Our paper describes our experience in deduplication of a Slovak corpus. Two methods of deduplication a plain

vCloud Air Platform Programmer's Guide

vCloud Air OnDemand 5.7. This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition.

AN INTEGRATION APPROACH FOR THE STATISTICAL INFORMATION SYSTEM OF ISTAT USING SDMX STANDARDS

Distr. GENERAL Working Paper No.2 26 April 2007 ENGLISH ONLY UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS EUROPEAN COMMISSION STATISTICAL

Computer-aided Document Indexing System

Journal of Computing and Information Technology - CIT 13, 2005, 4, 299-305. Mladen Kolar, Igor Vukmirović, Bojana Dalbelo Bašić and Jan Šnajder. An enormous

Project 2: Term Clouds (HOF) Implementation Report. Members: Nicole Sparks (project leader), Charlie Greenbacker

CS-889 Spring 2011. Abstract: This report describes the methods used in our implementation

DataDirect XQuery Technical Overview

Table of Contents: 1. Feature Overview 2. Relational Database Support 3. Performance and Scalability for Relational Data 4. XML Input and Output

Deposit Identification Utility and Visualization Tool

Colorado School of Mines Field Session Summer 2014 David Alexander Jeremy Kerr Luke McPherson Introduction Newmont Mining Corporation was founded in

Micro blogs Oriented Word Segmentation System

Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,

HireDesk API V1.0 Developer's Guide

Revision 1.4, Talent Technology Corporation. Audience: This document is intended for anyone who wants to understand and use the HireDesk API. If you just want to

Clustering Connectionist and Statistical Language Processing

Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Overview clustering vs. classification supervised

31 Case Studies: Java Natural Language Tools Available on the Web

Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free natural language understanding software

Anotaciones semánticas: unidades de busqueda del futuro?

Hugo Zaragoza, Yahoo! Research, Barcelona. Jornadas MAVIR, Madrid, Nov. 07. Document Understanding Cartoon our work! Complexity of Document Understanding

Automated Annotation of Events Related to Central Venous Catheterization in Norwegian Clinical Notes

Ingrid Andås Berg Healthcare Informatics Submission date: March 2014 Supervisor: Øystein Nytrø, IDI

EXMARaLDA and the FOLK tools two toolsets for transcribing and annotating spoken language

Thomas Schmidt Institut für Deutsche Sprache, Mannheim R 5, 6-13 D-68161 Mannheim thomas.schmidt@uni-hamburg.de

Trameur: A Framework for Annotated Text Corpora Exploration

Serge Fleury (Sorbonne Nouvelle Paris 3) serge.fleury@univ-paris3.fr Maria Zimina (Paris Diderot Sorbonne Paris Cité) maria.zimina@eila.univ-paris-diderot.fr

Observation Metadata and its Use in the DWD Weather Data Request Broker

Jürgen Seib Deutscher Wetterdienst e-mail: juergen.seib@dwd.de What kind of metadata is needed for the discovery of observation data?

Configuring DNS. Finding Feature Information

The Domain Name System (DNS) is a distributed database in which you can map hostnames to IP addresses through the DNS protocol from a DNS server. Each unique IP address can have an associated hostname.

DB2 Web Query Interfaces

There are several different access methods within DB2 Web Query and their related products. Here is a brief summary of the various interface and access methods. Method: DB2 Web

CENG 734 Advanced Topics in Bioinformatics

Week 9 Text Mining for Bioinformatics: BioCreative II.5 Fall 2010-2011 Quiz #7 1. Draw the decompressed graph for the following graph summary 2. Describe the

Semistructured data and XML. Institutt for Informatikk INF3100 09.04.2013 Ahmet Soylu

Unstructured, Structured and Semistructured data. Unstructured data: e.g., text documents. Structured data: data with a rigid and fixed data format

Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang

Department of Computer and Information Sciences, University of Pennsylvania, htd@linc.cis.upenn.edu, October 30, 2003. Outline: English sense-tagging

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

Internet Technology, Prof. Indranil Sengupta, Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur. Lecture No #39: Search Engines and Web Crawler :: Part 2

Special Topics in Computer Science

NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS

English Grammar Checker

International Journal of Computer Sciences and Engineering, Open Access, Review Paper, Volume-4, Issue-3, E-ISSN: 2347-2693. Pratik Ghosalkar 1*, Sarvesh Malagi 2, Vatsal Nagda 3,

Interoperability, Standards and Open Advancement

Eric Nyberg 1 Open Shared resources & annotation schemas Shared component APIs Shared datasets (corpora, test sets) Shared software (open source) Shared configurations

Atigeo at TREC 2012 Medical Records Track: ICD-9 Code Description Injection to Enhance Electronic Medical Record Search Accuracy

Bryan Tinsley, Alex Thomas, Joseph F. McCarthy, Mike Lazarus Atigeo, LLC

Introduction to XML Applications

EMC White Paper. Umair Nauman. Abstract: This document provides an overview of XML Applications. This is not a comprehensive guide to XML Applications and is intended for

D-SPIN. Volker Boehlke, Gerhard Heyer University of Leipzig. - overview, prototype, roadmap - Institut für Informatik

LeipzigLinguisticServices - repository: - relational database - used fields are: ID,

EFFECTIVE STORAGE OF XBRL DOCUMENTS

An Oracle & UBmatrix Whitepaper, June 2007. Introduction: Today's business world requires the ability to report, validate, and analyze business information efficiently,