Declarative Parsing and Annotation of Electronic Dictionaries
|
|
|
- Jean Rose
- 10 years ago
- Views:
Transcription
1 Declarative Parsing and Annotation of Electronic Dictionaries Christian Schneiker 1, Dietmar Seipel 1, Werner Wegstein 2, and Klaus Prätor 3 1 Department of Computer Science {schneiker seipel}@informatik.uni-wuerzburg.de 2 Competence Center for Computational Philology [email protected] University of Würzburg, Am Hubland, D Würzburg, Germany 3 Berlin Brandenburg Academy of Sciences and the Humanities Jägerstr , D Berlin, Germany [email protected] Abstract. We present a declarative annotation toolkit based on XML and PRO- LOG technologies, and we apply it for annotating the Campe Dictionary to obtain an electronic version in XML (TEI). For parsing flat structures, we use a very compact grammar formalism called extended definite clause grammars (EDCG s), which is an extended version of the DCG s that are well known from the logic programming language PROLOG. For accessing and transforming XML structures, we use the XML query and transformation language FNQUERY. It turned out, that the declarative approach in PROLOG is much more readable, reliable, flexible, and faster than an alternative implementation which we had made in JAVA and XSLT for the TEXTGRID community project. 1 Introduction Dictionaries traditionally offer information on words and their meaning in highly structured and condensed form, making use of a wide variety of abbreviations and an elaborate typography. Retro digitizing printed dictionaries, especially German dictionaries from the 18th and early 19th century, therefore requires sophisticated new tools and encoding techniques in order to convert typographical detail into a fine grain markup of dictionary structures on one hand and to allow at the same time for variation in orthography, morphology and usage on the other hand, since in the decades around 1800 the German language was still on its way to standardization. This is one of the reasons why TEXTGRID [17], the first grid project in German ehumanities, funded by the Federal Ministry of Education and Research, chose the Campe Dictionary [1]: 6 volumes with altogether about pages and about entries, published between 1807 and 1813 as one testbed for their TEXTGRID Lab, a Virtual Research Library. It entails a grid enabled workbench, that will process, analyse, annotate, edit and publish text data for academic research, and TEXTGRIDRep,
2 a grid repository for long term storage. This paper reflects some of the research on annotation in this context. Using PROLOG technology for parsing and annotating is common in natural language processing. PROLOG has been used within the Jean Paul project at the Berlin Brandenburg Academy of Sciences [15], where XML transformations based on FN- QUERY turned out to be easier to write than XSLT transformations. The task is to find the proper level of abstraction and write suitable macros for frequently occurring patterns in the code; PROLOG even allows to design dedicated special purpose languages [11]. E.g., definite clause grammars have been developed as an abstraction for parsing; they have been used for parsing controlled natural languages [4, 5, 13]. Since PROLOG code is declarative and very compact, the variations of natural language can be handled nicely. Figure 1 shows the typographical layout of the Campe Dictionary. The basic text has been double keyed in China. The encoding is based on the TEI P5 Guidelines [16], using a Relax NG Schema. The encoding structure uses elements to markup the dictionary entry, the form block with inflectional and morphological information, the sense block handling semantic description and references, quotations, related entries, usage as well as notes. In the future, this encoding will help us to structure the digital world according to semantic criteria and thus provide an essential basis for constructing reliable ontologies. Figure 1. Excerpt from the Campe Dictionary The rest of this paper is organized as follows: In Section 2 we will show how to extract entries of a dictionary using our XML query and transformation language FN- QUERY. The process for parsing and annotating elements for creating a well formed and valid TEI structure is illustrated for the Campe Dictionary in Section 3: we can access and create elements with PROLOG using DCG s and EDCG s in an easy and efficient way. Section 4 describes the use of transformation rules for the annotation and the further processing of XML elements. Section 5 gives an overview of our declarative annotation toolkit and compares the declarative techniques with a JAVA and XSLT approach, which we had implemented earlier. 2
3 2 Basic XML Handling in PROLOG and FNQUERY Currently, the Campe Dictionary, which was written in the early years of the 19th century, is converted into a machine readable structure. Within this process, the only annotations available so far, were the declaration of the different font sizes Joachim Heinrich Campe uses for displaying the structure of his act, the numbering of the line and page breaks in the dictionary, and paragraphs; thus, we found a very limited XML structure in the source file which we used for the first basic transformaions. In this paper, we present how to annotate documents with PROLOG and the XML query and transformation language FNQUERY [14], which is implemented in SWI PROLOG. To exemplify these annotations, we use the Campe Dictionary as a low structured base for obtaining a well formed XML document according to TEI. The PROLOG Data Structure for XML. The field notation developed for FNQUERY represents an XML element <T a 1 = v 1...a n = v n >... </T> as a PROLOG term T:As:C, called FN triple, with the tag T and an association list As =[a 1 : v 1,...,a n : v n ] of attributes a i and their corresponding values v i (with 1 i n), which are PROLOG terms. The content C can be either text or nested sub elements represented as FN triples. If As is empty, then the FN triple can be abbreviated as a pair T:C. In most available dictionaries, each entry is encapsulated in its own paragraph, and thus, it could be easily detected. In the following example, an entry is annotated with paragraph and is followed by an element W_2, which shows the lemma of the entry in a larger font; recognizing both elements is necessary, because there could exist other paragraph elements, which do not represent entries. <paragraph> <W_2>Der Aal</W_2>, <W_1>des -- es, Mz. die -- e</w_1>,... </paragraph> An XML document can be loaded into an FN triple using the predicate dread. For the paragraph element above we get FN triples with empty attribute lists: paragraph:[ W_2 :[ Der Aal ],,, W_1 :[ des -- es, Mz. die -- e ],,... ] Extraction of Entries Using FNQUERY. The query language FNQUERY allows for accessing a component of an XML document by its attribute or tag name. Furthermore, complex path or tree expressions can be formulated in a way quite similar to XPATH. The XML element from above could now be parsed with the following predicate campe_find_entry/2. The path expression Campe/descendant::paragraph selects a descendant element of Campe with the tag paragraph. For avoiding the recognition of a new paragraph without a following W_2 tag, we use another path expression Entry/nth_child::1/tag:: * for computing the tag of the first child of the considered entry. campe_find_entry(campe, Entry) :- Entry := Campe/descendant::paragraph, W_2 := Entry/nth_child::1/tag:: *. 3
4 Finally, with PROLOG s backtracking mechanism it is possible to find all entries in the source file. 3 Annotation with Extended Grammar Rules In the past, PROLOG has been frequently used for implementing natural language applications, in which parsing text is one of the main concerns. Most PROLOG systems include a preprocessor for defining grammar rules for parsing text. These definite clause grammars hide arguments which are not relevant for the semantics of the parsing problems; thus, they are more readable and reliable than standard PROLOG rules. In this section, we want to discuss this benefit for parsing electronic dictionaries. We give an example for parsing a lemma of an entry to generate TEI elements. In a further step, we introduce an extended version of DCG s (EDCG s), which give the user the ability to create compact grammar rules for generating generic FN terms for the parsed tokens. A comparison to DCG s and standard PROLOG rules will be elaborated in addition to the possibility of integrating EDCG s in the other formalisms. 3.1 Parsing with Definite Clause Grammars Firstly, we will use DCG s for parsing the headwords of a single entry and for detecting punctuation in natural text. Using grammar rules is more reliable than using standard PROLOG rules, since the code becomes much more compact and readable. Moreover, we introduce the sequence predicate for parsing a list of XML elements. Headwords. An entry of a dictionary normally consists of a lemma which is highlighted with a larger font; in our case, it is annotated with W_2 tags. Often, such a lemma only consists of one word in the case of verbs or a noun and its determiner: The DCG predicate campe_headword given below parses a headword XML element <W_2>Der Aal</W_2> and annotates it to derive the following form element: <form> <form type="lemma"> <form type="determiner"> <orth>der</orth> </form> <form type="headword"> <orth>aal</orth> </form> </form> </form> There also exist some cases with more than one headword or additional XML tags depending on the current stage of the process, such as line breaks or abbreviations, e.g., the following collective reference to related entries: <W_2>Der Blitzstoffmesser, der Blitzstoffsammler, <lb n=" " /> der Blitzstoffsauger</W_2> The additional elements have to be passed through and should not be annotated; the different headwords have to be annotated, and a form element for each lemma has to be created. The following DCG rule for campe_headword can handle the described variations; it parses the different types of W_2 elements to create FN triples X: 4
5 campe_headword(x) --> ( [X], { X = T:As:Es } ; [X], { atomic(x), campe_is_unicode_char(x) } ; [A, B], { atomic(a), atomic(b), campe_is_determiner(a), X = form:[type:lemma]:[ form:[type:determiner]:[orth:[a]], form:[type:headword]:[orth:[b]] ] } ; [A], { atomic(a), X = form:[type:lemma]:[ form:[type:headword]:[orth:[a] ] ] } ). With standard DCG technology, this predicate has to be called recursively for parsing a (possibly empty) sequence of headwords. This is done by the recursive predicate campe_headwords, which terminates when no more headwords are found: campe_headwords([x Xs]) --> campe_headword(x), campe_headwords(xs). campe_headwords([]) --> { true }. To simplify this, we have developed the meta-predicate sequence; like in regular expressions, the first argument * indicates that we look for an arbitrary number of headwords (other possible values are, e.g., + or? ): campe_headwords(xs) --> sequence( *, campe_headword, Xs). We can apply campe_headwords to produce a sequence of headwords, which are afterwards enclosed in a form tag and written to the screen:?- campe_headwords(xs, [ Der, Aal ], []), dwrite(xml, form:xs). Punctuation. For annotating punctuation in a lemma, which can appear between single headwords, the DCG predicate campe_punctuation is used for checking each token if it is a punctuation mark, and if so annotating it with a c tag. campe_punctuations(xs) --> sequence( *, campe_punctuation, Xs). campe_punctuation(x) --> ( [A], { is_punctuation(a), X = c:[a] } ; [X] ). The meta-predicate sequence used in the DCG predicate campe_punctuations parses a list of elements. 3.2 Parsing with Extended Definite Clause Grammars In the following, we show how nouns can be parsed using EDCG s. For complex applications, standard DCG s can have a complex structure, and understanding and debugging them can be tedious. Thus, we have developed a new, more compact syntax for writing DCG rules in PROLOG, which we call Extended DCG s (EDCG). For the representation of XML elements created by EDCG s we use a generic field notation. 5
6 Assume that a paragraph of a dictionary has to be parsed and annotated for labeling the inflected forms of a noun. In the Campe Dictionary, lemma variations (such as plural or genitive forms) are found in nearly all of the regular substantives; thus, an easy to read structure has to be developed to give the programmer the potential of writing complex parsers in a user friendly way. The following line shows such an extract of the Campe Dictionary. des -- es, Mz. die -- e These tokens have to be annotated in TEI, and different form tags with sub elements defining the corresponding grammatical structure have to be created. The following XML code, which should be produced, shows the complexity of such annotations, which indicates that the corresponding PROLOG code will also be complex: <form type="inflected"> <gramgrp> <gram type="number"> <abbr>mz.</abbr> </gram> <case value="nominative"/> <number value="plural"/> </gramgrp> <form type="determiner"> <orth>die</orth> </form> <form type="headword"> <orth> <ovar> <oref>-- e</oref> </ovar> </orth> </form> </form> Extended Definite Clause Grammars. For solving a parsing problem using regular PROLOG rules, the input tokens as well as the field notation for the created XML have to be processed. Thus, we have developed a new notation for parsing language where the output is regular XML. Instead of using the functor --> of DCG s, we are using the new functor ==> for writing EDCG s. The output arguments can be hidden, since the output is constructed in a generic way: the output of an EDCG rule with the head T is a list [T:Xs] containing exactly one FN triple, where Xs is the list of FN triples produced by the body of the rule. The following EDCG rules parse inflected forms like indicated above into FN triples: form ==> grammar_determiner, form_headword. grammar_determiner ==> ( gram,!, determiner ; determiner ). gram ==> [ Mz. ]. determiner ==> [X], { campe_is_determiner(x) }. form_headword ==> orth. orth ==> [ --, _]. Below, we call the predicate form for parsing a list of tokens (the second argument) into a list Xs (the first argument) of form elements; the last argument contains the tokens that could not be parsed i.e., it should be empty:?- form(xs, [ Mz., die, --e ], []), dwrite(xml, form:xs). 6
7 The output of the predicate form is enclosed in a further form tag and written to the screen: <form> <grammar_determiner> <gram>mz.</gram> <determiner>die</determiner> </grammar_determiner> <form_headword> <orth>-- e</orth> </form_headword> </form> From this, the exact, desired XML structure can be derived using some simple transformations, which we will describe in Section 4. The code is much better understandable than for DCG s, because this notation suppresses irrelevant arguments. If we define grammar_determiner with the following DCG s and mix them with the EDCG s for the other predicates, then we can get even closer to the desired XML structure in one step at the expense of a less compact code: grammar_determiner([g, F]) --> gram([gram:[c]]),!, determiner([determiner:[d]]), { G = gramgrp:[ gram:[type:number]:[abbr:[c]], case:[value:nominative]:[], number:[value:plural]:[] ], F = form:[type:determiner]:[orth:[d]] }. grammar_determiner([g, F]) --> determiner([determiner:[d]]), { G = gramgrp:[ case:[value:genitive]:[], number:[value:singular]:[] ], F = form:[type:determiner]:[orth:[d]] }. Instead of the generic element grammar_determiner produced by the EDCG rule, the DCG rules can produce the two elements (G and F) of the desired XML structure. Now, we can derive the complete desired XML with very simple transformations. Finally, the different cases like genitive or dative and the plural forms of a dictionary entry could be parsed using similar DCG or EDCG rules. Comparison with DCG s and Standard PROLOG. In contrast, for our example the corresponding standard DCG rules of PROLOG (we show only half of them) are more complex than the EDCG rules: form([form:es]) --> grammar_determiner(xs), form_headword(ys), { append(xs, Ys, Es) }. gram([gram:[ Mz. ]]) --> [ Mz. ]. determiner([determiner:[x]]) --> [X], { campe_is_determiner(x) }. In many applications like the annotation of electronic dictionaries or other programs producing XML these DCG rules are quite complex and simplifying them is necessary. Finally, the implementation in pure, standard PROLOG would look even more complicated (again, we show only half of the rules): 7
8 form([form:es], As, Bs) :- grammar_determiner(xs, As, Cs), form_headword(ys, Cs, Bs), append(xs, Ys, Es). gram([gram:[ Mz. ], As, Bs) :- As = [ Mz. Bs]. determiner([determiner:[x]], As, Bs) :- campe_is_determiner(x), As = [X Bs]. Besides the output in the first argument of the DCG predicates here, which is constructed explicitely rather than generic, there are two more arguments for passing the list of input tokens. DCG s only use the first argument, and EDCG s hide all three arguments. Sequences of Form Elements. If we add the following DCG rule for form at the beginning of the DCG program and assume that commas have already been annotated as FN triples c:[, ], then we can annotate sequences of inflected form elements: form([x]) --> [X], { X = T:As:Es,! }. With the predicate sequence it is possible to parse a sequence Ts of tokens to inflected form elements, even when more than one genitive or plural form occurs. Since the output Fs is a list of lists of elements, we have to flatten it to an ordinary list Xs before we can write it to the screen:?- Ts = [des, --, es, c:[, ], Mz., die, --, e], sequence( *, form, Fs, Ts, []), flatten(fs, Xs), dwrite(xmls, Xs). These PROLOG rules are efficient and easly readable. The derived FN triples can be ouput in XML using the predicate dwrite/2. 4 Annotation with Transformation Rules In this section, we transform XML elements from the Campe Dictionary with FNQUERY in a way quite similar to XSLT, but with a more powerful backengine in PROLOG. A rule with the head --->(Predicate, T1, T2) transforms an FN triple T1 to another FN triple T2. Arbitrary PROLOG calls could be integrated within the rules; thus, FNQUERY is a Turing complete transformation language. FN transformation rules are called by the predicate fn_item_transform. The transformation is recursive; it starts in the leaves of the XML tree and ends in the root element. For example, the following rules transform all A elements in an FN triple to an hi element with an attribute rend="roman" and all W_1 elements to an hi element with an attribute rend="large" for labeling the font size. Other elements are left unchanged, because of rule 3: --->(antiqua, A :_:Es, hi:[rend:roman]:es). --->(large_font, W_1 :_:Es, hi:[rend:large]:es). --->(_, X, X). 8
9 Another example is the transformation of the XML elements created by the EDCG s of Section >(inflected, form:_:[t1, T2], form:[type:inflected]:[g, F1, F2]) :- D := T1/determiner/content:: *, O := T2/orth, ( C := T1/gram/content:: *, C = [ Mz. ] -> G =... ; G =... ), F1 = form:[type:determiner]:[orth:d], F2 = form:[type:headword]:[ovar:[orev:[o]]]. This rule transforms an FN triple analogously to the DCG rule for grammar_determiner defined in Section 3; an attribute type is added to each of the form elements. The element grammar_determiner is separated into two different elements (namely gramgrp including two additional elements case and number) and an optional gram element; all of them depend on the content of the case element. The orth sub element of the form element with attribute type="headword" is enclosed in two more elements for obtaining the desired structure. 5 The Declarative Annotation Toolkit The annotation techniques presented in the previous sections are part of an integrated declarative annotation toolkit. In this section, we sketch some further annotations which we have implemented in the field of electronic dictionaries. 5.1 Annotations of Electronic Dictionaries With FNQUERY, DCG s and EDCG s we have developed several applications for parsing natural text in electronic dictionaries such as Campe and Adelung. These techniques give us the possibility to identify and annotate lemmas, their inflected forms, as well as punctuations and hyphenations in parsed entries. Furthermore, it is possible to match slightly modified variants of a lemma in the text and to parse certain German relative clauses. Moreover, we have developed an application for annotating sub grouped senses in an entry labeled with list markers such as I, 1), 2), a., b), where PROLOG s backtracking mechanism is very useful for obtaining the proper structure. Figure 2 shows the rendering of an entry, which we had annotated with PROLOG before. 5.2 Comparison to JAVA and XSLT In an earlier step of the TEXTGRID project, we had designed and implemented a tool for parsing and annotating the Campe Dictionary in JAVA and XSLT. According to the guidelines of the TEXTGRIDLAB, this implementation was necessary for the community project. For Volume 1 of the Campe Dictionary, which consists of entries, the JAVA approach needs approximately 44 minutes for parsing and annotating all entries on a 9
10 Figure 2. Rendering of an Annotated Entry dual core CPU system using two threads. In our new approach in PROLOG, we can reduce this runtime to about 4 minutes while using only one core of the system. Since we are now using declarative technology, the code length could be reduced to only 5% of the JAVA implementation. 6 Conclusions In the BMBF research project on variations in language, we want to build a meta lemma list by analyzing a huge collection of dictionaries from different epochs of the German language. Both here and in the TEXTGRID community project, we need a fast, reliable, easy to read and modular toolkit for parsing, annotating and querying data sets. With the development of FNQUERY and EDCG s, we have the possibility to fulfil these requirements; our declarative annotation toolkit is even faster than modern applications written in JAVA and XSLT. The usability of the introduced technologies is not limited to the annotation and parsing of natural language; in another project we are using EDCG s and transformation rules for analyzing log messages or even data buses in order to find root causes in network systems. The aspect of integrating text mining to extend our toolkit will be a subject of future research. References 1. CAMPE, Joachim Heinrich: Wörterbuch der deutschen Sprache. 5 Volumes, Braunschweig, COVINGTON, M.A.: GULP 3.1: An Extension of Prolog for Unification Based Grammar. Research Report AI , Artificial Intelligence Center, University of Georgia,
11 3. DEREKO: The German Reference Corpus Project FUCHS, N.E.; FROMHERZ, M.P.J.: Transformational Development of Logic Programs from Executable Specifications Schema Based Visual and Textual Composition of Logic Programs. C. Beckstein, U. Geske (eds.), Entwicklung, Test und Wartung deklarativer KI Programme, GMD Studien Nr. 238, Gesellschaft für Informatik und Datenverarbeitung, FUCHS, N.E.; SCHWITTER, R.: Specifying Logic Programs in Controlled Natural Language. Proc. Workshop on Computational Logic for Natural Language Processing (CLNP) GAZDAR, G.; MELLISH, C. Natural Language Processing in Prolog. An Introduction to Computational Linguistics. Addison Wesley, HAUSMANN, F.J.; REICHMANN, O.; WIEGAND, H.E.; ZGUSTA, L.; eds.: Wörterbücher / Dictionaries / Dictionnaires Ein internationales Handbuch zur Lexikographie / An International Encyclopedia of Lexicography / Encyclopédie internationale de lexicographie. Berlin/New York, 1989 (I), 1990 (II) 8. HIRAKAWA, H.; ONO, K.; YOSHIMURA, Y.: Automatic Refinement of a POS Tagger Using a Reliable Parser and Plain Text Corpora. Proc. 18th International Conference on Computational Linguistics (COLING) LANDAU, S.: Dictionaries. The Art and Craft of Lexicography. 2nd Edition, Cambridge, LLOYD, J.: Practical Advantages of Declarative Programming. CSLI Lecture Notes, Number 10, O KEEFE, R.A.: The Craft of Prolog. MIT Press, PEREIRA, F.C.N.; SHIEBER, S.M: Prolog and Natural Language Analysis. CSLI Lecture Notes, Number 10, SCHWITTER, R.: Working for Two: a Bidirectional Grammer for a Controlled Natural Language. Proc. 21st Australasian Joint Conference on Artificial Intelligence (AI) 2008, pp SEIPEL, D.: Processing XML Documents in Prolog. Proc. 17th Workshop on Logic Programmierung (WLP) SEIPEL, D.; PRÄTOR, K.: XML Transformations Based on Logic Programming. Proc. 18th Workshop on Logic Programming (WLP) 2005, pp TEI CONSORTIUM, eds.: TEI P5: Guidelines for Electronic Text Encoding and Interchange TEXTGRID: Modular platform for collaborative textual editing a community grid for the humanities
Special Topics in Computer Science
Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS
Semantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
Knowledge Engineering for Business Rules in PROLOG
Knowledge Engineering for Business Rules in PROLOG Ludwig Ostermayer, Dietmar Seipel University of Würzburg, Department of Computer Science Am Hubland, D 97074 Würzburg, Germany {ludwig.ostermayer,dietmar.seipel}@uni-wuerzburg.de
Interactive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
JSquash: Source Code Analysis of Embedded Database Applications for Determining SQL Statements
JSquash: Source Code Analysis of Embedded Database Applications for Determining SQL Statements Dietmar Seipel 1, Andreas M. Boehm 1, and Markus Fröhlich 1 University of Würzburg, Department of Computer
Managing large sound databases using Mpeg7
Max Jacob 1 1 Institut de Recherche et Coordination Acoustique/Musique (IRCAM), place Igor Stravinsky 1, 75003, Paris, France Correspondence should be addressed to Max Jacob ([email protected]) ABSTRACT
Technical Report. The KNIME Text Processing Feature:
Technical Report The KNIME Text Processing Feature: An Introduction Dr. Killian Thiel Dr. Michael Berthold [email protected] [email protected] Copyright 2012 by KNIME.com AG
Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
Firewall Builder Architecture Overview
Firewall Builder Architecture Overview Vadim Zaliva Vadim Kurland Abstract This document gives brief, high level overview of existing Firewall Builder architecture.
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Jan Paralic, Peter Smatana Technical University of Kosice, Slovakia Center for
A Workbench for Prototyping XML Data Exchange (extended abstract)
A Workbench for Prototyping XML Data Exchange (extended abstract) Renzo Orsini and Augusto Celentano Università Ca Foscari di Venezia, Dipartimento di Informatica via Torino 155, 30172 Mestre (VE), Italy
Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1
Korpus-Abfrage: Werkzeuge und Sprachen Gastreferat zur Vorlesung Korpuslinguistik mit und für Computerlinguistik Charlotte Merz 3. Dezember 2002 Motivation Lizentiatsarbeit: A Corpus Query Tool for Automatically
Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services
Berlin-Brandenburg Academy of sciences and humanities (BBAW) resources / services speakers: Kai Zimmer and Jörg Didakowski Clarin Workshop WP2 February 2009 BBAW/DWDS The BBAW and its 40 longterm projects
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,
Parsing Technology and its role in Legacy Modernization. A Metaware White Paper
Parsing Technology and its role in Legacy Modernization A Metaware White Paper 1 INTRODUCTION In the two last decades there has been an explosion of interest in software tools that can automate key tasks
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
Extensible Markup Language (XML): Essentials for Climatologists
Extensible Markup Language (XML): Essentials for Climatologists Alexander V. Besprozvannykh CCl OPAG 1 Implementation/Coordination Team The purpose of this material is to give basic knowledge about XML
Embedded Software Development with MPS
Embedded Software Development with MPS Markus Voelter independent/itemis The Limitations of C and Modeling Tools Embedded software is usually implemented in C. The language is relatively close to the hardware,
Introduction to Web Services
Department of Computer Science Imperial College London CERN School of Computing (icsc), 2005 Geneva, Switzerland 1 Fundamental Concepts Architectures & escience example 2 Distributed Computing Technologies
Internationalization Tag Set 1.0 A New Standard for Internationalization and Localization of XML
A New Standard for Internationalization and Localization of XML Felix Sasaki World Wide Web Consortium 1 San Jose, This presentation describes a new W3C Recommendation, the Internationalization Tag Set
Introduction to XML Applications
EMC White Paper Introduction to XML Applications Umair Nauman Abstract: This document provides an overview of XML Applications. This is not a comprehensive guide to XML Applications and is intended for
Rotorcraft Health Management System (RHMS)
AIAC-11 Eleventh Australian International Aerospace Congress Rotorcraft Health Management System (RHMS) Robab Safa-Bakhsh 1, Dmitry Cherkassky 2 1 The Boeing Company, Phantom Works Philadelphia Center
A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts
[Mechanical Translation, vol.5, no.1, July 1958; pp. 25-41] A Programming Language for Mechanical Translation Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts A notational
Enhanced concept of the TeamCom SCE for automated generated services based on JSLEE
Enhanced concept of the TeamCom SCE for automated generated services based on JSLEE Thomas Eichelmann 1, 2, Woldemar Fuhrmann 3, Ulrich Trick 1, Bogdan Ghita 2 1 Research Group for Telecommunication Networks,
Building the Tower of Babel: Converting XML Documents to VoiceXML for Accessibility 1
Building the Tower of Babel: Converting XML Documents to VoiceXML for Accessibility 1 G. Gupta, O. El Khatib, M. F. Noamany, H. Guo Laboratory for Logic, Databases, and Advanced Programming Department
Chapter 1. Dr. Chris Irwin Davis Email: [email protected] Phone: (972) 883-3574 Office: ECSS 4.705. CS-4337 Organization of Programming Languages
Chapter 1 CS-4337 Organization of Programming Languages Dr. Chris Irwin Davis Email: [email protected] Phone: (972) 883-3574 Office: ECSS 4.705 Chapter 1 Topics Reasons for Studying Concepts of Programming
The Prolog Interface to the Unstructured Information Management Architecture
The Prolog Interface to the Unstructured Information Management Architecture Paul Fodor 1, Adam Lally 2, David Ferrucci 2 1 Stony Brook University, Stony Brook, NY 11794, USA, [email protected] 2 IBM
Encoding Library of Congress Subject Headings in SKOS: Authority Control for the Semantic Web
Encoding Library of Congress Subject Headings in SKOS: Authority Control for the Semantic Web Corey A Harper University of Oregon Libraries Tel: +1 541 346 1854 Fax:+1 541 346 3485 [email protected]
Programming Languages
Programming Languages Qing Yi Course web site: www.cs.utsa.edu/~qingyi/cs3723 cs3723 1 A little about myself Qing Yi Ph.D. Rice University, USA. Assistant Professor, Department of Computer Science Office:
REDUCING THE COST OF GROUND SYSTEM DEVELOPMENT AND MISSION OPERATIONS USING AUTOMATED XML TECHNOLOGIES. Jesse Wright Jet Propulsion Laboratory,
REDUCING THE COST OF GROUND SYSTEM DEVELOPMENT AND MISSION OPERATIONS USING AUTOMATED XML TECHNOLOGIES Colette Wilklow MS 301-240, Pasadena, CA phone + 1 818 354-4674 fax + 1 818 393-4100 email: [email protected]
WebLicht: Web-based LRT services for German
WebLicht: Web-based LRT services for German Erhard Hinrichs, Marie Hinrichs, Thomas Zastrow Seminar für Sprachwissenschaft, University of Tübingen [email protected] Abstract This software
Semantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
2. Distributed Handwriting Recognition. Abstract. 1. Introduction
XPEN: An XML Based Format for Distributed Online Handwriting Recognition A.P.Lenaghan, R.R.Malyan, School of Computing and Information Systems, Kingston University, UK {a.lenaghan,r.malyan}@kingston.ac.uk
Simplifying e Business Collaboration by providing a Semantic Mapping Platform
Simplifying e Business Collaboration by providing a Semantic Mapping Platform Abels, Sven 1 ; Sheikhhasan Hamzeh 1 ; Cranner, Paul 2 1 TIE Nederland BV, 1119 PS Amsterdam, Netherlands 2 University of Sunderland,
Tool Support for Model Checking of Web application designs *
Tool Support for Model Checking of Web application designs * Marco Brambilla 1, Jordi Cabot 2 and Nathalie Moreno 3 1 Dipartimento di Elettronica e Informazione, Politecnico di Milano Piazza L. Da Vinci,
Simplifying the Development of Rules Using Domain Specific Languages in DROOLS
Simplifying the Development of Rules Using Domain Specific Languages in DROOLS Ludwig Ostermayer, Geng Sun, Dietmar Seipel University of Würzburg, Dept. of Computer Science INAP 2013 Kiel, 12.09.2013 1
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic
Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged
Design Patterns in Parsing
Abstract Axel T. Schreiner Department of Computer Science Rochester Institute of Technology 102 Lomb Memorial Drive Rochester NY 14623-5608 USA [email protected] Design Patterns in Parsing James E. Heliotis
XML: extensible Markup Language. Anabel Fraga
XML: extensible Markup Language Anabel Fraga Table of Contents Historic Introduction XML vs. HTML XML Characteristics HTML Document XML Document XML General Rules Well Formed and Valid Documents Elements
Language Evaluation Criteria. Evaluation Criteria: Readability. Evaluation Criteria: Writability. ICOM 4036 Programming Languages
ICOM 4036 Programming Languages Preliminaries Dr. Amirhossein Chinaei Dept. of Electrical & Computer Engineering UPRM Spring 2010 Language Evaluation Criteria Readability: the ease with which programs
1/20/2016 INTRODUCTION
INTRODUCTION 1 Programming languages have common concepts that are seen in all languages This course will discuss and illustrate these common concepts: Syntax Names Types Semantics Memory Management We
An XML Based Data Exchange Model for Power System Studies
ARI The Bulletin of the Istanbul Technical University VOLUME 54, NUMBER 2 Communicated by Sondan Durukanoğlu Feyiz An XML Based Data Exchange Model for Power System Studies Hasan Dağ Department of Electrical
Search and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test
CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed
Last Week. XML (extensible Markup Language) HTML Deficiencies. XML Advantages. Syntax of XML DHTML. Applets. Modifying DOM Event bubbling
XML (extensible Markup Language) Nan Niu ([email protected]) CSC309 -- Fall 2008 DHTML Modifying DOM Event bubbling Applets Last Week 2 HTML Deficiencies Fixed set of tags No standard way to create new
Linked Data Interface, Semantics and a T-Box Triple Store for Microsoft SharePoint
Linked Data Interface, Semantics and a T-Box Triple Store for Microsoft SharePoint Christian Fillies 1 and Frauke Weichhardt 1 1 Semtation GmbH, Geschw.-Scholl-Str. 38, 14771 Potsdam, Germany {cfillies,
CommentTemplate: A Lightweight Code Generator for Java built with Eclipse Modeling Technology
CommentTemplate: A Lightweight Code Generator for Java built with Eclipse Modeling Technology Jendrik Johannes, Mirko Seifert, Christian Wende, Florian Heidenreich, and Uwe Aßmann DevBoost GmbH D-10179,
Representing XML Schema in UML A Comparison of Approaches
Representing XML Schema in UML A Comparison of Approaches Martin Bernauer, Gerti Kappel, Gerhard Kramler Business Informatics Group, Vienna University of Technology, Austria {lastname}@big.tuwien.ac.at
Clustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
CHECKLIST FOR THE DEGREE PROJECT REPORT
Kerstin Frenckner, [email protected] Copyright CSC 25 mars 2009 CHECKLIST FOR THE DEGREE PROJECT REPORT This checklist has been written to help you check that your report matches the demands that are
ML for the Working Programmer
ML for the Working Programmer 2nd edition Lawrence C. Paulson University of Cambridge CAMBRIDGE UNIVERSITY PRESS CONTENTS Preface to the Second Edition Preface xiii xv 1 Standard ML 1 Functional Programming
FreeDict: an Open Source repository of TEI-encoded bilingual dictionaries
FreeDict: an Open Source repository of TEI-encoded bilingual dictionaries Piotr Bański Institute of English Studies, University of Warsaw Beata Wójtowicz Dept of African Languages and Cultures, University
Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System
Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System Oana NICOLAE Faculty of Mathematics and Computer Science, Department of Computer Science, University of Craiova, Romania [email protected]
Activity Mining for Discovering Software Process Models
Activity Mining for Discovering Software Process Models Ekkart Kindler, Vladimir Rubin, Wilhelm Schäfer Software Engineering Group, University of Paderborn, Germany [kindler, vroubine, wilhelm]@uni-paderborn.de
DSLs to fully generate Business Applications Daniel Stieger, Matthias Farwick, Berthold Agreiter, Wolfgang Messner
DSLs to fully generate Business Applications Daniel Stieger, Matthias Farwick, Berthold Agreiter, Wolfgang Messner Introduction One of our customers, a major Austrian retailer, approached us with specific
UML-based Test Generation and Execution
UML-based Test Generation and Execution Jean Hartmann, Marlon Vieira, Herb Foster, Axel Ruder Siemens Corporate Research, Inc. 755 College Road East Princeton NJ 08540, USA [email protected] ABSTRACT
Modeling BPMN Diagrams within XTT2 Framework. A Critical Analysis**
AUTOMATYKA 2011 Tom 15 Zeszyt 2 Antoni Ligêza*, Tomasz Maœlanka*, Krzysztof Kluza*, Grzegorz Jacek Nalepa* Modeling BPMN Diagrams within XTT2 Framework. A Critical Analysis** 1. Introduction Design, analysis
Markus Dickinson. Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013
Markus Dickinson Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013 1 / 34 Basic text analysis Before any sophisticated analysis, we want ways to get a sense of text data
VoiceXML-Based Dialogue Systems
VoiceXML-Based Dialogue Systems Pavel Cenek Laboratory of Speech and Dialogue Faculty of Informatics Masaryk University Brno Agenda Dialogue system (DS) VoiceXML Frame-based DS in general 2 Computer based
Jos Warmer, Independent [email protected] www.openmodeling.nl
Domain Specific Languages for Business Users Jos Warmer, Independent [email protected] www.openmodeling.nl Sheet 2 Background Experience Business DSLs Insurance Product Modeling (structure) Pattern
An Approach to Eliminate Semantic Heterogenity Using Ontologies in Enterprise Data Integeration
Proceedings of Student-Faculty Research Day, CSIS, Pace University, May 3 rd, 2013 An Approach to Eliminate Semantic Heterogenity Using Ontologies in Enterprise Data Integeration Srinivasan Shanmugam and
An Eclipse Plug-In for Visualizing Java Code Dependencies on Relational Databases
An Eclipse Plug-In for Visualizing Java Code Dependencies on Relational Databases Paul L. Bergstein, Priyanka Gariba, Vaibhavi Pisolkar, and Sheetal Subbanwad Dept. of Computer and Information Science,
Paraphrasing controlled English texts
Paraphrasing controlled English texts Kaarel Kaljurand Institute of Computational Linguistics, University of Zurich [email protected] Abstract. We discuss paraphrasing controlled English texts, by defining
estatistik.core: COLLECTING RAW DATA FROM ERP SYSTEMS
WP. 2 ENGLISH ONLY UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS Work Session on Statistical Data Editing (Bonn, Germany, 25-27 September
Managing XML Documents Versions and Upgrades with XSLT
Managing XML Documents Versions and Upgrades with XSLT Vadim Zaliva, [email protected] 2001 Abstract This paper describes mechanism for versioning and upgrding XML configuration files used in FWBuilder
Development of Tool Extensions with MOFLON
Development of Tool Extensions with MOFLON Ingo Weisemöller, Felix Klar, and Andy Schürr Fachgebiet Echtzeitsysteme Technische Universität Darmstadt D-64283 Darmstadt, Germany {weisemoeller klar schuerr}@es.tu-darmstadt.de
Computer Aided Document Indexing System
Computer Aided Document Indexing System Mladen Kolar, Igor Vukmirović, Bojana Dalbelo Bašić, Jan Šnajder Faculty of Electrical Engineering and Computing, University of Zagreb Unska 3, 0000 Zagreb, Croatia
Software Architecture Document
Software Architecture Document Natural Language Processing Cell Version 1.0 Natural Language Processing Cell Software Architecture Document Version 1.0 1 1. Table of Contents 1. Table of Contents... 2
Artificial Intelligence. Class: 3 rd
Artificial Intelligence Class: 3 rd Teaching scheme: 4 hours lecture credits: Course description: This subject covers the fundamentals of Artificial Intelligence including programming in logic, knowledge
Computer-aided Document Indexing System
Journal of Computing and Information Technology - CIT 13, 2005, 4, 299-305 299 Computer-aided Document Indexing System Mladen Kolar, Igor Vukmirović, Bojana Dalbelo Bašić and Jan Šnajder,, An enormous
Analysis and Design of Software Systems Practical Session 01. System Layering
Analysis and Design of Software Systems Practical Session 01 System Layering Outline Course Overview Course Objectives Computer Science vs. Software Engineering Layered Architectures Selected topics in
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology
Extraction of Legal Definitions from a Japanese Statutory Corpus Toward Construction of a Legal Term Ontology Makoto Nakamura, Yasuhiro Ogawa, Katsuhiko Toyama Japan Legal Information Institute, Graduate
UML Representation Proposal for XTT Rule Design Method
UML Representation Proposal for XTT Rule Design Method Grzegorz J. Nalepa 1 and Krzysztof Kluza 1 Institute of Automatics, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Kraków, Poland
Electronic Document Management Using Inverted Files System
EPJ Web of Conferences 68, 0 00 04 (2014) DOI: 10.1051/ epjconf/ 20146800004 C Owned by the authors, published by EDP Sciences, 2014 Electronic Document Management Using Inverted Files System Derwin Suhartono,
10CS73:Web Programming
10CS73:Web Programming Question Bank Fundamentals of Web: 1.What is WWW? 2. What are domain names? Explain domain name conversion with diagram 3.What are the difference between web browser and web server
Extending Data Processing Capabilities of Relational Database Management Systems.
Extending Data Processing Capabilities of Relational Database Management Systems. Igor Wojnicki University of Missouri St. Louis Department of Mathematics and Computer Science 8001 Natural Bridge Road
Component visualization methods for large legacy software in C/C++
Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University [email protected]
Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques
Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques Sean Thorpe 1, Indrajit Ray 2, and Tyrone Grandison 3 1 Faculty of Engineering and Computing,
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
, pp.273-280 http://dx.doi.org/10.14257/ijdta.2015.8.4.27 Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features Lirong Qiu School of Information Engineering, MinzuUniversity of
Automatic Text Analysis Using Drupal
Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing
II. PREVIOUS RELATED WORK
An extended rule framework for web forms: adding to metadata with custom rules to control appearance Atia M. Albhbah and Mick J. Ridley Abstract This paper proposes the use of rules that involve code to
Cedalion A Language Oriented Programming Language (Extended Abstract)
Cedalion A Language Oriented Programming Language (Extended Abstract) David H. Lorenz Boaz Rosenan The Open University of Israel Abstract Implementations of language oriented programming (LOP) are typically
Syntax Check of Embedded SQL in C++ with Proto
Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 2. pp. 383 390. Syntax Check of Embedded SQL in C++ with Proto Zalán Szűgyi, Zoltán Porkoláb
