UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO. A Framework for Integrating Natural Language Tools


UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO

A Framework for Integrating Natural Language Tools

João de Almeida Varelas Graça (Licenciado)

Dissertation submitted for the degree of Master in Information Systems and Computer Engineering

PROVISIONAL DOCUMENT

February 2006


Abstract

Natural Language Processing (NLP) is a subfield of artificial intelligence and linguistics that studies the problems inherent to the processing and manipulation of natural language. NLP systems are typically characterized by a pipeline architecture, in which several NLP tools connected as a chain of filters apply successive transformations to the data that flows through the system. Usually, each tool is developed independently by a different researcher whose focus is on their own research problem rather than on the future integration of the tool in a broader system. Hence, when integrating such tools, one may face problems that lead to information losses, such as:

(i) the output of a tool consists of the data it has acted upon and usually does not contain all the input data. This becomes a problem when the discarded information is required by a tool that appears at a later stage of the pipeline;
(ii) each tool has its own input/output format, so conversions between data formats may be needed when a tool consumes data produced by another one. This conversion may not be possible if the formats differ in descriptive power;
(iii) the formats used by different tools do not establish relations between the input and output data. These relations are useful for aligning information produced at different levels and for avoiding the repetition of common data across them.

These problems make the reuse of NLP tools in distinct NLP systems a cumbersome task. This dissertation proposes a solution to these problems based on a client-server architecture. The server acts as a blackboard where all tools add and consult data. In our solution, a tool adds a layer of linguistic information over a data signal and the system maintains the cross-relations between the existing layers of data. The data is kept in the repository under a conceptual model that is independent of the client tools and allows the representation of a broad range of linguistic information. The tools interact with the repository through a generic remote API which allows the creation of new data and the navigation through all the existing data. Moreover, this work provides libraries implemented in several programming languages that abstract the connection and communication protocol details between the NLP tools and the server. These libraries also offer additional levels of functionality that simplify the creation of NLP tools.

Resumo

Natural Language Processing (NLP) is a branch of Artificial Intelligence that studies the problems inherent to the processing and manipulation of natural language. NLP systems are normally characterized by a pipes and filters architecture in which a set of NLP tools applies a succession of transformations to the data flowing through the system. Each tool is normally developed by a researcher whose concern is their own problem rather than the integration of the tool in future systems. When different tools are integrated to create a system, the following problems typically arise, and they may lead to information loss:

(i) the output of each tool consists of the data it changed and may not contain all the input data. This may cause problems if the discarded information is needed by tools that appear later in the system;
(ii) each tool has its own data format, so the different formats must be converted to let the tools communicate with each other. In addition, the expressiveness of the formats may differ, in which case the conversion may not be possible;
(iii) the different tools do not establish relations between input and output data, which are needed to align the data produced by several tools and to avoid replicating information.

These problems hinder the reuse of NLP tools in different NLP systems. This work presents a solution to these problems based on a client-server architecture. The server is a repository used by the tools to add and consult information. Each tool adds a level of information over a data signal, and the system maintains relations between the levels. The data is stored in the repository under a conceptual model that is independent of the tools and can represent several types of linguistic information. The tools interact with the server through a remote interface that lets them add data and navigate through all the existing data. This work also offers libraries implemented in several programming languages that abstract the details of the connection and communication protocol between the client and the server. These libraries offer additional functionality to the tools, which simplifies their creation.

Keywords & Palavras Chave

Keywords

- Natural Language processing systems
- Natural Language processing tools integration
- Repository
- Linguistic Annotation
- Data lineage
- Information loss

Palavras Chave

- Natural Language processing systems
- Integration of Natural Language processing tools
- Repository
- Linguistic Annotation
- Data alignment
- Information loss


Acknowledgments

I would like to express my gratitude to everyone who helped me during the development of this dissertation, provided me with their support, and endured my constant stress and bad temper. Without them this work would not have been possible.

I would like to thank my Supervisor, Professor Nuno Mamede, for all his guidance over these years, his constant advice and corrections, and his never-ending patience with my doubts and requests. I also cannot forget the support provided by my Co-Supervisor, João Dias Pereira. Both helped me immensely in the development of this dissertation.

My thanks extend to the INESC-ID Spoken Language Systems lab team, who were extremely welcoming and cooperative, and particularly to those who worked more closely with me in this project: David Matos, Luísa Coheur, Ricardo Ribeiro, Joana Paulo, Fernando Baptista and Paula Vaz. I thank them for their suggestions and support. Thanks also to André Nascimento for his help with the proof-reading of the dissertation's text.

To all my friends who have always been there for me, even when things seemed to be going wrong, thank you for your words of comfort and motivation.

And last, but certainly not least, I thank my family for their unconditional support, not only throughout this project but throughout my entire life.

Lisboa, February 22, 2006
João de Almeida Varelas Graça


Contents

1 Introduction
    Motivation
    Objectives
    Proposed Solution
        Architecture
        Conversions between data formats
        Data lineage
        Summary
    Requirements
        Conceptual Model Requirements
        System Requirements
    Contributions
    Dissertation Structure

2 Related Work
    Introduction
    AGTK
    ATLAS
    EMDROS
    NLTK
    GATE
    Festival
    Summary

3 Conceptual Model
    Introduction
    Conceptual Model Entities
        Repository
        Data
        SignalData, Index and Region
        Analysis
        Segment and Segmentation
        Relation
        Classification
        CrossRelation
    Conceptual Model API
    Summary

4 Architecture
    Introduction
    Server architecture
        Server Architecture Description
            Data Layer
            Service Layer
            Remote Interface
        Server Architecture Implementation
            Data Layer
            Service Layer
            Remote Interface
        Server Architecture Interaction Examples
    Client Library
        Client Library Description
            Client Stub Layer
            Conceptual Model Layer
            Extra Layers
        Java Implementation
            Client Stub
            Conceptual Model
            Extra Layers
    Solution Validation
        Tools
        Simple NLP system
        Concurrent processing
    Summary

5 Conclusion and Future Work
    Summary
    Future work
    Contributions

A Conceptual Model API

B Repository Server
    B.1 Data Layer API
    B.2 Conceptual Model Persistent Format
    B.3 Data Transfer Objects
    B.4 Remote Interface API

List of Figures

1.1 Example of an NLP system using a pipes and filters architecture
1.2 Each tool passing all input data to output
1.3 Tool consuming data from different tools
1.4 Component combining data from different tools
1.5 An NLP system using the shared repository
1.6 External components converting data formats
1.7 Identification of words in a text
1.8 Ambiguous segmentation example
1.9 Classification Ambiguity example
AGTK internal structure
Annotation Graphs example
ATLAS region utilization example
ATLAS Children utilization example
EMDROS database example
Gate annotation graph example
Utterance example
Conceptual Model class diagram
Repository class diagram
3.3 SignalData class diagram
TextSignalData class diagram
Analysis class diagram
Segmentation and Segment class diagram
Relation class diagram
Classification class diagram
CrossRelation class diagram
Iterator class diagram
type and description class diagram
Complete conceptual model class diagram
Client Server Architecture
Server layers
DTO class diagram
Client Library Internal Structure
Extra layers interfaces
Examples notation

List of Tables

2.1 System requirements resume
Conceptual model requirements resume


1 Introduction

1.1 Motivation

Natural Language Processing (NLP) is a subfield of artificial intelligence and linguistics that studies the problems inherent to the processing and manipulation of natural language. It is devoted to making computers understand statements written in human languages. Several NLP systems have been developed to address major NLP tasks, such as question answering and dialogue. NLP systems are usually composed of several NLP tools, each in charge of a specific linguistic task, such as word sense disambiguation or syntactic analysis. In most systems, tools are executed as a pipeline in which each tool performs one processing step. These tools commonly use results from previous processing steps and produce new linguistic information that the following steps can use. For instance, a word sense disambiguator may combine the output of a word tokenizer and the output of a part-of-speech tagger.

Tools are developed independently by different people, each focused on their own problem rather than on the future integration of the tool in a broader system. Tool data formats are designed according to each tool's requirements, and normally the output of a tool does not contain all of its input information, so some data is discarded between the input and the output of each tool. Hence, when integrating such tools, several problems arise, mainly related to (i) how the tools communicate with each other and (ii) what kind of information flows between them.

At the Spoken Language Systems Lab (L2F), where this work was developed, several NLP systems have already been created. Whenever tools were integrated to compose a system, most of the detected problems were related to the information flow between those tools, which can lead to information losses. These problems are:

- Architectural problems - the information discarded along the system may be required further ahead by other tools;
- Conversion between data formats - conversions are necessary between the different data formats. If the formats differ in expressiveness, some of them may not be completely mappable into the others.

Besides these problems, which lead to information losses, there is another problem concerning the data: how to maintain the data lineage between information produced by the different tools that compose an NLP system. If the output of each tool is viewed as a layer of information over a primary data source, and layers are normally related to each other, then it is desirable to maintain relations between those layers. First, such relations preserve the data lineage between the different levels, allowing navigation through related linguistic information produced by different tools. Second, tools can reference data from other layers, avoiding the repetition of common data. These relations are called cross-relations because they span linguistic information layers.

A last problem is that the programmer of each NLP tool usually develops their own data model to represent the linguistic information, together with the Input/Output facilities for that data model. Since these data models normally represent similar information, they tend to resemble each other closely, and redefining such similar models is a waste of time.

Figure 1.1: Example of an NLP system using a pipes and filters architecture.

Figure 1.1 shows a simple NLP system composed of three components. The initial input of the system is a text file. Component A performs the tokenization and part-of-speech tagging of the text, component B performs a post-morphologic analysis and component C performs a syntactic analysis. In this example two conversions between data formats are required, one between components A and B, and another between components B and C. This small system has the following problems:

- Component B receives the results from component A, and one of its tasks is to separate the contractions used in the text. From that point on the initial state of the text is lost.
- The syntactic parser (C) only uses the morphologic features of the words identified by component A. Its output is a set of syntactic trees which do not have any reference to the original words in the text.

At this stage, the system has already lost the information about the words that belong to the text, which might be required by components added to the system in the future. Moreover, the conversions between data formats may be problematic if the formats differ in expressiveness.

For example, a component producing a phonetic transcription of the original text would need both the data of the syntactic analysis, to select a possible interpretation of the words of the text, and the original words in the text. Since part of the information was lost along the system, this information would not be available in a single place. The new component would therefore require the output of all the tools as input, and it would have to merge the information from those tools. However, if the information produced by those tools were aligned, these problems could be overcome by following the relations between the different levels of data.

The rest of this chapter is organized as follows. Section 1.2 describes the objective of this dissertation. Then, Section 1.3 proposes a solution to the detected problems, in the form of a shared repository for NLP tools. Section 1.4 defines a set of requirements that a solution for the detected problems must fulfil, and Section 1.5 describes the major contributions of this work. Finally, Section 1.6 describes the structure of this thesis.
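The information loss described for the system of Figure 1.1 can be illustrated with a minimal sketch. The three functions below are hypothetical stand-ins for components A, B and C; their names, the English contraction table and the data shapes are assumptions made only for illustration and do not reproduce the tools used at L2F.

```python
# Hypothetical three-stage pipes-and-filters system (illustrative only).

CONTRACTIONS = {"don't": ["do", "not"], "it's": ["it", "is"]}  # assumed table


def tokenize_and_tag(text):
    """Component A: split on whitespace and attach a placeholder POS tag."""
    return [{"word": w, "pos": "UNK"} for w in text.split()]


def split_contractions(tokens):
    """Component B: expand contractions; the original surface form is dropped."""
    out = []
    for tok in tokens:
        for part in CONTRACTIONS.get(tok["word"].lower(), [tok["word"]]):
            out.append({"word": part, "pos": tok["pos"]})
    return out


def parse(tokens):
    """Component C: emit a flat 'tree' of tags with no link back to the words."""
    return ("S", [tok["pos"] for tok in tokens])


text = "I don't know"
after_b = split_contractions(tokenize_and_tag(text))
words_after_b = [tok["word"] for tok in after_b]
tree = parse(after_b)
```

After component B the contraction can no longer be recovered from the data that flows on, and after component C not even the words are present, so a later component that needs the original text has nowhere in the pipeline to get it from.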

1.2 Objectives

The main objective of this work is to build an NLP framework for the creation of NLP systems that:

- Avoids information losses between the tools composing a system;
- Simplifies the implementation of new NLP tools by providing general Input/Output facilities and a data model to represent linguistic information;
- Keeps the data from different tools aligned, allowing navigation through related data from different tools.

1.3 Proposed Solution

This section first describes some alternatives for solving the architectural and data format problems, and our solution to each of them. Then it presents an objective that arises from the solution to those problems, concerning the data lineage between the information produced by different tools. Finally, it presents the overall solution, composed of the solutions to each individual problem.

1.3.1 Architecture

A possible approach to the architectural problem is to guarantee that the tools produce all the information required by the next steps of the system. This simple approach only works when all of the system's tools are known in advance. Moreover, it demands that each tool produce extra information. The approach is not extensible, because the addition of new tools may force changes to existing tools.

A generalization of the previous alternative consists in guaranteeing that each tool passes all input information to its output (Figure 1.2). This strategy has the following problems. First, the tools must know how to manage a large amount of data which may not be related to the tools themselves. Second, each tool may have to load and parse a large amount of data upon its initialization and consequently save a large amount of data when terminating. Finally, it complicates handling data from several other tools simultaneously. For example, in Figure 1.3, component D, which is a morphologic disambiguation tool, requires the output of both component B and component C, which are part-of-speech taggers. In this example, component D has the responsibility of merging the common data from its inputs (text and data A), besides performing its own processing. This merging operation is difficult and depends on the components used to produce the input, which leads to modifications in component D whenever component B or C is changed.

Figure 1.2: Each tool passing all input data to output.

Figure 1.3: Tool consuming data from different tools.

Another strategy, which does not impose any extra effort on the tools, consists in having separate components that know how to combine information from different tools (see Figure 1.4). Nevertheless, this strategy requires building new components every time a new combination of tools is available. Moreover, information intersection is not a trivial task and requires considerable effort. This strategy was proposed in Galinha (de Matos et al., 2002, 2003).

Figure 1.4: Component combining data from different tools.

Our solution consists in shifting the architectural pattern from pipes and filters to a client-server style in which the server follows the shared data style. This solution proposes a shared repository where tools add new layers of information without changing the existing ones (see Figure 1.5). All linguistic information is available in the server. This way tools only have to select the information they require, avoiding the loading, parsing and saving of extra data. Since the server follows the shared data style, we avoid information loss: all information is kept in the repository and is never removed. Moreover, this solution avoids the information merging problem, since each tool uses only the layers it requires as input and adds the newly produced layer at the end.

Figure 1.5: An NLP system using the shared repository.

1.3.2 Conversions between data formats

A strategy to manage the different formats used by each tool consists in having specific components that are responsible for performing the data conversions (see Figure 1.6). Even if the creation of such components is simplified, for example, by imposing that each tool consumes and produces data in XML format and using an XSLT engine to perform the conversions, it is still necessary to define an XSLT style sheet for each conversion.

Figure 1.6: External components converting data formats.

This approach has two drawbacks. The first is the need to define an XSLT style sheet for each data conversion. This might not be a trivial task, and if n distinct components exist, the number of possible style sheets grows at a rate of n*n. The second drawback is the expressiveness of each data format. This approach assumes that the content of each format is somehow describable in all the other formats, which may not be true; in that case there will be information losses when the transformation is performed. This was the approach followed in (de Matos et al., 2002, 2003).

Our solution consists in defining a conceptual model capable of representing a broad range of linguistic information produced by NLP tools. The information stored in the shared repository is described using this conceptual model. Moreover, the tools may also use this model to represent their information, avoiding the need to perform any conversions. If the tools are not able to use this model, the model can be used as an interlingua between tools: each tool then needs only a conversion to and from the model, so the number of necessary conversions is reduced from a rate of n*n to a rate of n. Since the model is able to represent a broad range of linguistic information, no information is lost due to lack of expressiveness of the format, because every possible data format should be mappable into this model.

1.3.3 Data lineage

An approach to the data lineage problem is to add semantic information to each data layer in the shared repository. For example, given a morphologic layer and a syntactic layer, the semantics of the layers would force the syntactic layer to use elements of the morphologic layer. This way relations between layers would be established by the semantics of the layers. However, with this approach it is difficult to have several layers of the same semantic type. It also restricts the type of layers that can be defined as well as the content of each layer. These restrictions are not desirable, since we require the solution to be generic enough to allow the creation of any NLP system, composed of any type of tools.

Since the information produced by all NLP tools will be kept in the same repository, represented under the same conceptual model, our solution consists in extending the conceptual model to allow the representation of cross-relations between linguistic information belonging to different layers. Tools create cross-relations between their input and output data when creating the entities of the conceptual model.

1.3.4 Summary

The solutions previously proposed for each of the problems are not independent from each other. The overall solution consists in using a client-server architecture, where the server is a repository of linguistic information and primary data sources represented under a conceptual model. The clients are NLP tools. All information produced by the clients is kept in the repository under a specific, uniquely identified layer. Each client can select information from the repository by selecting specific layers, and can navigate through information from different layers using the cross-relations that exist between the layers.

Besides the server described above, our proposed solution contains the definition of client libraries. A client library has the objective of simplifying the creation of NLP tools by abstracting the connection and communication protocol details between the NLP tools and the server.
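The overall solution summarised in Section 1.3.4 can be pictured with a small in-memory sketch. The class and method names below (`Repository`, `add_layer`, `relate`, `follow`) are hypothetical and are not the remote API of the dissertation; the sketch only assumes what the text states: layers are uniquely identified, existing layers are never changed or removed, tools select the layers they need, and cross-relations link elements of different layers.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    tool_id: str                      # identification of the producing tool
    elements: list = field(default_factory=list)


class Repository:
    """Append-only store: layers are added, never modified or removed."""

    def __init__(self, signal):
        self.signal = signal          # primary data source, e.g. the raw text
        self._layers = {}             # layer id -> Layer
        self._cross = []              # list of (source, destination) pairs

    def add_layer(self, layer_id, tool_id, elements):
        if layer_id in self._layers:
            raise ValueError(f"layer {layer_id!r} already exists")
        self._layers[layer_id] = Layer(tool_id, list(elements))

    def layer(self, layer_id):
        """Select only the elements of the requested layer."""
        return self._layers[layer_id].elements

    def relate(self, src, dst):
        """Record a cross-relation between two (layer_id, index) pairs."""
        self._cross.append((src, dst))

    def follow(self, src):
        """Return the (layer_id, index) pairs that `src` is related to."""
        return [dst for s, dst in self._cross if s == src]


repo = Repository("I don't know")
repo.add_layer("tokens", "tokenizer", ["I", "don't", "know"])
repo.add_layer("split", "contraction-splitter", ["I", "do", "not", "know"])
repo.relate(("split", 1), ("tokens", 1))   # "do"  comes from "don't"
repo.relate(("split", 2), ("tokens", 1))   # "not" comes from "don't"

# Navigate from the derived layer back to the original token.
original = [repo.layer(lid)[i] for lid, i in repo.follow(("split", 2))]
```

In this sketch the contraction splitter adds its own layer instead of overwriting the tokens, so a later tool can still reach the original word by following the cross-relation.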
Moreover, each client library provides several layers of functionality that simplify the use of the server and therefore the implementation of new tools. For example, the client library can provide an implementation of the conceptual model in which each element acts as a proxy for the corresponding element in the server. Currently there is a version of the client library for the Java programming language.

1.4 Requirements

This section defines a set of requirements for a solution that supports and simplifies the integration of independently developed NLP tools into NLP systems. These requirements were used to validate the proposed solution, and are the following:

- No information should be lost between tools in an NLP system;
- Tools should only produce directly related information;
- The solution should simplify the creation of new tools by providing an Input/Output interface, which handles the loading and saving of the data used by the tool, and by providing a data model that each tool can use to represent linguistic information;
- The solution should minimize the number of conversion components required to build an NLP system when integrating existing NLP tools that do not comply with the system's model;
- The provided interface should allow navigation between information produced by different NLP tools.

To achieve these generic requirements we defined two groups of requirements that the solution must fulfil, namely:

- Conceptual Model Requirements - Represents the requirements for a conceptual model capable of representing and relating a broad range of linguistic information, which are described in Subsection 1.4.1;

- System requirements - Represents requirements of the underlying system related to the interaction between the system and the NLP tools, which are described in Subsection 1.4.2.

1.4.1 Conceptual Model Requirements

The main requirement of the conceptual model is that it must be able to represent a broad range of linguistic information produced by different NLP tools. Furthermore, the conceptual model must be extensible, because it is impossible to foresee all kinds of linguistic information that may appear in the future. We begin by distinguishing two conceptually different kinds of information that the conceptual model must represent:

- Primary data sources, such as a text or a speech signal;
- Linguistic information produced by NLP tools over primary data sources or over previously defined linguistic information.

We also identify four types of actions that NLP tools may perform:

- Creation and editing of primary data sources, for example, the incremental creation of a new primary data source containing the phonetic transcription of the text belonging to another primary data source. This newly created data source can be the target of the linguistic information of other tools;
- Identification of linguistic elements from a primary data source, for instance, the segmentation of a sentence into words;
- Creation of relational information between linguistic elements, such as the relation between a verb and the corresponding subject;
- Assignment of characteristics to a linguistic element or to a relation, for example, the morphological features of a word.
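The two kinds of information and the four actions listed above can be sketched with a handful of small data classes. The class names `SignalData`, `Region`, `Segment`, `Relation` and `Classification` are taken from the entity names in the table of contents, but every field, method and value below is an assumption made for illustration rather than the conceptual model defined in Chapter 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalData:
    """A primary data source; here simply a text."""
    content: str


@dataclass(frozen=True)
class Region:
    """A span of the signal, as character offsets (end is exclusive)."""
    start: int
    end: int


@dataclass(frozen=True)
class Segment:
    """A linguistic element that points into the signal instead of copying it."""
    signal: SignalData
    region: Region

    def text(self):
        return self.signal.content[self.region.start:self.region.end]


@dataclass(frozen=True)
class Relation:
    """Relational information between two linguistic elements."""
    kind: str
    source: Segment
    target: Segment


@dataclass(frozen=True)
class Classification:
    """Characteristics assigned to a segment or to a relation."""
    target: object
    features: tuple  # tuple of (name, value) pairs


signal = SignalData("Mary sleeps")                         # 1. primary data source
mary = Segment(signal, Region(0, 4))                       # 2. identification
sleeps = Segment(signal, Region(5, 11))
subject = Relation("subject", source=sleeps, target=mary)  # 3. relation
tag = Classification(mary, (("category", "noun"),))        # 4. characteristics
```

Because a segment stores only offsets into the signal, the text of the primary data source is never duplicated in the linguistic layers.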

Each NLP tool may produce several types of information at the same time. The linguistic information generated by an NLP tool is normally derived from linguistic information created by other tools. For example, a part-of-speech tagger will use the segments produced by a tokenizer and add morphologic information to those segments. A morphological disambiguator may use the classifications produced by several part-of-speech taggers to select the most appropriate classification.

The conceptual model must be able to represent several layers of both primary data sources and linguistic information. We have defined the following requirements regarding these two types of information:

- The conceptual model must be able to represent any kind of primary data source, such as text, speech, video, or any combination of these;
- The conceptual model must support the creation and editing of primary data sources;
- All linguistic information produced by an NLP tool, except primary data sources, must be associated with the same layer;
- The conceptual model must allow the selection of linguistic information through the identification of the layer that contains it;
- Each layer is associated with the identification of the tool that produced it.

The last three requirements are necessary to simplify the identification of information inside the conceptual model. This way all linguistic information is organized into layers identified by the tool that produced them.

The conceptual model must represent the three types of linguistic information that each NLP tool can produce: (i) the identification of linguistic elements; (ii) the creation of relations between linguistic elements; and (iii) the assignment of characteristics to linguistic elements and relations. Figure 1.7 shows an example of the identification of linguistic elements, namely the words of a text.

Figure 1.7: Identification of words in a text.

We have defined the following requirements for the representation of those linguistic elements:

- The model must be able to represent ambiguity in the identification of linguistic elements. For example, a compound term can be segmented either as a single segment containing the whole term or as several segments, one for each word. Figure 1.8 shows an example of segmentation ambiguity;
- The model must be able to represent trees of linguistic elements, for example, syntactic trees;
- The model must allow the creation of relations between linguistic elements from other layers;
- It must be possible to represent classification ambiguity, which corresponds to associating disjoint sets of characteristics with the same linguistic element, for example, distinct morphological features for the same word. Figure 1.9 shows distinct grammatical categories for the word "that";
- The model must allow the association of characteristics with linguistic elements or relations from other layers.

Figure 1.8: Ambiguous segmentation example.
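The two kinds of ambiguity named in the requirements above can be illustrated with a minimal sketch. The example text, the layer names and the dictionary layout are assumptions chosen for illustration; they are not the contents of Figures 1.8 and 1.9, nor the representation adopted by the conceptual model.

```python
text = "hot dog stand"

# Segmentation ambiguity: two alternative segmentations of the same signal
# are stored side by side. Each segment is a (start, end) pair of character
# offsets into `text`, with `end` exclusive.
segmentations = {
    "compound": [(0, 7), (8, 13)],               # "hot dog" | "stand"
    "word-by-word": [(0, 3), (4, 7), (8, 13)],   # "hot" | "dog" | "stand"
}


def words(name):
    """Resolve the offsets of one segmentation against the signal."""
    return [text[start:end] for start, end in segmentations[name]]


# Classification ambiguity: several alternative feature sets are attached to
# the same element, identified here by (segmentation name, segment index).
classifications = {
    ("word-by-word", 2): [{"category": "noun"}, {"category": "verb"}],
}

stand_categories = [c["category"] for c in classifications[("word-by-word", 2)]]
```

Keeping both segmentations, and both classifications of the same segment, leaves the choice to a later disambiguation tool instead of forcing the producing tool to discard one reading.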

Figure 1.9: Classification Ambiguity example.

Besides the representation of the linguistic information produced by each NLP tool, the conceptual model has the following requirements concerning the relations between linguistic information from different layers:

- The model must be able to represent relations between linguistic elements from different layers. These relations represent dependencies between layers of information and allow navigation between layers;
- The conceptual model must allow linguistic elements to reference data belonging to a primary data source without having to copy its value, thus avoiding the repetition of the same data in several layers;
- The model must be able to represent data which may not exist in any primary data source, for example, the result of separating contractions.

1.4.2 System Requirements

This subsection presents the general requirements of the system that are not related to its conceptual model but to the interaction between the system and the NLP tools. They are the following:

- The system must simplify iteration over the data in the repository, e.g. over all segments of a segmentation. This is required because iteration is the most common form of interaction between an NLP tool and its data;
- The system must allow the selection of data based on the identification of the layer; this way an NLP tool only handles the data it requires;
- The system must allow access to data from an unfinished Analysis, to allow the parallel processing of data. This way an NLP tool may consume information that is being produced at the same time by another NLP tool;
- The system must persist its data;
- The system must interact with NLP tools written in any programming language.

1.5 Contributions

The main contributions of this work are:

- The definition of a conceptual model that can be used as a common model by several NLP tools and is able to represent a broad range of linguistic information and different types of primary data, and to relate the represented information;
- The implementation of a framework for NLP systems that reduces the effort required to implement and integrate NLP tools.

1.6 Dissertation Structure

Chapter 2 describes several works that solve problems similar to the ones addressed in this thesis. These works consist of architectures that try to simplify the creation of NLP systems based on independent NLP tools, and of linguistic annotation frameworks, which try to abstract the logical structure of linguistic annotations, thus creating a conceptual core capable of representing all kinds of linguistic information. Chapter 3 presents our conceptual model, which fulfils the requirements described in Subsection 1.4.1, by defining its entities and their responsibilities. Chapter 4 describes the proposed architecture for this work, its general principles and an implementation. Finally, Chapter 5 makes some remarks on the developed solution and presents some pointers for future work.

2 Related Work

2.1 Introduction

This chapter describes the proposals that we compared during the development of this work.

The shared data style architecture was analysed. It consists of a data store used by several tools. A shared data store may be a repository or a blackboard. The main difference is that the repository is passive and the clients access the data as required, whereas the blackboard is active and defines the client execution order. However, the term blackboard is sometimes used in a broader sense, meaning any shared data style. The idea of the shared data style is based on a metaphor in which a group of experts gathers around a blackboard and works cooperatively to solve a problem, using the blackboard as the workplace for developing the solution. The blackboard architecture is not a new technology: the first blackboard system, the Hearsay-II speech understanding system (Hayes-Roth et al., 1978), dates from the 1970s. Some blackboard characteristics are described in (Corkill, 1991); we present those that also apply to a repository:

Independence of expertise - Each module is a specialist in solving a certain aspect of the problem. A module does not depend on other modules to produce its contributions. A module only has to find the information it needs inside the blackboard, and then it proceeds with no assistance from other modules. New modules may be added to the blackboard without changing any existing modules. This corresponds to the notion of NLP tools developed by independent researchers, who cannot foresee how their tool is going to be used in broader systems;
Diversity in problem solving methods - In a blackboard system each module is a black box. The blackboard has no knowledge about the processing that each module performs, which corresponds to our notion of having a generic system that every NLP tool can use;
Flexible representation of blackboard information - A blackboard system does not place any restriction on the information that a module can add. This property matches our desire for semantic independence of the data, for the sake of generality;
Common interaction language - All modules interacting with the repository must have a common understanding of the data the blackboard holds. So a common language must be available and should be used by every tool using the repository. If a tool could place information in the repository without following the common language, that information would not be useful, since no other tool could use it properly. This property is achieved by the usage of a conceptual model, together with an interface that all tools must comply with;
Position Metrics - A module should not have to scan the entire blackboard, which can be very big, to find the information it requires. One solution is to divide the blackboard into regions, each corresponding to a particular group of information, which in our case corresponds to the information produced by each NLP tool;
Incremental solution generation - Blackboard systems operate incrementally. Each module contributes to the solution whatever it finds appropriate. So it must be possible to represent partial solutions and unsolved ambiguities.

A related field called linguistic annotation was also analysed. It deals with tools and formats for creating and managing linguistic annotations. Linguistic annotation covers any descriptive or analytic annotation over a raw source of data. For instance, the segmentation of a text into words and the morphologic features of those words are both linguistic annotations. The motivation for this field is the enormous need for manually annotated corpora in the NLP field. These annotated corpora validate results from NLP tools and help to train statistical NLP tools. The creation of annotated corpora is a very demanding and expensive job. Moreover, as the diversity of annotations over a corpus increases, so does the value of that linguistic database. Therefore, there is a great focus on the reuse of linguistic annotation databases, which led to the definition of several standards for linguistic annotation formats. However, the diversity of existing annotation formats makes the reuse of such databases more difficult.

Due to these problems, the development of linguistic annotation frameworks became a priority. The main objective of these frameworks is to develop a logical level of linguistic annotations that is independent of the annotations' physical format and that, together with an interface and modules to convert from/to the existing data formats, should promote the reuse of annotated corpora. The logical level, whose focus is on the logical structure of linguistic annotations rather than their content, should be able to represent all kinds of linguistic annotations existing in the various formats, which is precisely the purpose of our conceptual model. The Linguistic Annotation home page (Consortium, 2006) collects information not only about tools that have been widely used for constructing annotated linguistic databases, but also about the formats commonly adopted by such tools and databases.

Another line of research regarding linguistic annotations is the definition of a set of requirements for annotation formalisms.
We compared the requirements proposed in (Reidsma et al., 2004), and the requirements that are being defined by the International Organization for Standardization (ISO), which has formed a sub-committee (SC4) under technical committee 37 (TC37, Terminology and other language resources) to define a standard for linguistic annotation (Ide and Romary, 2001; Ide et al., 2003), against the ones we have defined.

We compared two existing linguistic annotation frameworks against our requirements. In Section 2.2 we compared the Annotation Graphs Toolkit (AGTK) (Maeda et al., 2002). The AGTK is an implementation of the Annotation Graphs formalism (Bird and Liberman, 1999), which is the most cited work in this area. Section 2.3 compares the ATLAS architecture (Bird et al., 2000; Laprun et al., 99), which is a generalization of the Annotation Graphs formalism that allows the usage of multidimensional signals.

We also compared several architectures that simplify the creation or integration of NLP tools for use in NLP systems. In Section 2.4 we compared EMDROS (Petersen, 2004), which is an open source text database engine for the analysis or retrieval of analyzed or annotated text. We continued in Section 2.5 by comparing the Natural Language Toolkit (NLTK) (Loper and Bird, 2002), which is a suite of Python libraries and programs for symbolic and statistical natural language processing. In Section 2.6 we compared GATE (Bontcheva et al., 2004), which is a general Java-based architecture for text engineering that promotes the integration of NLP tools by composing them into a pipes and filters architecture. Then, in Section 2.7, we compared the Festival speech synthesis system (Taylor et al., 1998; Black and Taylor, 1997), which is a general framework for building speech synthesis systems. Finally, in Section 2.8, we present a brief summary and some conclusions of this chapter.

2.2 AGTK

Annotation Graphs (Bird and Liberman, 1999) is a formal framework for representing linguistic annotations, based on the analysis of several existing annotation formats. This analysis led to the development of a conceptual core called Annotation Graphs that, according to Bird and Liberman (1999), can represent all kinds of linguistic annotations, thus serving as an interlingua between the different tools.
The analysed annotation formats include:

TIMIT (Garofolo et al., 1986) - a corpus of read speech designed for the recognition of acoustic-phonetic knowledge;
Partitur (Schiel et al., 1998) - the format of the Bavarian Archive for Speech Signal, made from the collective experience of a broad range of German speech databases;
CHILDES (MacWhinney, 1995) - a database of transcript data collected from children and adults who were learning foreign languages;
LACITO (Jacobson et al., 2001) - a collection of recorded and transcribed speech data of unwritten languages;
NIST UTF (NIST, 1998) - a universal transcription format developed by the US National Institute of Standards and Technology;
Switchboard (Godfrey et al., 1993) - a corpus of conversational speech, containing several levels of distinct annotations;
MUC-7 Message Understanding Conference (Hirschman and Chinchor, 1997) - defines a format for representing linguistic annotations used in information extraction, named entity recognition and coreference resolution.

Since that analysis showed that the Annotation Graphs formalism supersedes all the other annotation formats, and since we are going to compare the Annotation Graphs formalism against our requirements, no further evaluation of the analysed formats was performed. The Annotation Graphs formalism focuses on annotating speech signals; however, the ideas could be extended to text.

The Annotation Graphs Toolkit (AGTK) (Maeda et al., 2002) is a framework that provides an instantiation of the Annotation Graphs formalism and simplifies the creation of annotation tools by providing a set of interfaces to access the data represented in that formalism. The AGTK architecture consists of three modules (see Figure 2.1):

Figure 2.1: AGTK internal structure.

AGLIB - The AGLIB provides an interface for the Annotation Graphs formalism, implemented in C++. It is composed of the core module, the AGAPI and TreeAPI (Cotton and Bird, 2002), and the ODBC and IO interfaces;
AG wrappers - The AG wrappers allow the connection to the AGLIB from several programming languages; for example, the Java wrapper provides a Java interface to the AGLIB;
Input/Output Plugins - The Input/Output Plugins allow the definition of separate modules that convert linguistic annotations between the Annotation Graphs formalism and other formats.

The architecture of the AGTK was developed in order to simplify the creation of annotation tools. An annotation tool reads the existing linguistic annotations using the interface provided by the system, performs its annotations, and then saves all the annotations. This approach is the same as the one illustrated in Figure 1.2 and presents the same problems. The Input/Output facilities of the AGTK require that all annotations are loaded and saved every time, so a lot of data has to be parsed by each NLP tool, even if the tool does not use this data. Moreover, if an NLP tool uses the output of two NLP tools that were executed in parallel, it has to merge the annotations from both tools. Furthermore, an NLP tool cannot consume data that is being produced at the same time by another tool, so no parallel processing is possible.

Figure 2.2: Annotation Graphs example.

These problems exist even if the provided ODBC interface is used, because the ODBC interface only replaces file Input/Output with database Input/Output.

The representation formalism of the AGTK is the Annotation Graphs formalism: an annotation graph is a directed acyclic graph where edges contain a set of features and nodes may contain a time offset. The time offsets are relative to the signal that is being annotated. A formal description of the annotation graphs model can be found in Bird and Liberman (1999).

Figure 2.2 shows an annotation graph for a morphologic analysis. Each word is represented by two nodes, marking its beginning and its end. Each arc represents the grammatical classification of a word, or indicates a white space.

The AGTK defines an implementation of the Annotation Graphs formalism in which several annotation graphs can be grouped, together with several Timelines, in an AGSet. A Timeline represents a group of Signals that share the same reference. Each node of the graph is represented by an Anchor element, which corresponds to a named offset into a Signal. The arcs of the graph are represented by Annotation elements, which have a specific type and a set of attribute-value pairs called Features. Some elements may have an attribute called Metadata, which is a set of attribute-value pairs that allows the addition of non-linguistic information to those elements.
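The structure described above can be illustrated with a minimal Python sketch. It is not the AGLIB API: the class names mirror the Anchor and Annotation elements of the text, but the fields, the `add` and `arcs_between` helpers, and the example signal "The cat" are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Anchor:
    """A graph node: a named point in a signal, optionally with an offset."""
    name: str
    offset: Optional[int] = None  # assumed here to be a character offset


@dataclass
class Annotation:
    """A graph arc between two anchors, with a type and a set of features."""
    start: Anchor
    end: Anchor
    kind: str
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class AnnotationGraph:
    """A bare list of arcs; hypothetical helper methods, not the AGTK API."""
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, start: Anchor, end: Anchor, kind: str, **features: str) -> Annotation:
        arc = Annotation(start, end, kind, dict(features))
        self.annotations.append(arc)
        return arc

    def arcs_between(self, start: Anchor, end: Anchor) -> List[Annotation]:
        return [a for a in self.annotations if a.start == start and a.end == end]


# Hypothetical text signal "The cat": anchors at offsets 0, 3, 4 and 7.
n0, n1, n2, n3 = (Anchor(f"n{i}", off) for i, off in enumerate([0, 3, 4, 7]))
graph = AnnotationGraph()
graph.add(n0, n1, "word", category="article")
graph.add(n1, n2, "space")  # arc that carries no linguistic element
graph.add(n2, n3, "word", category="noun")
```

In this sketch a word exists only as an arc between two anchors, and the "space" arc is there solely to keep the graph connected.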

Several problems exist concerning the representation of the linguistic information produced by an NLP tool. The first problem detected in the Annotation Graphs formalism is that it has no concept to represent a linguistic element. It represents linguistic elements by defining two nodes pointing to their beginning and their end. This representation choice does not allow the representation of relational information between linguistic elements, since that information corresponds to arcs between other arcs. This problem concerns both the concept of relational information that we defined in the requirements and the relations between different layers. Another problem caused by the absence of a representation for linguistic elements is that, in order to keep a linked graph, we must represent arcs which do not correspond to any linguistic element. For example, in Figure 2.2 we had to use arcs to represent the spaces between words. Another problem arises from the fact that Annotation Graphs do not have the concept of classification ambiguity, meaning that a linguistic element may have several classifications. This ambiguity may be represented as several arcs between the same nodes; however, with this alternative we cannot easily distinguish several alternative classifications from other annotations performed over the same zone of the Signal. The representation of segmentation ambiguity and hierarchical segmentation is possible using different arcs between the existing nodes, but again we lack the structural semantics of those elements, which are conceptually different. This makes the annotations harder to use, even if the Metadata is used to provide information about the semantic role of each arc.

The Annotation Graphs formalism does not allow the editing of primary sources of data. The Anchor used to identify points in the Signal is a number that corresponds to an offset in a file.
This representation is very restrictive and does not support, for example, multidimensional signals.

Regarding the integration of linguistic information produced by several NLP tools, the Annotation Graphs formalism presents some problems as well. The model has no concept of layer. If a layer is represented as an individual Annotation Graph, it is not possible to access data from other layers, which is not acceptable for our purposes. Another alternative is to keep all information in one Annotation Graph and use the Metadata to identify the information produced by each tool, but this alternative has several problems. Firstly, with this approach it is not possible to have several layers over different Signals. Secondly, it would be too difficult to select a sub-graph containing only the information produced by one NLP tool.

2.3 ATLAS

ATLAS (Bird et al., 2000; Laprun et al., 99) provides a framework aimed at simplifying the development of linguistic annotation tools. The ATLAS architecture was developed to extend the Annotation Graphs formalism to handle multidimensional signals. There is an implementation of the ATLAS architecture in Java. However, this project was abandoned before any tool had used the proposed formalism. Nevertheless, the generalization performed over the Annotation Graphs has some interesting features.

The ATLAS architecture follows the same principle as the AGTK architecture and therefore has the same problems, so we will not describe them here.

The ATLAS annotation model extends the Annotation Graphs formalism in three aspects: i) it allows the representation of multidimensional signals; ii) it provides a better representation of hierarchical structures; iii) it adds semantic information to the elements of the model. These extensions resulted in a more generic model: all data represented in an Annotation Graph can be represented by the ATLAS model, but the converse is not true.

The representation of multidimensional regions is achieved by the introduction of the Region element, which is an abstraction representing a zone in a Signal that is delimited by two coordinates represented by Anchor elements. The Anchor element is the only tie between Annotations and the Signal. Figure 2.3 illustrates an annotation using the ATLAS model. There is a Signal, which is a text, and two Regions selecting the words "The" and "pretty". In this example the Anchors used by the Region element consist of numbers that indicate character positions in the text. However, if another kind of Signal were used, only the type of the Anchor elements would change.

Figure 2.3: ATLAS region utilization example.

The addition of the Region concept allows the ATLAS model to represent all types of media. However, the model does not allow the editing of Signals.

The representation of hierarchical structures was improved with the addition of the Children element, which contains a list of other Annotations that are descendants of a parent Annotation. Figure 2.4 illustrates the utilization of the Children element, where one Annotation contains several child Annotations.

The ATLAS architecture provides a semantic level, called Meta-Annotation infrastructure for ATLAS (MAIA), which adds semantic information to the elements of the model. MAIA defines a type system used by each ATLAS element. The type of an element restricts the possible relations between the elements and their features.

The extension performed by the ATLAS architecture provided a better representation for conceptually different linguistic phenomena, such as hierarchical trees and ambiguous segmentations, through the Children element. It also fulfils the requirement of media independence. Even so, this model still presents the main problems described for the Annotation Graphs formalism.
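The Region and Children concepts can be sketched in a few lines of Python. This is an illustration only, not the ATLAS Java API: the class and field names, the half-open character interval, and the example sentence are all assumptions (the source only states that the Regions select the words "The" and "pretty").

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CharAnchor:
    """Anchor for a text Signal: a character position (assumed 0-based)."""
    position: int


@dataclass(frozen=True)
class Region:
    """A zone of a Signal delimited by two Anchors (assumed half-open)."""
    start: CharAnchor
    end: CharAnchor

    def select(self, signal: str) -> str:
        return signal[self.start.position:self.end.position]


@dataclass
class Annotation:
    """An annotation over a Region, optionally with child Annotations."""
    kind: str
    region: Region
    children: List["Annotation"] = field(default_factory=list)


signal = "The pretty house"  # hypothetical text Signal
the = Annotation("word", Region(CharAnchor(0), CharAnchor(3)))
pretty = Annotation("word", Region(CharAnchor(4), CharAnchor(10)))
# A parent Annotation holding the two words as its Children.
phrase = Annotation("phrase", Region(CharAnchor(0), CharAnchor(10)), [the, pretty])

selected = [child.region.select(signal) for child in phrase.children]
```

Only `CharAnchor` and `Region.select` know that the Signal is text; for another medium, a different Anchor type would be substituted while `Annotation` stays unchanged.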

Figure 2.4: ATLAS Children utilization example.

Nevertheless, the idea of representing zones in a data source using regions was used in our work, since it allows the rest of the model's interface to be independent of specific data source types.

2.4 EMDROS

EMDROS (Petersen, 2004) is a text database engine for the analysis or retrieval of analyzed or annotated text. The EMDROS system is composed of four layers:

Client layer - Represents the NLP tools that use the services provided by EMDROS;
MQL layer - Provides an interface to the MQL query language, which uses the EMdF layer to translate the MQL queries into SQL calls to the database layer;
EMdF layer - Defines the annotation model used by the database;

Database layer - Represents a relational database which persistently stores the linguistic information.

The principal concept of the EMdF is the Object element, which represents a linguistic element. It contains a set of Monads, which correspond to the minimum granularity units that the database may have. Objects are grouped into Object Types, which define the possible Features that each specific Object may have. A Feature is an attribute-value pair. Feature elements are the entities that store all the existing data in the database.

Figure 2.5: EMDROS database example.

Figure 2.5 shows an EMDROS database. The database contains six Monads, one for each character, which in this case is the minimum granularity unit. There are two Object Types, letter and name. Each letter Object has the Feature surface, which contains the letter that the Object is representing. The name Object contains no Features. There are six letter Objects, each containing one Monad, and one name Object containing a set with the six existing Monads. The EMdF model uses the Monads to establish relations between different Objects. In this example the name Object contains all the letter Objects, because its set of Monads contains the Monads of all the letter Objects.

This model is too restrictive for our purposes. For example: the EMdF limits the data source type to text; each linguistic element is identified as an Object which has a set of Features, so it is not possible to represent several classifications for the same Object; and it is not possible to represent relational information between Objects from the same layer.
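The containment relation described above can be sketched as a subset test on Monad sets. The following Python fragment is a hypothetical illustration and not the EMDROS or MQL interface: the class name, its fields, and the six-character text are assumptions, and only the letter Objects with their surface Feature and the single enclosing Object follow the description of Figure 2.5.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class EmdfObject:
    """An Object: a typed set of Monads plus attribute-value Features."""
    object_type: str
    monads: FrozenSet[int]
    features: Dict[str, str] = field(default_factory=dict)

    def contains(self, other: "EmdfObject") -> bool:
        # An Object contains another when its Monad set includes the other's.
        return other.monads <= self.monads


text = "Emdros"  # hypothetical six-character text, one Monad per character
letters = [
    EmdfObject("letter", frozenset({monad}), {"surface": char})
    for monad, char in enumerate(text, start=1)
]
name = EmdfObject("name", frozenset(range(1, 7)))  # spans Monads 1 to 6

contained = [obj.features["surface"] for obj in letters if name.contains(obj)]
```

Because each Object holds a single dictionary of Features, the sketch also shows the limitation noted above: a second, alternative classification for the same Object has no place to go.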


Special Topics in Computer Science Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology INTRODUCTION Jong C. Park, CS

More information

estatistik.core: COLLECTING RAW DATA FROM ERP SYSTEMS

estatistik.core: COLLECTING RAW DATA FROM ERP SYSTEMS WP. 2 ENGLISH ONLY UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS Work Session on Statistical Data Editing (Bonn, Germany, 25-27 September

More information

Europass Curriculum Vitae

Europass Curriculum Vitae Europass Curriculum Vitae Personal information Surname(s) / First name(s) Ventura, Artur David Felix Address(es) Rua Visconde de Santarem, N o 4 5-esq, 1000 Lisboa Telephone(s) +351 91 967 39 16 Email(s)

More information

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test CINTIL-PropBank I. Basic Information 1.1. Corpus information The CINTIL-PropBank (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

Using Object And Object-Oriented Technologies for XML-native Database Systems

Using Object And Object-Oriented Technologies for XML-native Database Systems Using Object And Object-Oriented Technologies for XML-native Database Systems David Toth and Michal Valenta David Toth and Michal Valenta Dept. of Computer Science and Engineering Dept. FEE, of Computer

More information

Architectural Patterns. Layers: Pattern. Architectural Pattern Examples. Layer 3. Component 3.1. Layer 2. Component 2.1 Component 2.2.

Architectural Patterns. Layers: Pattern. Architectural Pattern Examples. Layer 3. Component 3.1. Layer 2. Component 2.1 Component 2.2. Architectural Patterns Architectural Patterns Dr. James A. Bednar jbednar@inf.ed.ac.uk http://homepages.inf.ed.ac.uk/jbednar Dr. David Robertson dr@inf.ed.ac.uk http://www.inf.ed.ac.uk/ssp/members/dave.htm

More information

UPDATES OF LOGIC PROGRAMS

UPDATES OF LOGIC PROGRAMS Computing and Informatics, Vol. 20, 2001,????, V 2006-Nov-6 UPDATES OF LOGIC PROGRAMS Ján Šefránek Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University,

More information

Xml For Beginners - XMLText Editor

Xml For Beginners - XMLText Editor An integrated tool for annotating historical corpora Pablo Picasso Feliciano de Faria University of Campinas Campinas, Brazil pablofaria@gmail.com Fabio Natanael Kepler University of Sao Paulo Sao Paulo,

More information

Semantic Search in Portals using Ontologies

Semantic Search in Portals using Ontologies Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

More information

Supporting Competence upon DotLRN through Personalization

Supporting Competence upon DotLRN through Personalization Supporting Competence upon DotLRN through Personalization Carolina Mejía, Laura Mancera, Sergio Gómez, Silvia Balidiris, Ramón Fabregat University of Girona, Institute of Informatics Applications, 17071

More information

Architecture Design & Sequence Diagram. Week 7

Architecture Design & Sequence Diagram. Week 7 Architecture Design & Sequence Diagram Week 7 Announcement Reminder Midterm I: 1:00 1:50 pm Wednesday 23 rd March Ch. 1, 2, 3 and 26.5 Hour 1, 6, 7 and 19 (pp.331 335) Multiple choice Agenda (Lecture)

More information

University Data Warehouse Design Issues: A Case Study

University Data Warehouse Design Issues: A Case Study Session 2358 University Data Warehouse Design Issues: A Case Study Melissa C. Lin Chief Information Office, University of Florida Abstract A discussion of the design and modeling issues associated with

More information

HYBRID INTELLIGENT SUITE FOR DECISION SUPPORT IN SUGARCANE HARVEST

HYBRID INTELLIGENT SUITE FOR DECISION SUPPORT IN SUGARCANE HARVEST HYBRID INTELLIGENT SUITE FOR DECISION SUPPORT IN SUGARCANE HARVEST FLÁVIO ROSENDO DA SILVA OLIVEIRA 1 DIOGO FERREIRA PACHECO 2 FERNANDO BUARQUE DE LIMA NETO 3 ABSTRACT: This paper presents a hybrid approach

More information

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper Parsing Technology and its role in Legacy Modernization A Metaware White Paper 1 INTRODUCTION In the two last decades there has been an explosion of interest in software tools that can automate key tasks

More information

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that

More information

Managing large sound databases using Mpeg7

Managing large sound databases using Mpeg7 Max Jacob 1 1 Institut de Recherche et Coordination Acoustique/Musique (IRCAM), place Igor Stravinsky 1, 75003, Paris, France Correspondence should be addressed to Max Jacob (max.jacob@ircam.fr) ABSTRACT

More information

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR Arati K. Deshpande 1 and Prakash. R. Devale 2 1 Student and 2 Professor & Head, Department of Information Technology, Bharati

More information

Glance Project: a database retrieval mechanism for the ATLAS detector

Glance Project: a database retrieval mechanism for the ATLAS detector Glance Project: a database retrieval mechanism for the ATLAS detector C. Maidantchik COPPE, UFRJ, Brazil F. F. Grael and K. K. Galvão Escola Politécnica, UFRJ, Brazil K. Pommès CERN, Switzerland Abstract.

More information

A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow

A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow A Framework-based Online Question Answering System Oliver Scheuer, Dan Shen, Dietrich Klakow Outline General Structure for Online QA System Problems in General Structure Framework-based Online QA system

More information

Distributed Database for Environmental Data Integration

Distributed Database for Environmental Data Integration Distributed Database for Environmental Data Integration A. Amato', V. Di Lecce2, and V. Piuri 3 II Engineering Faculty of Politecnico di Bari - Italy 2 DIASS, Politecnico di Bari, Italy 3Dept Information

More information

GEOSPATIAL METADATA RETRIEVAL FROM WEB SERVICES

GEOSPATIAL METADATA RETRIEVAL FROM WEB SERVICES GEOSPATIAL METADATA RETRIEVAL FROM WEB SERVICES Recuperação de metadados geoespaciais a partir de serviços web IVANILDO BARBOSA Instituto Militar de Engenharia Praça General Tibúrcio, 80 Praia Vermelha

More information

HTML5 Data Visualization and Manipulation Tool Colorado School of Mines Field Session Summer 2013

HTML5 Data Visualization and Manipulation Tool Colorado School of Mines Field Session Summer 2013 HTML5 Data Visualization and Manipulation Tool Colorado School of Mines Field Session Summer 2013 Riley Moses Bri Fidder Jon Lewis Introduction & Product Vision BIMShift is a company that provides all

More information

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or

More information

Page 1 of 5. (Modules, Subjects) SENG DSYS PSYS KMS ADB INS IAT

Page 1 of 5. (Modules, Subjects) SENG DSYS PSYS KMS ADB INS IAT Page 1 of 5 A. Advanced Mathematics for CS A1. Line and surface integrals 2 2 A2. Scalar and vector potentials 2 2 A3. Orthogonal curvilinear coordinates 2 2 A4. Partial differential equations 2 2 4 A5.

More information

31 Case Studies: Java Natural Language Tools Available on the Web

31 Case Studies: Java Natural Language Tools Available on the Web 31 Case Studies: Java Natural Language Tools Available on the Web Chapter Objectives Chapter Contents This chapter provides a number of sources for open source and free atural language understanding software

More information

ISSUES ON FORMING METADATA OF EDITORIAL SYSTEM S DOCUMENT MANAGEMENT

ISSUES ON FORMING METADATA OF EDITORIAL SYSTEM S DOCUMENT MANAGEMENT ISSN 1392 124X INFORMATION TECHNOLOGY AND CONTROL, 2005, Vol.34, No.4 ISSUES ON FORMING METADATA OF EDITORIAL SYSTEM S DOCUMENT MANAGEMENT Marijus Bernotas, Remigijus Laurutis, Asta Slotkienė Information

More information

A Workbench for Prototyping XML Data Exchange (extended abstract)

A Workbench for Prototyping XML Data Exchange (extended abstract) A Workbench for Prototyping XML Data Exchange (extended abstract) Renzo Orsini and Augusto Celentano Università Ca Foscari di Venezia, Dipartimento di Informatica via Torino 155, 30172 Mestre (VE), Italy

More information

SCADE System 17.0. Technical Data Sheet. System Requirements Analysis. Technical Data Sheet SCADE System 17.0 1

SCADE System 17.0. Technical Data Sheet. System Requirements Analysis. Technical Data Sheet SCADE System 17.0 1 SCADE System 17.0 SCADE System is the product line of the ANSYS Embedded software family of products and solutions that empowers users with a systems design environment for use on systems with high dependability

More information

E-book Tutorial: MPEG-4 and OpenDocument

E-book Tutorial: MPEG-4 and OpenDocument Building an Impress Extension for Interactive MPEG-4 Video Conversion BRUNO CARPENTIERI and ROBERTO IANNONE Dipartimento di Informatica Università di Salerno Via S. Allende 84081 Fisciano (SA) ITALY bc@dia.unisa.it

More information

Natural Language Database Interface for the Community Based Monitoring System *

Natural Language Database Interface for the Community Based Monitoring System * Natural Language Database Interface for the Community Based Monitoring System * Krissanne Kaye Garcia, Ma. Angelica Lumain, Jose Antonio Wong, Jhovee Gerard Yap, Charibeth Cheng De La Salle University

More information

Data Warehouses in the Path from Databases to Archives

Data Warehouses in the Path from Databases to Archives Data Warehouses in the Path from Databases to Archives Gabriel David FEUP / INESC-Porto This position paper describes a research idea submitted for funding at the Portuguese Research Agency. Introduction

More information

Abstract 1. INTRODUCTION

Abstract 1. INTRODUCTION A Virtual Database Management System For The Internet Alberto Pan, Lucía Ardao, Manuel Álvarez, Juan Raposo and Ángel Viña University of A Coruña. Spain e-mail: {alberto,lucia,mad,jrs,avc}@gris.des.fi.udc.es

More information

Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery

Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery Combining SAWSDL, OWL DL and UDDI for Semantically Enhanced Web Service Discovery Dimitrios Kourtesis, Iraklis Paraskakis SEERC South East European Research Centre, Greece Research centre of the University

More information

Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project

Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project Paul Bone pbone@csse.unimelb.edu.au June 2008 Contents 1 Introduction 1 2 Method 2 2.1 Hadoop and Python.........................

More information

Presente e futuro del Web Semantico

Presente e futuro del Web Semantico Sistemi di Elaborazione dell informazione II Corso di Laurea Specialistica in Ingegneria Telematica II anno 4 CFU Università Kore Enna A.A. 2009-2010 Alessandro Longheu http://www.diit.unict.it/users/alongheu

More information

Evaluation of a Segmental Durations Model for TTS

Evaluation of a Segmental Durations Model for TTS Speech NLP Session Evaluation of a Segmental Durations Model for TTS João Paulo Teixeira, Diamantino Freitas* Instituto Politécnico de Bragança *Faculdade de Engenharia da Universidade do Porto Overview

More information

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY AUTUMN 2016 BACHELOR COURSES

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY AUTUMN 2016 BACHELOR COURSES FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY Please note! This is a preliminary list of courses for the study year 2016/2017. Changes may occur! AUTUMN 2016 BACHELOR COURSES DIP217 Applied Software

More information

11-792 Software Engineering EMR Project Report

11-792 Software Engineering EMR Project Report 11-792 Software Engineering EMR Project Report Team Members Phani Gadde Anika Gupta Ting-Hao (Kenneth) Huang Chetan Thayur Suyoun Kim Vision Our aim is to build an intelligent system which is capable of

More information

Development/Maintenance/Reuse: Software Evolution in Product Lines

Development/Maintenance/Reuse: Software Evolution in Product Lines Development/Maintenance/Reuse: Software Evolution in Product Lines Stephen R. Schach Vanderbilt University, Nashville, TN, USA Amir Tomer RAFAEL, Haifa, Israel Abstract The evolution tree model is a two-dimensional

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

Customizing an English-Korean Machine Translation System for Patent Translation *

Customizing an English-Korean Machine Translation System for Patent Translation * Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,

More information

Ficha técnica de curso Código: IFCAD320a

Ficha técnica de curso Código: IFCAD320a Curso de: Objetivos: LDAP Iniciación y aprendizaje de todo el entorno y filosofía al Protocolo de Acceso a Directorios Ligeros. Conocer su estructura de árbol de almacenamiento. Destinado a: Todos los

More information

Language and Computation

Language and Computation Language and Computation week 13, Thursday, April 24 Tamás Biró Yale University tamas.biro@yale.edu http://www.birot.hu/courses/2014-lc/ Tamás Biró, Yale U., Language and Computation p. 1 Practical matters

More information

WEB-Based Automatic Layout Generation Tool with Visualization Features

WEB-Based Automatic Layout Generation Tool with Visualization Features WEB-Based Automatic Layout Generation Tool with Visualization Features João D. Togni* André I. Reis R. P. Ribas togni@inf.ufrgs.br andreis@inf.ufrgs.br rpribas@inf.ufrgs.br Instituto de Informática UFRGS

More information

DEGREE PLAN INSTRUCTIONS FOR COMPUTER ENGINEERING

DEGREE PLAN INSTRUCTIONS FOR COMPUTER ENGINEERING DEGREE PLAN INSTRUCTIONS FOR COMPUTER ENGINEERING Fall 2000 The instructions contained in this packet are to be used as a guide in preparing the Departmental Computer Science Degree Plan Form for the Bachelor's

More information

Survey Results: Requirements and Use Cases for Linguistic Linked Data

Survey Results: Requirements and Use Cases for Linguistic Linked Data Survey Results: Requirements and Use Cases for Linguistic Linked Data 1 Introduction This survey was conducted by the FP7 Project LIDER (http://www.lider-project.eu/) as input into the W3C Community Group

More information

irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire akodgire@indiana.edu 25th

irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire akodgire@indiana.edu 25th irods and Metadata survey Version 0.1 Date 25th March Purpose Survey of Status Complete Author Abhijeet Kodgire akodgire@indiana.edu Table of Contents 1 Abstract... 3 2 Categories and Subject Descriptors...

More information

Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery

Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Jan Paralic, Peter Smatana Technical University of Kosice, Slovakia Center for

More information

A Framework for Developing the Web-based Data Integration Tool for Web-Oriented Data Warehousing

A Framework for Developing the Web-based Data Integration Tool for Web-Oriented Data Warehousing A Framework for Developing the Web-based Integration Tool for Web-Oriented Warehousing PATRAVADEE VONGSUMEDH School of Science and Technology Bangkok University Rama IV road, Klong-Toey, BKK, 10110, THAILAND

More information

Technical Report. The KNIME Text Processing Feature:

Technical Report. The KNIME Text Processing Feature: Technical Report The KNIME Text Processing Feature: An Introduction Dr. Killian Thiel Dr. Michael Berthold Killian.Thiel@uni-konstanz.de Michael.Berthold@uni-konstanz.de Copyright 2012 by KNIME.com AG

More information

A Quagmire of Terminology: Verification & Validation, Testing, and Evaluation*

A Quagmire of Terminology: Verification & Validation, Testing, and Evaluation* From: FLAIRS-01 Proceedings. Copyright 2001, AAAI (www.aaai.org). All rights reserved. A Quagmire of Terminology: Verification & Validation, Testing, and Evaluation* Valerie Barr Department of Computer

More information

Development of a generic IT service catalog as pre-arrangement for Service Level Agreements

Development of a generic IT service catalog as pre-arrangement for Service Level Agreements Development of a generic IT service catalog as pre-arrangement for Service Level Agreements Thorsten Anders Universität Hamburg, Regionales Rechenzentrum, Schlüterstraße 70, 20146 Hamburg, Germany Thorsten.Anders@rrz.uni-hamburg.de

More information

UIMA: Unstructured Information Management Architecture for Data Mining Applications and developing an Annotator Component for Sentiment Analysis

UIMA: Unstructured Information Management Architecture for Data Mining Applications and developing an Annotator Component for Sentiment Analysis UIMA: Unstructured Information Management Architecture for Data Mining Applications and developing an Annotator Component for Sentiment Analysis Jan Hajič, jr. Charles University in Prague Faculty of Mathematics

More information

What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World

What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World COSC 304 Introduction to Systems Introduction Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca What is a database? A database is a collection of logically related data for

More information

Web-based Multimedia Content Management System for Effective News Personalization on Interactive Broadcasting

Web-based Multimedia Content Management System for Effective News Personalization on Interactive Broadcasting Web-based Multimedia Content Management System for Effective News Personalization on Interactive Broadcasting S.N.CHEONG AZHAR K.M. M. HANMANDLU Faculty Of Engineering, Multimedia University, Jalan Multimedia,

More information

Search Result Optimization using Annotators

Search Result Optimization using Annotators Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

More information

Metadata Management for Data Warehouse Projects

Metadata Management for Data Warehouse Projects Metadata Management for Data Warehouse Projects Stefano Cazzella Datamat S.p.A. stefano.cazzella@datamat.it Abstract Metadata management has been identified as one of the major critical success factor

More information

SERVICE-ORIENTED MODELING FRAMEWORK (SOMF ) SERVICE-ORIENTED BUSINESS INTEGRATION MODEL LANGUAGE SPECIFICATIONS

SERVICE-ORIENTED MODELING FRAMEWORK (SOMF ) SERVICE-ORIENTED BUSINESS INTEGRATION MODEL LANGUAGE SPECIFICATIONS SERVICE-ORIENTED MODELING FRAMEWORK (SOMF ) VERSION 2.1 SERVICE-ORIENTED BUSINESS INTEGRATION MODEL LANGUAGE SPECIFICATIONS 1 TABLE OF CONTENTS INTRODUCTION... 3 About The Service-Oriented Modeling Framework

More information

SEMANTIC VIDEO ANNOTATION IN E-LEARNING FRAMEWORK

SEMANTIC VIDEO ANNOTATION IN E-LEARNING FRAMEWORK SEMANTIC VIDEO ANNOTATION IN E-LEARNING FRAMEWORK Antonella Carbonaro, Rodolfo Ferrini Department of Computer Science University of Bologna Mura Anteo Zamboni 7, I-40127 Bologna, Italy Tel.: +39 0547 338830

More information

The Development of Multimedia-Multilingual Document Storage, Retrieval and Delivery System for E-Organization (STREDEO PROJECT)

The Development of Multimedia-Multilingual Document Storage, Retrieval and Delivery System for E-Organization (STREDEO PROJECT) The Development of Multimedia-Multilingual Storage, Retrieval and Delivery for E-Organization (STREDEO PROJECT) Asanee Kawtrakul, Kajornsak Julavittayanukool, Mukda Suktarachan, Patcharee Varasrai, Nathavit

More information