Computer Standards & Interfaces


How to make a natural language interface to query databases accessible to everyone: An example

Miguel Llopis (mll9@alu.ua.es), Antonio Ferrández (antonio@dlsi.ua.es)
Dept. Languages and Information Systems, University of Alicante, Spain

Computer Standards & Interfaces 35 (2013). Available online 12 October 2012.

Keywords: Natural language interface; Relational database; Ontology extraction; Concept hierarchy; Query-authoring services

Abstract

Natural Language Interfaces to Query Databases (NLIDBs) have been an active research field since the 1960s. However, they have not been widely adopted. This article explores some of the biggest challenges and approaches for building NLIDBs and proposes techniques to reduce implementation and adoption costs. The article describes {AskMe*}, a new system that leverages some of these approaches and adds an innovative feature: query-authoring services, which lower the entry barrier for end users. Advantages of these approaches are proven with experimentation. Results confirm that, even when {AskMe*} is automatically reconfigurable against multiple domains, its accuracy is comparable to domain-specific NLIDBs. © 2012 Elsevier B.V. All rights reserved.

1. Introduction

A natural language interface to query databases (NLIDB) is a system that allows users to access information stored in a database by typing requests expressed in some natural language [1,2,22], such as English or Spanish. NLIDBs have been a field of investigation since the 1960s [2]. There have been many interesting theories and approaches about how an NLIDB could be built: how to improve accuracy [3], how to make NLIDBs more open in terms of the natural language expressions that they accept [4,16], or even how to make them guess the real intent of a user who is trying to construct a query where some pieces are missing [21]. We will analyze these approaches in this article.

While the research work on NLIDBs has led to many different systems being implemented in academic and research environments (e.g. [2-5,8,9,17-19]), it is difficult to find many of these systems being used in business environments or being commercialized by companies across various market segments or domain niches [22].

In this article, we will explore previous NLIDB systems and classify them based on the different approaches that they implement. At the same time, we will explain which of these approaches lead to reduced costs at different stages of the NLIDB lifecycle. Finally, we will look at how we have implemented our proposals to minimize implementation, configuration, portability and learning costs, by analyzing the implementation of {AskMe*}, an ongoing NLIDB research work.

2. Classification of existing NLIDBs

As we outlined in the previous section, there have been many different approaches to the construction of NLIDBs, and there are various ways of classifying them. In this article, we will explore two of the most common taxonomies for the classification of NLIDBs, which appear across various overview articles about the NLIDB field (e.g. [2,22]) and are complemented by our own research observations:

- Based on user interface: textual NLIDBs vs. graphical NLIDBs.
- Based on domain-dependency: domain-dependent vs. domain-independent NLIDBs.
  ο As part of the latter classification, we divide these NLIDBs into subcategories based on their degree of portability and reconfiguration capabilities.
This particular classification is not something that we have found in previous work per se, but rather a pattern that we have extracted from the characteristics of the systems that we have analyzed and from previous research papers in the field of NLIDBs that we have taken into account for our work. In the next sections, we will explore the idiosyncrasies of each of these approaches. It is important to emphasize that we do not claim any one of these approaches to be better than the others, as each has its advantages and disadvantages [2,22]. However, we will evaluate the convenience of each approach with regard to the main goal of our research work: optimizing the costs of NLIDBs.

2.1. NLIDBs by their user interface: textual interfaces vs. graphical interfaces

One of the biggest questions in the space of NLIDBs through the decades has been the choice between a textual and a graphical user interface for the system.

Each of these two alternatives has its own advantages and disadvantages that are worth considering, as described in [2,22]:

- Textual NLIDBs: examples of this type of NLIDB are HEY [18], AT&T [19], LUNAR [2,24] or PRECISE [3].
  ο Advantage: the user is not required to learn any additional language.
  ο Disadvantages: the linguistic coverage of the system is not obvious, and linguistic and conceptual failures overlap.
- Graphical NLIDBs: an example of this type of NLIDB is NL-Menu [30].
  ο Advantage: it is easy to dynamically constrain query formulation based on user selections, so that only valid queries can be built.
  ο Disadvantages: lack of flexibility in query formulation, and expressive power reduced to what the user interface design allows (less expressive power than a textual natural language).

While most of the NLIDBs built in the past can be classified into one of these two categories, there is an intermediate option that consists of combining the expressive power of a textual NLIDB with the visual feedback of a graphical NLIDB, as we presented in our previous work [23]. This can be achieved by including query-authoring services such as syntax coloring, text completions or keyword highlighting as part of the system design; to the best of our knowledge, our proposal is the first NLIDB that incorporates these features into the design of the system. Moreover, {AskMe*} helps the user to make valid queries by automatically distinguishing between linguistic and conceptual failures.

2.2. NLIDBs by their degree of portability and re-configurability: domain-dependent vs. domain-independent NLIDBs

A second taxonomy in NLIDB classification can be made by considering the different approaches for how an NLIDB relates to the knowledge domain of the database that is being queried.

- Domain-dependent NLIDBs: these NLIDBs need to know particularities about the underlying domain entities and restrictions in order to work.
  ο Non-reconfigurable: many of the NLIDBs in this group are designed ad hoc for a particular problem domain (database). An example in this category is LUNAR [2,24].
  ο Reconfigurable: another group of NLIDBs are domain-dependent but can be reconfigured to query a database that belongs to a different domain. In most cases, this reconfiguration consists of remapping domain entities and terms from the database in the query DSL (Domain-Specific Language). Very often, this requires the intervention of a technical user to perform these adjustments. Examples in this category include AT&T [19] and ASK [2,22].
  ο Auto-reconfigurable: this bucket is the most interesting from a cost-saving perspective [23,30], as it allows NLIDBs that are knowledgeable about the underlying domain data (and can therefore provide more accurate information, error messages, etc.) while at the same time enabling non-technical users to connect to multiple databases without the need for manual reconfiguration. The system knows who to ask (the database connection string) and what to ask (entities, properties, data types, etc., generally captured in an underlying source of knowledge, such as ontologies) in order to learn how to deal with the user queries. Examples in this category include HEY [18], GINLIDB [29], FREyA [28] and an NLIDB for the CINDI virtual library [4].
- Domain-independent NLIDBs: there are many other NLIDBs that allow the user to write queries in a natural language and that do not store any knowledge about the underlying domain; they simply translate NL queries into SQL queries and execute them against the underlying database [22]. Since the system does not know anything about the domain, it is not able to warn the user about conceptual errors in the query (entity-property mismatch, data type mismatch, etc.), and therefore error-catching happens in the database, making the system slower and less user-friendly when the query is ill-formed. An example of an NLIDB system in this category is PRECISE [3].

The problem of portability of NLIDBs is, from our perspective, one of the most critical ones to be solved. By itself, the cost of developing an NLIDB can be very high, and in most of the approaches taken for creating NLIDBs, the resulting systems are tightly coupled to the underlying databases [22]. In the last few years, there have been interesting approaches to the design of NLIDBs that are database-independent (e.g. [3,4]), in the sense that they can cope effectively with queries targeting different domains without requiring substantial reconfiguration efforts. One of the best examples of this approach is PRECISE [3]. This system combines the latest advances in statistical parsers with a new concept of semantic tractability, which allows PRECISE to become highly reconfigurable. In addition, this was one of the first NLIDB systems to use the parser as a plug-in, so it could be changed with relative ease in order to leverage the newest advances in the parser space.

An interesting advantage of adapting the parsing process to each of the knowledge domains that the system connects to comes from how input questions are analyzed: analysis in NLIDB systems is often based on part-of-speech (POS) tagging, followed by a syntactic analysis (partial or full) and, finally, a more or less precise semantic interpretation. Although there are broadly accepted techniques for POS tagging (e.g. [5-7]) and syntactic analysis (e.g. [6]), techniques for semantic parsing are still very diverse and ad hoc. In an open-domain situation, where the user can ask questions on any topic, this task is often very difficult and relies mainly on lexical semantics. However, when the domain is limited (as is the case of an NLIDB), the interpretation of a question becomes easier, as the space of possible meanings is smaller and specific templates can be used [8]. It has been demonstrated [9] that meta-knowledge of the database, namely its schema, can be used as an additional resource to better interpret the question in a limited domain.

Another interesting existing solution, based on the creation of a new NLIDB every time the system is connected to a new database, is the system developed for the CINDI virtual library [4], which is based on the use of semantic templates. The input sentences are syntactically parsed using the Link Grammar Parser [10], and semantically parsed through the use of domain-specific templates. The system is composed of a pre-processor and a run-time module. The pre-processor builds a conceptual knowledge base from the database schema using WordNet [13]. This knowledge base is then used at run-time to semantically parse the input and create the corresponding SQL query.
The system is meant to be domain-independent and has been tested with the CINDI database, which contains information on a virtual library. The improvements that our research work in {AskMe*} provides with regard to the portability problem space are described in the next sections.

3. Most significant costs in NLIDBs

Building an NLIDB system and bringing it into production has a significant cost [2,22,27,28,30]. This cost can be analyzed and divided across the different stages of the NLIDB lifecycle: system implementation, deployment and configuration, and finally users' adoption of the system.

- System implementation: creating an NLIDB is not a trivial task; it represents an engineering effort that must be taken into account when considering the creation of an NLIDB [2].

During the implementation phase, and independently of the planning methodology being used, the costs can mostly be divided into the following three categories:
  ο Design: the design of the system is expensive; it is in this phase that various decisions must be made: whether the system is designed to be domain-dependent or independent, what the different modules should look like (lexer, syntactic and semantic parsers, translation to SQL, etc.). This design phase might require weeks, even months, of engineering and architectural work [22].
  ο Development: even when there are tools and frameworks to assist in the process, creating a natural language interface is a laborious task. Being able to provide high expressive power while also processing queries efficiently is hard [2,28].
  ο Testing: in order to create a system that is reliable, efficient and error-free, it is important to invest significantly in testing it: unit-testing independent modules of the system, verifying the robustness and validity of the system when integrating the various pieces, and validating the usability of the system in end-to-end scenarios or queries are just some of the testing activities that must be done in this phase [2].
- Deployment and configuration: this phase comprises the different activities required to deploy and adapt the system, once it has been fully implemented and tested, for real use in a concrete enterprise. It includes, among others, the following tasks: deploying system components, configuring connections between components, connecting the system to the domain database, mapping database entities to system keywords, training the system to understand users' expressions, and ensuring robustness and high availability of the deployed system [22].
- Users' learning process: last but certainly not least, once the system has been deployed to an enterprise environment, it has to be accepted and understood by end users. This is not a trivial process; in fact, making the system easy to understand, learn and use for the target user must be considered the most important principle from the design stage and across all the other phases: the most complete and sophisticated NLIDB is worthless if users are not happy and satisfied while using and interacting with it, or worse, if they reject it because they do not like it. Thus, the learning process must be made smooth and compelling for the users, and this implies that a few different factors must be taken into account: users' learning curve for NLI constructions, database entities and relationships, etc. may be slow without help from the system; the system's graphical user interface may require additional learning effort; and users need to be trained to troubleshoot the most frequent system errors by themselves (connectivity issues, user access, etc.) [27].
It is due to all the costs enumerated previously that we believe that an NLIDB, in order to be successfully and widely adopted in the real-world enterprise, has to be designed once, be portable and able to target different databases and knowledge domains, be easily reconfigurable to connect to a different database without specialized deployment or reconfiguration steps that end users cannot understand, and, finally, allow users to be productive with the system from the first day of use, while implementing a mechanism that lets users learn more advanced concepts of the system as they use it.

4. Contributions of our approach compared to previous related work

The main improvements of our proposal compared to other existing systems are the significant reduction of costs: implementation and reconfiguration costs are optimized thanks to the dynamic nature of the system, and learning costs for end users are greatly reduced as well thanks to the use of query-authoring services.

Some other NLIDB systems developed in the past few years include GINLIDB [29], WASP [22,25] and NALIX [22,26]. GINLIDB represents an interesting attempt at creating a fully auto-reconfigurable, or generic interactive (as the G and I letters in the acronym stand for), approach to the creation of NLIDBs. This system has been an inspiration for the work developed in {AskMe*}; however, our system tries to go a step beyond what GINLIDB accomplished in auto-reconfiguration of the NLIDB. While GINLIDB lets the user define custom mappings between words in their input queries and actual database entities by means of graphical menus that are displayed after a query with errors or ambiguities has been introduced, {AskMe*} attempts to provide richer query-authoring services, which are aimed at helping users easily learn how to ask questions in a new domain by providing query suggestions, error highlighting and domain-specific error descriptions, as we will describe later.

WASP (Word Alignment-based Semantic Parsing) is a system developed at the University of Texas by Yuk Wah Wong [25]. While the system is designed to address the broader goal of constructing a complete, formal, symbolic, meaningful representation of a natural language sentence, it can also be applied to the NLIDB domain. A predicate logic (Prolog) was used as the formal query language. WASP learns to build a semantic parser given a corpus: a set of natural language sentences annotated with their correct formal queries. The strength of WASP comes from its ability to build a semantic parser from annotated corpora. This approach is beneficial because it uses statistical machine translation with minimal supervision, so a grammar does not have to be developed manually for each domain. In spite of this strength, WASP also has two weaknesses. The first is that the system is based solely on the analysis of a sentence and its possible query translation; the database part is left untouched. There is a lot of information that can be extracted from a database, such as the lexical notation, the structure, and the relations within it. Not using this knowledge prevents WASP from achieving better performance, and this is an approach that {AskMe*} tries to improve, as we will see later. The second problem is that the system requires a large amount of annotated corpora before it can be used, and building such corpora requires a large amount of work [22].

NALIX is a Natural Language Interface for an XML Database [26].
The database used for this system is an extensible markup language (XML) database with Schema-Free XQuery as the database query language. Schema-Free XQuery is a query language designed mainly for retrieving information in XML. The idea is to use keyword search for databases. However, pure keyword search certainly cannot be applied, so some richer query mechanisms are added [26]. Given a collection of keywords, each keyword has several candidate XML elements to which it can relate. All of these candidates are added to an MQF (Meaningful Query Focus), which automatically finds all the relations between these elements. The main advantage of Schema-Free XQuery is that it is not necessary to map a query onto the exact database schema, since it will automatically find all the relations given certain keywords. In NALIX, the transformation is done in three steps: generating a parse tree, validating the parse tree, and translating the parse tree to an XQuery expression. This approach is leveraged by our system as well, in the sense that user queries are validated by {AskMe*} before being executed against the database, thanks to the information available from the database schema; but with {AskMe*} we try to go beyond this and provide richer query-authoring services that make the query-writing step more interactive and educational for the user, as we will describe later.

One of the first natural language interfaces to provide a notion of suggestions to the user while authoring a query is OWLPath [27]. This system suggests to the user how to complete a query by combining the knowledge of two ontologies, namely the question ontology and the domain ontology. The question ontology plays the role of a grammar, providing the basic syntactic structure for building sentences. The domain ontology characterizes the structure of the application-domain knowledge in terms of concepts and relationships. The system then makes suggestions

based on the content of the question ontology and its relationships with the domain ontology. Once the user has finished formulating the natural language query, OWLPath transforms it into a SPARQL query and issues it to the ontology repository. In the end, the results of the query are shown back to the user. This is an interesting approach to natural language interfaces for querying ontologies, published just a few months before the first publication about {AskMe*} [23]. While both systems leverage ontologies to provide the user with suggestions on how to complete their queries, the systems are considerably different in a few aspects: OWLPath is a natural language interface to query ontologies, while {AskMe*} is a natural language interface to query databases that leverages ontology generation as a technique to capture the characteristics and semantics of the underlying database schema. In addition, while OWLPath provides query suggestions or auto-completions for terms that exist in the underlying ontology, it does not provide domain-specific error information when a query contains errors, which is something that {AskMe*} emphasizes in order to educate users and help them learn how to use the system and understand the logical model of the underlying domain.

Another interesting and recent approach that inspires our work is FREyA [28], which combines syntactic parsing with the knowledge encoded in ontologies in order to reduce the customization effort. If the system fails to automatically derive an answer, it generates clarification dialogs for the user. The user's selections are saved and used for training the system in order to improve its performance over time. While this is an interesting approach and inspires {AskMe*} in its principles, it differs from our research work in that {AskMe*} focuses on helping users create valid queries from the beginning, as opposed to FREyA's approach of letting them introduce wrong queries and helping the system correct them by means of clarification dialogs, which are used for auto-correcting errors in the future.

{AskMe*} is the first NLIDB system that proposes the combination of a textual NLIDB with rich query-authoring services (syntax coloring, error squiggles, tooltips, etc.). This provides a substantial improvement in the user experience when writing queries, especially with regard to query accuracy, addressing both linguistic failures and conceptual failures, which could not be fully solved by menu-based user interfaces either. The use of query-authoring services helps to keep the dialog between the user and the NLIDB centered on the domain entities in focus. In order to achieve this, we capture domain-specific information in concept-hierarchy ontologies any time the system is connected to a new database. The system automatically generates the syntactic and semantic parsing templates and the rest of the components needed to provide query-authoring services. In addition, the system is fully auto-reconfigurable without the need for any specialized knowledge. This is a significant improvement compared to the existing portable solutions mentioned before, because it makes the entire reconfiguration process fully transparent to the end users, as opposed to having to perform reconfiguration steps for entity mapping, disambiguation, etc.
This is a substantial improvement not only because of the amount of extra work that is saved in the reconfiguration steps, but also because it enables the system to be managed automatically, without user intervention. This represents a step towards the democratization of NLIDBs, as users with a non-technical profile will be able to use the system on their own throughout the entire system lifecycle, from the very early steps of adoption and deployment of the system in a real-world production environment to the management, reconfiguration and diagnosis steps across multiple domains, to which the system is able to adapt itself automatically. In this sense, the role of query-authoring services is also fundamental, because they make it possible to perform the very few manual reconfigurations needed, if any, driven by intuitive real-time hints in the query-authoring process.

5. {AskMe*}: an NLIDB that reduces adoption, portability and users' learning costs

{AskMe*} is a database-independent NLIDB that uses a template-based approach for the dynamic generation of the lexer and the syntactic and semantic parsers. Fig. 1 shows the different modules of the system. An exhaustive description of every component of the system is out of the scope of this paper; instead, we will focus on describing the most relevant techniques that enable the proposed improvements of the system compared to other state-of-the-art systems: dynamic generation of the system and query-authoring services. In order to make this analysis easier to follow, we will use a case study and complement the description of each of these components with its application to a given domain. We will use a subset of Northwind [14] (see Fig. 2), a canonical example of a relational database.

Fig. 1. {AskMe*}'s high level architecture.
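As a rough illustration of the module flow just described, the following sketch wires the stages covered in this section (lexer, syntactic parser, semantic parser, SQL translation) into a single pipeline. All names here (QueryPipeline, Diagnostic, etc.) are our own hypothetical illustration, not the authors' actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Diagnostic:
    stage: str            # "lexical" | "syntactic" | "semantic"
    span: Tuple[int, int] # offsets in the query text to underline
    message: str          # tooltip text shown by the query-authoring services

@dataclass
class QueryPipeline:
    """Hypothetical sketch of the {AskMe*} stages described in Section 5."""
    lexer: Callable[[str], List[Diagnostic]]
    syntactic_parser: Callable[[str], List[Diagnostic]]
    semantic_parser: Callable[[str], List[Diagnostic]]
    sql_translator: Callable[[str], str]

    def run(self, query: str):
        # Each stage can veto translation by emitting diagnostics, which the
        # query-authoring services render as squiggles and tooltips.
        for stage in (self.lexer, self.syntactic_parser, self.semantic_parser):
            diagnostics = stage(query)
            if diagnostics:
                return diagnostics          # surfaced to the UI; no SQL executed
        return self.sql_translator(query)   # only well-formed queries reach the DB
```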

Fig. 2. Sub-set of Northwind database schema.

Northwind captures the domain of a fictitious trading company, containing information about products, orders, suppliers, employees, etc.

5.1. Ontology builder

The first operation performed once {AskMe*} is connected to a database is to search the ontology repository for the ontology representing that domain. This repository consists of a dictionary that stores ontology references for any given <Server, Database> tuple that the system has been connected to. If the ontology for that particular domain does not exist, the ontology generation process is triggered. This process [10] analyzes the database catalog and schema in order to build the ontology that captures the domain entities, properties, relationships and constraints.

{AskMe*} uses OWL for representing ontologies. In order to build the ontology for each database, and to keep the system within a manageable range of data volume, only the minimal information needed from the database is stored in the ontology. Concretely, entity names, properties and value types are mapped from the database into the ontology, while the actual data is not. The reason for this decision is that we use OWL as a way to represent the domain characteristics (entity names, entity properties, relationships, etc.) of the underlying database; the actual data, however, is much bigger in size than the schema and also changes more often. Therefore, we decided to perform an analysis of the database schema that allows us to capture the nature of the domain, while the actual data retrieval that is part of each user query execution is performed directly against the database, after validating that all domain restrictions are satisfied by the user query, for which we leverage the domain representation captured in the OWL ontology. As a matter of fact, if a user query that complies with all the domain restrictions stored in OWL is executed against the database and the result indicates that there has been a change in the underlying domain that makes the OWL ontology out of date, a new OWL generation process is triggered automatically, in order to refresh the domain ontology and keep it accurate with respect to the underlying database at all times.

In order to build the ontology capturing the mapping described above, {AskMe*} leverages OWLminer's approach [7], which consists of implementing the algorithm known as Feature and Relation Selection (FARS) [11]. FARS is a multi-relational feature selection algorithm that uses target tables and attributes to create join chains with other tables, using foreign keys as links. The algorithm also uses the Levenshtein distance [12] as a metric for determining whether features are related or not. This metric is based on the closeness between text and the feature's value in the dataset. During this approximate search, every set of input texts from the set of relations and tables in the given database is analyzed. The result of this analysis is a set of attributes that meet the constraint that all members must be columns (properties) within the current database table (entity), as described in Table 1. After this first-level search has been performed for a given table, the next steps consist of finding the cross-table relationships, taxonomic and non-taxonomic relations and dependencies, in order to make the ontology grow in this dimension too.
The attributes identified in the previous step are now used to analyze and discover the set of corresponding tables. As part of this process, the primary and foreign keys are also identified. The output of the feature and relation selection algorithm described in Fig. 3 is represented as an XML document in which the first-level nodes of the tree represent tables. An example of the output of this algorithm can be found in Fig. 4, based on the Northwind schema described previously.

Table 1
Database-ontology mapping.

Database component                        OWL component
Table/entity                              Class
Column                                    Functional property
Column metadata: data type                OWL property restriction: AllValuesFrom restriction
Column metadata: mandatory/non-nullable   OWL property restriction: Cardinality() restriction
Column metadata: nullable                 OWL property restriction: MaxCardinality() restriction
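To make the mapping in Table 1 concrete, here is a minimal sketch of what such a schema-to-OWL conversion could look like in Python with rdflib, including the <Server, Database> repository lookup described above. It is an illustration under our own assumptions: the function names, the simplified schema dictionary and the namespace are hypothetical, not part of {AskMe*}:

```python
from rdflib import BNode, Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL, XSD

EX = Namespace("http://example.org/northwind#")  # hypothetical namespace
_repository = {}  # maps (server, database) -> previously built ontology graph

def schema_to_owl(schema):
    """Apply the Table 1 mapping: tables -> classes, columns -> functional
    properties, nullability -> cardinality restrictions, FKs -> object properties."""
    g = Graph()
    g.bind("owl", OWL)
    for table in schema["tables"]:
        cls = EX[table["name"]]
        g.add((cls, RDF.type, OWL.Class))                    # Table/entity -> Class
        for col in table["columns"]:
            prop = EX[f'{table["name"]}.{col["name"]}']
            g.add((prop, RDF.type, OWL.DatatypeProperty))
            g.add((prop, RDF.type, OWL.FunctionalProperty))  # Column -> functional property
            g.add((prop, RDFS.domain, cls))
            g.add((prop, RDFS.range, XSD[col["xsd_type"]]))  # data type constraint
            restriction = BNode()
            g.add((restriction, RDF.type, OWL.Restriction))
            g.add((restriction, OWL.onProperty, prop))
            if col["nullable"]:
                g.add((restriction, OWL.maxCardinality, Literal(1)))  # nullable
            else:
                g.add((restriction, OWL.cardinality, Literal(1)))     # mandatory
            g.add((cls, RDFS.subClassOf, restriction))
        for fk in table.get("foreign_keys", []):
            # Foreign key -> owl:ObjectProperty relating two classes (cf. Fig. 5)
            rel = EX[f'{table["name"]}_has_{fk["references"]}']
            g.add((rel, RDF.type, OWL.ObjectProperty))
            g.add((rel, RDFS.domain, cls))
            g.add((rel, RDFS.range, EX[fk["references"]]))
    return g

def get_or_build_ontology(server, database, schema):
    """Repository lookup keyed by the <Server, Database> tuple (Section 5.1)."""
    key = (server, database)
    if key not in _repository:
        _repository[key] = schema_to_owl(schema)
    return _repository[key]

demo_schema = {"tables": [{
    "name": "Product",
    "columns": [{"name": "ProductName", "xsd_type": "string", "nullable": False}],
    "foreign_keys": [{"references": "Category"}],
}]}
g = get_or_build_ontology("localhost", "Northwind", demo_schema)
print(g.serialize(format="xml")[:200])  # start of the OWL/XML document
```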

Fig. 3. Feature and relation selection algorithm.

As can be seen in Fig. 4, all the entities in the previously described Northwind-based sample are captured in a custom XML tree structure. This XML tree contains not only the entity (table) names but also the columns and column types for each table, as well as primary key and foreign key information. The next step in the extraction process consists of converting this XML tree into an ontology that can be used to generate all the query-authoring services information needed. In order to achieve this, the XML tree is processed and each table node is converted into an OWL class in the resulting document. For this sample we will focus only on the foreign key relations for Category, Product, Supplier, Order and OrderDetail, but all other object properties would be represented in this OWL document as well. In a similar way, each foreign key is expressed as an OWL object property in which two primary classes are related using domain and range attributes (see Fig. 5). With this approach, the building process of the OWL ontology is accelerated, and the use of background knowledge helps to extract the required knowledge from the database. This approach is considerably better in cost (time and space) than simply mirroring the database schema to the ontology, based on multiple experiments as described in [11].

5.2. Dynamic parser generation

After building the ontology that captures the overall characteristics of the database domain, the next step consists of automatically building the parsers that will help understand users' queries and translate them into SQL queries to be executed against the database. As described previously, {AskMe*} is fully auto-reconfigurable and can be pointed at multiple domains, while at the same time offering domain-specific features such as lexical, semantic and conceptual error detection. The key to these capabilities resides in the ability to perform this dynamic parser generation at all three levels: lexical, syntactic and semantic.

5.2.1. Lexicon

A lexicon is formed by the set of terms that can be understood by the system; that is, the set of terms that have a special meaning in a given NLIDB. This means, in particular, the set of entities and properties that have been identified in the database schema. In the case of {AskMe*}, these terms are also captured in the domain ontology. In order to build the lexicon, we combine the set of nouns derived from the domain knowledge contained in the database, namely the entity and property names, with general-knowledge vocabulary terms, mostly verbs, adjectives and adverbs. We retrieve these general-knowledge vocabulary terms from WordNet [13], a large lexical database of English. This database classifies nouns, verbs, adjectives and adverbs into sets of cognitive synonyms. Cognitive synonyms (also named synsets in WordNet [13]) are terms which belong to different syntactic categories (i.e. nouns, verbs, etc.) but represent related concepts; an example of a set of cognitive synonyms could be approximation (noun), approximated (adjective) and approximate (verb). Thanks to these sets of cognitive synonyms, we are also able to complement the existing set of domain-specific nouns (entities and properties from the domain ontology) with an important amount of synonyms in the system lexicon.
This aspect is very important, as it allows the lexer to automatically accept terms that, even when they are not the exact noun used in the underlying database schema, represent the same concept for the user. For example, the database may contain a property called Telephone for the entity Customer, while the user probably refers to it simply as Phone; the lexer is able to recognize phone as a valid term as well. In case a term is not in WordNet, such as ProductID, several heuristics are applied (e.g. splitting a term into several terms when there is an uppercase letter in the middle of a lowercase term: Product+ID). Finally, the user can review this lexicon in order to add or suppress synonyms (e.g. the term Emp is not in WordNet, so the user could add synonyms such as Employee). Following the example of Northwind described previously (see Fig. 2), the domain-specific lexicon of nouns built from the ontology (WordNet synonyms in parentheses) is presented in Table 2. Note that it contains entities and properties as specialized terms; this classification is not relevant to the lexicon itself, but will be used later for the semantic analysis, as we will describe. Thanks to this dynamic lexicon generation process, {AskMe*} can implement features such as lexical error detection.
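A minimal sketch of how such a lexicon could be assembled in Python with NLTK's WordNet interface, combining the camel-case splitting heuristic with synonym expansion, follows; the function names and the exact heuristics are our own illustration, not the authors' code:

```python
import re
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def split_identifier(term):
    """Heuristic from Section 5.2.1: 'ProductID' -> ['Product', 'ID']."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", term)

def synonyms(word):
    """Collect WordNet lemma names for the noun senses of a schema term."""
    names = set()
    for synset in wn.synsets(word, pos=wn.NOUN):
        for lemma in synset.lemma_names():
            names.add(lemma.replace("_", " ").lower())
    return names

def build_lexicon(schema_terms):
    """Map every entity/property name (and its parts) to itself plus synonyms."""
    lexicon = set()
    for term in schema_terms:
        parts = split_identifier(term)
        lexicon.add(" ".join(parts).lower())
        for part in parts:
            lexicon.add(part.lower())
            lexicon |= synonyms(part)  # e.g. 'telephone' also yields 'phone'
    return lexicon

# Example: lexicon entries for two Northwind-style property names.
print(sorted(build_lexicon(["Telephone", "ProductID"]))[:10])
```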

Fig. 4. Generated custom XML tree containing all Northwind entities in the sample.

Fig. 5. OWL capturing classes and relations from Northwind.

Once the system has been configured, a user can start typing a query, which is processed first by the lexer. Every time a white space is added to the buffer, the lexer analyzes the term that goes immediately before this white space and decides whether it is valid from the lexical perspective or not. If the term does not appear in the lexicon, the lexer tags it as an invalid lexical item. This tag information is automatically retrieved by the query-authoring services component, which underlines the invalid term with red squiggles in the query bar, making it evident to the user that the underlined part of the query is wrong, even before he finishes writing it, and offering tooltip information about the invalid term (Fig. 6).

In the example shown in Fig. 6, a query about projects is provided by the user. The system analyzes this query in real time and determines that projects is the entity that needs to be found in the underlying domain. In order to do this, {AskMe*} looks for this entity in the lexicon (from Table 2) and determines that it does not exist. As a result, the system notifies the user about the error in the query by underlining the term that has not been found in the lexicon with red squiggles. When the user places the mouse on this term, a tooltip containing additional information about the error is displayed.

Fig. 6. Lexical error squiggles and tooltip error information.

The other query-authoring service offered by {AskMe*} at the lexer level is the completion suggestions mechanism, which offers, in a drop-down pop-up menu that appears below the word the user is currently typing, the set of suggested words that contain the typed portion as a fragment. This helps the user to remember the exact word that he is trying to write, and also to autocomplete it, letting him write queries faster (Fig. 7).

Fig. 7. Completions for supplier properties starting with Co.
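The whitespace-triggered check described above could look something like the following sketch (Python, with hypothetical names); it reports the spans that the UI would underline with red squiggles:

```python
import re

def lexical_check(query, lexicon):
    """For each completed word, report (start, end, message) spans for terms
    missing from the lexicon; the UI renders these as red squiggles."""
    diagnostics = []
    for match in re.finditer(r"[A-Za-z]+", query):
        if match.group().lower() not in lexicon:
            diagnostics.append((match.start(), match.end(),
                                f"'{match.group()}' is not a known term in this domain"))
    return diagnostics

# Toy lexicon: domain nouns plus a few general-vocabulary words.
lexicon = {"list", "all", "products", "suppliers", "orders", "categories", "from"}
print(lexical_check("List all projects", lexicon))
# -> [(9, 17, "'projects' is not a known term in this domain")]
```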

Table 2
Northwind's lexicon with WordNet synonyms. Entity (and synonyms): properties (and synonyms).

- Product (merchandise, ware): product ID (product identifier), product name (product denomination), supplier ID (provider identifier), category ID (type identifier, class identifier), quantity per unit, unit price (unit cost), units in stock, units in order, reorder level, discontinued.
- Order (command): order ID (order identifier), employee ID (worker identifier), order date (command date), required date (due date), shipped date, ship via, freight (cargo), ship name, ship address, ship city (town, municipality), ship region, ship postal code (ship zip code), ship country.
- Order details: order ID (order identifier), product ID (product identifier, ware identifier, merchandise identifier), unit price (unit cost), quantity (amount), discount (deduction, reduction, allowance).
- Categories (types, classes): category ID (type identifier, class identifier), category name (category denomination, class name, class denomination, type denomination, type name), description (representation, information), picture (photo, photograph, image).
- Suppliers (dealers, providers, vendors): supplier ID (dealer identifier, provider identifier, vendor identifier), company name (business name, enterprise name), contact name (correspondent name), contact title (correspondent appellation), address (direction, domicile), city (town, municipality), region (territory, district), postal code (zip code), country (nation, state), phone (telephone), fax (facsimile), home page.

5.2.2. Syntactic parser

{AskMe*} leverages the Link Grammar Parser [10] for the core syntactic parsing operations. The Link Grammar Parser is a syntactic parser of English based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labeled links connecting pairs of words. The parser also produces a constituent representation of a sentence (showing noun phrases, verb phrases, etc.), like the one shown in Fig. 8.

Fig. 8. Constituent tree for the query Suppliers that are not in United States.

The parser has a dictionary of about 60,000 word forms. It covers a wide variety of syntactic constructions, including many rare and idiomatic ones. The parser is robust; it is able to skip over portions of the sentence that it cannot understand and assign some structure to the rest of the sentence. It is able to handle unknown vocabulary and make intelligent guesses, from context and spelling, about the syntactic categories of unknown words. It has knowledge of capitalization, numerical expressions, and a variety of punctuation symbols. A full description of the Link Grammar is out of scope for this article; however, it is noteworthy that, by using the Link Grammar API, the totality of this parser's capabilities can be leveraged in {AskMe*}, thus letting our efforts focus on other innovative areas, such as the combination of query-authoring services within the proposed NLIDB, as well as the portability of the system.

The concurrency mechanisms implemented on top of the Link Grammar Parser API are based on event notifications for all the syntactic parser events: every time the parser processes and tags a fragment of the input query, an event is generated, containing information about the syntactic classification of each token. This is a key component for driving the syntactic query-authoring service that {AskMe*} implements: green syntactic error squiggles. These squiggles warn the user about syntactic errors in a query, even before the query has been fully authored (Fig. 9).

Fig. 9. Syntactic error squiggles and tooltip error information.
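The event-notification wiring between the parsers and the query-authoring services component, described here and again for semantic events in the next subsection, could be sketched as a simple publish/subscribe mechanism. This is our own minimal illustration, not the actual {AskMe*} code:

```python
from collections import defaultdict

class EventBus:
    """Publish/subscribe hub: parsers publish token/error events and the
    query-authoring services component subscribes to render squiggles."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
# The UI layer listens for parser events and draws the corresponding squiggles.
bus.subscribe("syntactic_error", lambda e: print("green squiggle at", e["span"], "-", e["message"]))
bus.subscribe("lexical_error", lambda e: print("red squiggle at", e["span"], "-", e["message"]))

# A parser would publish an event as soon as it tags a fragment:
bus.publish("syntactic_error", {"span": (14, 19), "message": "missing entity after 'from'"})
```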
5.2.3. Semantic parser

The third parsing step applied to an input query is semantic parsing. In {AskMe*}, given its dynamic domain-specific knowledge acquisition nature, it is possible for a certain query to be valid according to the lexical and syntactic analysis and yet not represent a concept that fits the current domain. For example, the query Name and date of the customers from the country where most orders were made in 2010 could be lexically and syntactically valid: all the terms in the sentence may be present in the dynamic lexicon, and the syntactic construction and order of words may match one of the valid categories of phrases in the Link Grammar Parser. However, the concept of Date may not exist for the entity Customer. This is definitely an error in the input query: a semantic error. In order to detect this kind of error, the semantic parsing step is applied to the input query. The semantic parser is guided by the use of semantic templates, which are filled with the concepts captured in the domain ontology. The rules modeled by these dynamically-generated semantic templates are the following (a sketch of the first two checks follows the list):

- Entity-property correspondence: this rule enforces that all the requested properties for an entity in a query are indeed part of the current domain schema.
- Cross-entities relationships: this rule is applied to queries that contain multiple sub-phrases, and its purpose is to enforce that there exists a foreign-key relationship in the database schema between the entities in the query.
- Entities' default attributes: there are cases in which the query is valid from a lexical, syntactic and semantic analysis, but it does not specify which attributes must be present in the result. For instance, the query in Table 3, Products that were ordered by more than 100 customers in 2010, does not specify which product properties we are interested in. This semantic rule does not invalidate a given input query, but rather imposes that the resulting SQL query must return all the product attributes that are not-null in the database schema, such as the Product ID, Product Name, Price, etc. This information, as we explained previously, was captured in the domain ontology as an OWL cardinality metadata attribute.
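As announced above, here is a minimal sketch of the first two semantic rules, checking entity-property correspondence and cross-entity relationships against a schema graph derived from the ontology; the data structures and names are our own assumptions, and the error strings mirror the templates in Table 4:

```python
# Toy schema extracted from the ontology: properties per entity and FK edges.
SCHEMA = {
    "properties": {
        "Customer": {"name", "country"},
        "Order":    {"order_id", "order_date", "customer_id"},
        "Product":  {"product_id", "product_name", "unit_price"},
    },
    "relationships": {  # foreign-key edges between entities
        "Customer": {"Order"},
        "Order":    {"Customer", "Product"},
        "Product":  {"Order"},
    },
}

def check_entity_property(entity, prop, schema=SCHEMA):
    """Rule 1: every requested property must belong to its entity."""
    if prop not in schema["properties"].get(entity, set()):
        return f"{entity} does not contain a property called {prop}"
    return None

def check_relationship(entity_a, entity_b, schema=SCHEMA):
    """Rule 2: a direct foreign-key relationship must exist between the
    entities of the sub-phrases (cf. Table 3: Product-Customer fails,
    while Product-Order and Order-Customer succeed)."""
    if entity_b not in schema["relationships"].get(entity_a, set()):
        return f"{entity_a} and {entity_b} are not related to each other"
    return None

print(check_entity_property("Customer", "date"))  # the 'date of customers' error above
print(check_relationship("Product", "Customer"))  # the Table 3 'Fail' case
```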

Some examples of these rules are presented and analyzed in Table 3. In case one or more of these semantic requirements are not met by the input query, the semantic analysis reports errors. These errors are notified to the system in the form of events. The query-authoring services component is subscribed to these semantic events, in the same way as it is to the lexical and syntactic ones, and therefore notifies the user about the issue in a visual way, by highlighting the portions of the input query that cause the inconsistency. When the user hovers with the mouse over these highlighted regions, a tooltip containing a description of the inconsistency comes up. This description is also template-based; see Table 4.

Table 3
Examples of semantic rules behavior.

Schema relationships | Input query | Result | Reason
Customers-orders and orders-products | Products from customers whose last name is Llopis. | Fail | There is no existing relationship between products and customers in this domain.
Customers-orders and orders-products | Products that were ordered by more than 100 customers in 2010. | Success | Products are related to orders, and every order references a customer.

Table 4
Examples of template-based semantic error messages.

Inconsistency type | Error description message
Entity property mismatch | "Entity A does not contain a property called Property A" (where Entity A and Property A are the values in a query).
Missing relationship | "Entity A and Entity B are not related to each other."

6. Evaluation

In order to evaluate the effectiveness of our approach, we apply three different experiments:

- Accuracy in query interpretation for a concrete domain.
- Effectiveness of query-authoring services for a concrete domain.
- Portability of the system across domains.

6.1. Accuracy in query interpretation for a concrete domain

The first experiment consists of evaluating the accuracy of our query interpretation process in a concrete domain. For that purpose, we evaluated our system using data from the Air Travel Information System (ATIS) domain [15]. The ATIS database is based on air travel data obtained from the Official Airline Guide (OAG) in June 1992 and current at that time. The database includes information for 46 cities and 52 airports in the US and Canada. The largest table in the expanded database, the flight table, includes information on 23,457 flights. A complete reference for the ATIS domain can be found in [15]. The selection of ATIS was motivated by three concerns. First, a large corpus of ATIS sentences already exists and is readily available. Second, ATIS provides an existing evaluation methodology, complete with independent training and test corpora, and scoring programs.
Finally, evaluation on a common corpus makes it easy to compare the performance of the system with those based on different approaches. Our experiments used the 448 context-independent questions in ATIS Scoring Set A, which is one of the question sets of the ATIS benchmark, generally the most commonly used for the evaluation of other systems, and the one that lets us compare with most of them. {AskMe*} produced an accuracy rate of 94.8%. The system accuracy rate is calculated with the equation in Fig. 11.

Fig. 10. Examples of queries from ATIS and results obtained with {AskMe*}.

Fig. 11. System accuracy equation.
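A plausible form of this equation, assuming the conventional definition of accuracy used in ATIS-style evaluations (we could not verify the exact formula from the figure), is:

\[
\text{Accuracy} = \frac{\text{number of correctly interpreted queries}}{\text{total number of queries in the test set}} \times 100
\]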

Table 5
Accuracy comparison using ATIS between various NLIDB systems: HEY [16], SRI [17], PRECISE [3], {AskMe*}, MIT [18] and AT&T [19].

Table 5 compares the results obtained by {AskMe*} in the ATIS benchmark with other state-of-the-art systems. In some cases, as displayed in Fig. 10, some of the failures are due to domain-specific information or query shortcuts (such as tomorrow as a date/time shortcut, etc.) which {AskMe*} does not support yet, because other functional work, such as domain portability or query-authoring services, was prioritized higher. These results confirm that, even when {AskMe*} is a fully reconfigurable system that can be targeted at multiple knowledge domains, its accuracy results against a particular domain are very similar to the results of other state-of-the-art systems which are tailored to the underlying domain.

6.2. Effectiveness of query-authoring services for a concrete domain

The second experiment we use to evaluate our system measures how query-authoring services improve the overall usability of the system by enabling early detection of query errors. In order to do that, we asked a set of ten users to write fifty queries per user in a given domain. These users were completely new to the system and did not have any previous knowledge about the underlying domain. We gave them an initial description of the Northwind database, without a schema representation or concrete entity/property names, and let them query the system in an exploratory way. This description was as simple as explaining that the database contained information about products, product categories, orders, order details and suppliers. During this process, users are very likely to introduce mistakes in most of the queries they come up with for the first time. We captured traces for all of these queries and recorded in which stage of the parsing process errors were raised. Our results indicate that, of the fifty input queries per user, almost 90% contained errors, and roughly 80% of these wrong queries could be detected before they were translated into SQL and, therefore, before they were executed against the database. This results in significant improvements in terms of latency time for wrong queries: thanks to the query-authoring services that {AskMe*} implements, they are detected locally by the system instead of being translated into SQL and executed against the database.

The results of this experiment show that while an important amount of the errors (23%) are lexical errors (usually typos), and 26% of them correspond to syntactic errors (mostly ill-formed sentences in the English language), most of the errors are semantic (51%). In order to help minimize the probability of lexical errors in a query, the system provides auto-completion for entities and properties, and also auto-correction of typos based on distance-editing algorithms (a sketch of this mechanism follows Table 6). Table 6 shows some of the most interesting queries written by users and how {AskMe*} guided them towards the right query.

Table 6
Sample queries fixed by user interaction with query services.

User query | Corrected query | How was it fixed?
List all categories of products | List all categories of products | Auto-correction of typos at the lexical level (distance-editing algorithm comparing against known valid tokens).
Products from customers whose last name is Llopis | Products ordered by customers whose last name is Llopis | A semantic error tooltip is displayed in the user query; the user learns that the relation products-customers is transitive, via order details and orders (which contain the customer ID), and follows the guidance in order to end up with a valid query.
Products from whose last name is Llopis | Products from customers whose last name is Llopis | The syntactic parser detects a syntactic error when the user types whose, as the entity is missing. By providing an error squiggle and tooltip, the user is able to identify the missing piece of the query and correct it in order to fix and complete the rest of the query.
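The distance-editing auto-correction mentioned above (first row of Table 6) can be illustrated with a classic Levenshtein distance [12] computation; this sketch, with hypothetical names, picks the closest known token for a typo:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b [12]."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def autocorrect(word, lexicon, max_distance=2):
    """Return the closest lexicon token within the edit-distance budget."""
    best = min(lexicon, key=lambda token: levenshtein(word, token))
    return best if levenshtein(word, best) <= max_distance else word

print(autocorrect("catgories", {"categories", "customers", "products"}))
# -> 'categories'
```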
In terms of the distribution of semantic errors, classified by the main semantic rules that {AskMe*} implements, this evaluation determines that 51% of them fall under the entity-property mismatch rule, thus being the most common semantic error; 41% of the errors correspond to queries trying to refer to a missing relationship that does not exist in the domain; and the remaining 8% represent semantic errors due to the query specifying invalid values in property conditions.

6.3. Portability of the system across multiple domains

Our third experiment focuses on evaluating the portability of the system. For this purpose, we have created a script that simulates the user actions through the visual interface. In this test, the system is connected to three different databases that we have previously configured: ATIS, AdventureWorksDB [20] and Northwind [14]. For each of these database connections, a custom benchmark made up of fifty different queries that are relevant to the corresponding domain (ATIS as described in the first experiment, Northwind as described throughout different sections of this paper, and Adventure Works as shown in Fig. 12) is executed against the system, asserting that the query-authoring services work as expected and that the resulting SQL query is generated as expected as well. Finally, the test also evaluates the behavior when the system is connected to a database that it had already been connected to before, checking that the ontology generation process is not kicked off again, but rather that the existing ontology for that source is pulled back from the store and brought into the current connection context. The results of this experiment indicate that there is no loss in accuracy after a reconnection to a different database, and the results are the same as if the system had only been connected to a single database for its lifetime. This means that the results observed in the first and second experiments apply to the scenario of multiple database reconnections, without degrading the overall accuracy of the system after connecting to multiple domains.

7. Conclusions and future work

{AskMe*} is an adaptive natural language interface and environment system to query arbitrary databases. Internally, the system leverages an ontology-based approach in which a new ontology is auto-generated every time the system is connected to a different database. Once this ontology has been generated, the rest of the system (domain-specific grammar, query-authoring services, etc.) reconfigures itself based on the set of language terms and relationships contained in the ontology. This automatic reconfiguration enables an effective lexical, syntactic and semantic validation of an input query, which results in a higher accuracy of the system.

Fig. 12. Adventure Works database simplified schema used in the second experiment.

The evaluation process showed how, even though the system is not specific to any concrete domain, the result of 94.8% accuracy against the ATIS benchmark compares well with other existing state-of-the-art systems, both domain-dependent and domain-independent. Furthermore, this approach enables full portability of the system, without any reconfiguration steps needed for the system to successfully execute queries against a new database. Extra mapping reconfigurations, such as user-preferred ways to refer to elements of the domain model, can be done through easy user interface gestures such as right-clicking elements (i.e. words) of a given query. We believe that the simplification of the reconfiguration process when connecting to new database schemas is a very important step towards the democratization of NLIDBs in real-world setups, as it enables non-technical users to fully control the system through its entire lifecycle. In addition, it enables the construction of a customized textual query environment in which a set of query-authoring services can be provided to the user, to help author and disambiguate queries. These query-authoring services play a fundamental role in the system's usability, making it possible to detect query errors early, as demonstrated in the evaluation section, where we observed that around 80% of the queries that contained errors could be detected before they were actually translated into SQL, resulting in a more efficient, lower-latency, user-interactive system. The classification of these errors based on the parsing stage in which they are detected, as shown in the evaluation, gives us the possibility to selectively focus on improving the quality and functionality of the query-authoring services at each stage of the parsing process, in order to maximize the investment in relation to the gain in overall user experience. Finally, it is worth remarking that {AskMe*} also helps the user to make valid queries by automatically distinguishing between linguistic and conceptual failures.

Based on our very positive evaluation results for early error detection thanks to the use of query-authoring services, as future work we are trying to maximize this benefit by experimenting with new query-authoring services and improving the existing ones. Moreover, we will add anaphora and ellipsis resolution capabilities to {AskMe*}. Anaphora and ellipsis resolution are an active research field in the space of NLIDBs; this capability gives users the possibility to dramatically reduce the number of words to be written when asking different questions about different aspects of the same entity, which will result, again, in another important usability shift for {AskMe*} [21]. The main drawback of using this kind of resolution is its low precision. However, we plan to overcome the low precision of anaphora and ellipsis resolution by benefiting from query-authoring services.

Acknowledgments

This research has been partially funded by the Valencia Government under Project PROMETEO/2009/119, and by the Spanish Government under Project Textmess 2.0 (TIN C04-01) and TIN.

References

[1] S. Abiteboul, V. Hull, R. Viannu, Foundations of Database Systems, Addison Wesley.
Androutsopoulos, Natural language interfaces to databases an introduction, Journal of Natural Language Engineering 1 (1995) [3] A. Popescu, A. Armanasu, O. Etzioni, D. Ko, A. Yates, PRECISE on ATIS: Semantic Tractability and experimental results, in: Proceedings of the National Conference on Artificial Intelligence AAAI, 2004, pp [4] N. Stratica, L. Kosseim, B.C. Desai, Using Semantic Templates for a natural language interface to the CINDI virtual library, Data & Knowledge Engineering Journal 55 (1) (2004) [5] D. Jurafsky, J. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition and Computational Linguistics, Prentice Hall, [6] C. Manning, H. Schutze, Foundations of Statistical Natural Language Processing, MIT Press, [7] H. Santoso, S. Haw, Z.T. Abdul-Mehdi, Ontology extraction from relational database: concept hierarchy as background knowledge, Knowledge-Based Systems 24 (3) (2011) [8] M. Watson, NLBean(tm) version 4: a natural language interface to databases, [9] R. Bartolini, C. Caracciolo, E. Giovanetti, A. Lenci, S. Marchi, V. Pirrelli, C. Renso, L. Spinsanti, Creation and use of lexicons and ontologies for NL interfaces to databases, in: Proceedings of the International Conference on Language Resources and Evaluation, vol. 1, 2006, pp [10] D. Sleator, D. Temperley, Parsing English with a link grammar, in: Proceedings of the Third International Workshop on Parsing Technologies, [11] B. Hu, H. Liu, J. He, X. Du, FARS: multi-relational feature and relation selection approach for efficient classification, in: Proceedings of the Advance Data Mining and Application Conference, vol. 1, 2008, pp

[12] V.I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, Soviet Physics Doklady 10 (8) (1966).
[13] G.A. Miller, WordNet: a lexical database for English, Communications of the ACM 38 (11) (1995).
[14] Northwind.
[15] M. Bates, S. Boisen, J. Makhoul, Developing an evaluation methodology for spoken language systems, in: Proceedings of the Speech and Natural Language Workshop, vol. 1, 1990.
[16] Y. He, S. Young, A data-driven spoken language understanding system, in: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, vol. 1, 2003.
[17] R.C. Moore, D.E. Appelt, SRI's experience with the ATIS evaluation, in: Proceedings of the Workshop on Speech and Natural Language, 1990.
[18] V. Zue, J. Glass, D. Goddeau, D. Goodine, L. Hirschman, M. Phillips, J. Polifroni, S. Seneff, The MIT ATIS system: February 1992 progress report, in: Proceedings of the Workshop on Speech and Natural Language, 1992.
[19] D. Hindle, An analogical parser for restricted domains, in: Proceedings of the Workshop on Speech and Natural Language, 1992.
[20] Adventure Works.
[21] J.L. Vicedo, A. Ferrández, Importance of pronominal anaphora resolution in question answering systems, in: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000.
[22] N. Nihalani, S. Silakari, M. Motwani, Natural language interface for database: a brief review, International Journal of Computer Science Issues 8 (2) (2011).
[23] M. Llopis, A. Ferrández, {AskMe*}: reducing the costs of adoption, portability and learning process in a natural language interface to query databases, in: Proceedings of the 8th International Workshop on Natural Language Processing and Cognitive Science, vol. 1, 2011.
[24] W.A. Woods, R.M. Kaplan, B.N. Webber, The Lunar Sciences Natural Language Information System: Final Report, BBN Report 2378, 1972.
[25] Y.W. Wong, Learning for semantic parsing using statistical machine translation techniques, Technical Report UT-AI, University of Texas at Austin.
[26] Y. Li, H. Yang, H.V. Jagadish, NALIX: an interactive natural language interface for querying XML, in: Proceedings of the International Conference on Management of Data, 2005.
[27] R. Valencia-García, F. García-Sánchez, D. Castellanos-Nieves, J.T. Fernández-Breis, OWLPath: an OWL ontology-guided query editor, IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans 41 (1) (2011).
[28] D. Damljanovic, M. Agatonovic, H. Cunningham, Natural language interfaces to ontologies: combining syntactic analysis and ontology-based lookup through the user interaction, in: Proceedings of the 7th Extended Semantic Web Conference, 2010.
[29] P.R. Devale, A. Deshpande, Probabilistic context free grammar: an approach to generic interactive natural language interfaces to databases, Journal of Information, Knowledge and Research in Computer Engineering 1 (2) (2010).
[30] H.R. Tennant, K.M. Ross, M. Saenz, C.W. Thompson, J.R. Miller, Menu-based natural language understanding, in: Proceedings of the 21st Annual Meeting of the ACL, 1983.

Miguel Llopis is a Ph.D. student at the Department of Software and Computing Systems of the University of Alicante (Spain). His research interests include Natural Language Processing, Question Answering and Domain-Specific Languages.
He has written various papers in journals and participated in international conferences related to his research topics. Besides his Ph.D. studies and research activity, Miguel works as a Program Manager in the SQL Server team at Microsoft Corporation (Redmond, Washington). Contact him at mll9@alu.ua.es.

Antonio Ferrández is a full-time lecturer at the Department of Software and Computing Systems of the University of Alicante (Spain). He obtained his Ph.D. in Computer Science from the University of Alicante. His research interests are Natural Language Processing, Anaphora Resolution, Information Extraction, Information Retrieval and Question Answering. He has participated in numerous projects and agreements with private companies and public organizations related to his research topics. Finally, he has supervised Ph.D. theses and co-authored many papers in journals and conferences related to his research interests. Contact him at antonio@dlsi.ua.es.
