Data Extraction from Structured Databases using Keyword-based Queries

Mariana Soller Ramada, João Carlos da Silva, Plínio de Sá Leitão-Júnior
Instituto de Informática, Universidade Federal de Goiás (UFG)
Caixa Postal Goiânia GO Brazil
{mariana,jcs,plinio}@inf.ufg.br

Abstract. Relational databases are used to store a large quantity of data scattered around the world. However, users face difficulties in accessing such data for lack of a more natural way of specifying queries. Techniques that use natural language words to search different information sources on the Web are now very common, but they cannot be employed directly to search relational databases. This work proposes a method that allows users to submit keyword-based queries, which are then semantically analysed and enriched before being mapped to the database language. When analysing keyword-based queries, the method considers different factors, e.g. the proximity between keywords, query segmentation, and the use of aggregate functions.

1. Introduction

Using keywords to retrieve information is a simple search technique. The last decade has witnessed the growing use of this technique, which has in fact become a standard for user interaction with the World Wide Web (WWW). However, it cannot be applied to all storage media. Relational databases, for instance, store a vast amount of valuable information, yet querying them requires prior knowledge of the storage structures and of the syntax of a structured language such as SQL. The majority of users do not have such knowledge, which limits access to the stored data.

In the last few years, great efforts have been made in research and development activities to extend the abilities of keyword-based search to data sources that follow the relational paradigm. Nevertheless, the existing techniques reveal some drawbacks.
The first drawback is that, according to such approaches, every keyword plays a given role in the database and each keyword is mapped to a corresponding database structure. Consider the schema of the relation Employee (Id, Name, Address, Salary, Super id, Department id). The query "higher salary" expects a statistic, the highest salary, as a result, not a set of interconnected tuples containing the keywords "higher" and "salary".

The second drawback is that the query is segmented in such a way that each keyword represents a single role in the database. Referring once again to the relation Employee, a possible interpretation of the query "employee Houston Tx" is the employee who lives at the address Houston Tx. Therefore, the keywords "Houston" and "Tx" are expected to be mapped together to the attribute Address of the table Employee, instead of being mapped separately to different database structures.

The third drawback concerns the fact that several studies fail to consider the interdependence between keywords. Even though a query is made up of a simple list of keywords, the meaning of each keyword is not independent from the
meaning of the others; together, they all represent the concepts intended by the user when creating a query.

This paper focuses on the semantic approach to keyword queries. The drawbacks listed above are considered, and solutions are provided and implemented. A new keyword-based search method for relational databases was defined. For a given keyword query, the proposed method converts it into corresponding SQL queries, all of which are submitted to the underlying database. An SQL query is a structured query in which tables, attributes, and their conditions are accurately specified, whereas a keyword query comprises imprecise terms that express the user's need for information. Therefore, this study introduces semantics into the conversion process, to provide a clearer idea of the meaning intended by the query and to construct SQL expressions that represent the user's real intent, returning results in order of relevance. The Keymantic approach [Bergamaschi et al. 2010] was chosen as a starting point for meeting the aims of this study, because it partially addresses the third drawback mentioned above.

The remainder of this paper is organized as follows. Section 2 presents related works. Section 3 shows the architecture of the proposed keyword query method, as well as the way it operates. Section 4 explains how the Keymantic system works, and Sections 5 and 6 present the proposed modifications and their impacts. Section 7 provides some conclusions.

2. Related Works

The literature reveals two main approaches to keyword queries in relational databases: one based on Candidate Networks and another based on Steiner Trees. Both conceive a database as a network of interconnected tuples and focus on detecting tuples that contain the keywords of a given query. Query processing returns connected components based on the way these tuples are associated. DBXplorer [Agrawal et al.
2002] and DISCOVER [Hristidis and Papakonstantinou 2002] implement the Candidate Networks approach, whereas BANKS [Aditya et al. 2002] applies the Steiner Tree approach. All of these systems exhibit the three drawbacks mentioned in Section 1.

To solve the second drawback, the FRISK system [Pu and Yu 2009] uses a dynamic programming algorithm to compute the query's best segmentations and then presents them to the user, who in turn chooses the one that best suits his/her intent. Keymantic [Bergamaschi et al. 2010] and Keyword++ [Ganti et al. 2010] address the third drawback by taking into account the query's ambiguity and seeking the completeness and accuracy of results. Completeness consists of returning all relevant results, whereas accuracy refers to returning only the results that match user intent.

A query may return more than one result, in which case it becomes necessary to order them according to their relevance. Ranking functions assign scores to each result and then order results according to these scores. Some researchers have employed Information Retrieval metrics for calculating ranking functions, such as Luo et al. [Luo et al. 2008] and Hristidis et al. [Hristidis et al. 2003]. In DISCOVER [Hristidis and Papakonstantinou 2002] and DBXplorer [Agrawal et al. 2002], results are ranked by simple methods, e.g. based on the number of joins. In Labrador [Mesquita et al. 2007], a ranking is computed using a Bayesian network model. As for
BANKS [Aditya et al. 2002], calculating result scores takes into account the edge weights of the data graph.

3. Architecture

The proposed method converts a keyword query into corresponding SQL queries, which are submitted to the underlying database. In response to a keyword query, the method returns the results obtained by running the SQL queries, listed in order of relevance. Figure 1 shows an overview of the method's architecture.

Figure 1. Architecture of the method proposed

3.1. Preprocessing

This stage is responsible for identifying keywords in the query that have no direct meaning in the database structure but carry other forms of semantics, e.g. the use of aggregate functions and data sorting. In addition, this stage determines the query's best segmentation with regard to values comprising more than one keyword.

Verification of Aggregate Function/Sorting

This stage identifies keywords which express the intention of using aggregate functions and sorting. Identification is performed through a list of reserved words, which are predefined words based on the way users typically phrase their queries. Seven groups were created and a set of keywords was defined for each of them. Each set consists of synonyms obtained by integrating a thesaurus as a semantic resource. Listed below are the groups and their respective sets:

maximum = synonym(higher)
minimum = synonym(lower)
mean = synonym(average)
sum = synonym(total)
count = synonym(quantity)
grouping = synonym(for each)
order = synonym(sorted)
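The reserved-word lookup described above can be sketched as a small dictionary scan. This is only an illustration under stated assumptions: the synonym sets in RESERVED_WORDS are hand-picked placeholders rather than the thesaurus-derived sets used in the paper, and the function name find_reserved_words is hypothetical.

```python
# Reserved-word groups from the list above. The synonym sets are
# illustrative assumptions, not the thesaurus output used by the authors.
RESERVED_WORDS = {
    "maximum": {"higher", "highest", "maximum"},
    "minimum": {"lower", "lowest", "minimum"},
    "mean": {"average", "mean"},
    "sum": {"total", "sum"},
    "count": {"quantity", "count"},
    "grouping": {"for each"},  # matches only if passed as a single phrase
    "order": {"sorted", "ordered"},
}


def find_reserved_words(keywords):
    """Return (position, group) pairs for keywords that match a reserved word."""
    matches = []
    for position, keyword in enumerate(keywords):
        for group, synonyms in RESERVED_WORDS.items():
            if keyword.lower() in synonyms:
                matches.append((position, group))
                break  # a keyword belongs to at most one group
    return matches
```

For the query "higher salary", the scan flags the first keyword as belonging to the group maximum and leaves "salary" to be mapped to a database term in the usual way.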

After identifying which keyword suggests the use of an aggregate function or sorting, it is necessary to establish which attribute the function will be applied to. Based on the fact that users create queries in which related words are close to one another, once a keyword that suggests the use of an aggregate function or sorting is identified, the function it suggests is applied to the term represented by the keyword that immediately follows it.

Query Segmentation

In general, search engines support the use of delimiters to group multiple words into a single concept, since many concepts are represented by a phrase rather than by a single word. Google and Yahoo! are examples of search engines that provide a syntax for phrase searching. In the proposed method, keywords which must be regarded collectively to form an attribute value are identified by enclosing them in single quotation marks, as in employee 'Houston Tx'.

Query Processing

Once a keyword query has been submitted, it is necessary to create several SQL queries that will be run in the underlying database. This process is carried out by mapping the keywords to the database terms. Once the SQL queries have been run, results are returned to the user in order of relevance.

Mapping

Answering a keyword query in relational databases requires understanding the meaning of each keyword and constructing an SQL query that provides a coherent interpretation of the original query. It is necessary to map the keywords to the database structures, e.g. relations, attributes, and attribute values. Several techniques and tools have been proposed to solve the problem of keyword queries in relational databases, as was pointed out in Section 2. However, most of these proposals fail to consider the many interpretations which a keyword query may have.
The mapping process proposed in this paper is based on the semantic analysis implemented by Keymantic, which explores the relative positions (order) of keywords within the query, as well as the database schema and other auxiliary external sources. The Keymantic approach, on which the present work is grounded, is described in Section 4.

Execution and Ranking of Results

Each SQL query generated in the previous stage is now run in the corresponding database, and the results of these queries represent the results expected for a keyword query. The results produced are not equally significant, as some of them represent the semantics intended by the query more effectively. It is therefore useful to generate a ranking of results. Considering that the Keymantic system computes a score for each generated SQL query, the results returned to the user may be ordered first by the score of their corresponding SQL query, and second by the size of the join path, i.e. the number of joins required to create the SQL query.
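The two-level ordering described above can be sketched as a sort on a composite key. The Interpretation record and its field names are hypothetical, and the assumption that shorter join paths rank first among equal scores is an interpretation of the text, not something it states explicitly.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    """One generated SQL query with its ranking criteria (hypothetical record)."""
    sql: str
    score: float  # score inherited from the configuration
    joins: int    # size of the join path


def rank_interpretations(interpretations):
    """Order by descending score; ties are broken by fewer joins (assumed)."""
    return sorted(interpretations, key=lambda item: (-item.score, item.joins))
```

Because Python's sort is stable, interpretations that tie on both criteria keep the order in which they were generated.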

4. Semantic Query Analysis

Keymantic explores the relative positions of query keywords together with external sources, to produce a more accurate assumption of the semantics represented by the query. This is grounded on the fact that the meaning of each keyword is not independent from the meaning of the others; all of them collectively represent the concepts the user had in mind when creating the query. Moreover, not all keywords represent instance values. Many are used as metadata of adjacent keywords.

The mapping process performed by Keymantic comprises five stages. A special data structure, known as the weight matrix, is used during this process. The value of a cell represents the weight of the mapping between a keyword and a database term. Two sub-matrices may be distinguished in the weight matrix. The first, called SW, corresponds to the database terms related to schema elements, i.e. relations and attributes. The second, called VW, corresponds to attribute values, i.e. elements which belong to attribute domains.

The first step is Intrinsic Weight Computation. The relevance between each query keyword and each database term is calculated by exploring and combining a number of similarity techniques. In the next step, Selection of Best Mappings to Schema Terms, a series of mappings is generated based on the intrinsic weights of sub-matrix SW. Each mapping associates a certain number of keywords with database schema terms. Keywords that remain unmapped are considered at a later stage, during value term mapping. Only mappings that reach the highest score are selected. In the third step, Contextualization of VW and Selection of Best Mappings to Value Terms, the unmapped keywords are mapped based on each partial mapping generated in the previous stage. Then, in the step Generation of the Configurations, a total mapping of keywords to database terms forms a configuration.
The configuration's score is the sum of the weights of the weight matrix elements [i, j], where i is a keyword and j is the database term to which the keyword was mapped. Finally, once the best configurations have been computed, the interpretations of the keyword query, i.e. SQL queries, may be generated in the step Generation of the Interpretations. The score of each SQL query is that of its respective configuration. A configuration is simply a mapping of the keywords to database terms. The presence of different join paths between these terms leads to multiple interpretations.

5. Contributions to Semantic Analysis

Keymantic is grounded on the assumption that every keyword plays a role in a query, i.e. each keyword represents a term within the database. As each keyword must represent a database term, it is not possible to interpret queries which suggest the use of aggregate functions or sorting. Even if a query has, for instance, the keyword "higher", the system will map it to a database term instead of interpreting it as an indicator of the use of the aggregate function max.

Additionally, Keymantic regards the mapping process as an injective function, in which no image element corresponds to more than one domain element, the domain being the set of keywords and the image the set of database terms. In other words, two keywords cannot be mapped to the same database term. However, as mentioned in Section 1, it is possible for a value to be composed of more than one
keyword. In this case, it is necessary to map all the keywords which compose this value to the same attribute domain.

Even though Keymantic computes weights for each interpretation, it does not provide a ranking of the obtained results. The paper which describes the tool suggests that a ranking based on each interpretation's score and on the size of the join path may be carried out.

To make it possible to map query keywords to aggregate functions, sorting, or composite values, as well as to rank results, the method proposed in this paper implements the functionalities of the modules Verification of Aggregate Function/Sorting, Query Segmentation, and Ranking of Results shown in Figure 1, all of which are external to Keymantic. In addition, some modifications within Keymantic, particularly during Mapping, were also performed to improve the quality of the returned results. These modifications concern the stages Intrinsic Weight Computation of Schema Database Terms, Intrinsic Weight Computation of Value Database Terms, and the contextualization process.

Use of Synonyms as Keywords

During Intrinsic Weight Computation of Schema Database Terms, Keymantic employs a series of techniques that measure similarity between keywords and database terms, selecting the one which produces the best result. One of these techniques is string similarity, for which Keymantic employs a number of different similarity metrics such as Jaccard, Hamming, Levenshtein, etc. The tool also assesses the relationship between a given keyword and a database schema term based on their semantic relationship, and for that it uses ontologies and dictionaries. For each measuring technique (string similarity, semantic relationship, etc.), the similarity between a keyword and a database term is calculated in a 0-1 interval. The highest value returned is then multiplied by 100 and selected as intrinsic weight.
If none of the values returned is higher than a predefined threshold, then the weight is set to 0.

The method proposed here considers, at first, only the string similarity technique. Three similarity metrics are used for this technique: Jaccard, Levenshtein, and Cosine. Each metric takes two strings as input and returns the similarity between them in a 0-1 interval. The intrinsic weight of a keyword and a database term is based on the average of the similarities returned by the three metrics. If the average is greater than the threshold, it is multiplied by 100 and selected as intrinsic weight. If not, the similarity between the synonyms of the keyword and the database term is calculated. That involves using the WordNet dictionary to return the synonyms of each query keyword. The average of the similarities returned by the three metrics is computed for each synonym found, and the greatest average is taken into account. If this value is greater than the threshold, it is multiplied by 90 and selected as intrinsic weight. If it is lower than the threshold, then the intrinsic weight is set to 0.

The thesaurus is consulted only as a fallback because users normally expect the keywords of the query to appear in the results, so the keywords themselves must be given greater relevance. In other words, the existence of the keyword in the database must have greater weight than the existence of its synonym.
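The weight computation with synonym fallback described above can be sketched as follows. Several details are assumptions made only for illustration: Jaccard and Cosine are computed over character bigrams, Levenshtein distance is normalised by the longer string's length, the threshold value 0.5 is arbitrary, and synonyms are passed in as a plain dictionary instead of being looked up in WordNet.

```python
from collections import Counter
from math import sqrt


def bigrams(text):
    """Character bigrams of a lowercased string (the string itself if too short)."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)] or [text]


def jaccard(a, b):
    x, y = set(bigrams(a)), set(bigrams(b))
    return len(x & y) / len(x | y)


def cosine(a, b):
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(x[g] * y[g] for g in x)
    norm_x = sqrt(sum(v * v for v in x.values()))
    norm_y = sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y)


def levenshtein_similarity(a, b):
    """1 minus the edit distance divided by the longer length (assumed normalisation)."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return 1 - previous[-1] / max(len(a), len(b))


def average_similarity(a, b):
    return (jaccard(a, b) + levenshtein_similarity(a, b) + cosine(a, b)) / 3


def intrinsic_weight(keyword, term, synonyms, threshold=0.5):
    """Average x 100 for a direct match, best synonym average x 90 otherwise, else 0."""
    direct = average_similarity(keyword, term)
    if direct > threshold:
        return direct * 100
    best = max((average_similarity(s, term) for s in synonyms.get(keyword, [])),
               default=0.0)
    return best * 90 if best > threshold else 0.0
```

With a hypothetical synonym table {"wage": ["salary"]}, the keyword "wage" does not match the attribute Salary directly, but its synonym does, so the weight comes out close to 90 rather than 100, which is the penalty the text motivates.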

Weight normalization for sub-matrix VW

The step Intrinsic Weight Computation of Value Database Terms was modified because value term weights are computed in binary form. Whereas the weights of sub-matrix SW, which refer to schema terms, fall within a 0-100 interval, the weights of sub-matrix VW are 0 or 1. Given that the weights of matrix VW are binary, the contribution of keywords mapped to value terms to the total configuration score is minimal. It is therefore necessary to make the intrinsic weights of value terms comparable with those of schema terms. Hence, after calculating the intrinsic weights for the value terms in the proposed method, weights that have value 1 are multiplied by a predefined constant.

Proximity between keywords

The last modification performed within Keymantic took place during contextualization. Keymantic takes into account the interdependence between query keywords, in which mapping a given keyword may increase or reduce the probability that another keyword, as yet unmapped, corresponds to a given database term. In addition to considering the interdependences between keywords, the proposed method takes into account their relative positions in the query based on their proximity. The farther a mapped keyword is from an unmapped one, the smaller its influence over the mapping of the unmapped keyword. This led to the following modification during contextualization: instead of adding a constant to intrinsic weights, the value added to the weight depends on the distance between the keywords, decreasing as the distance grows.

6. Comparison with Keymantic

This section describes the impact on the set of results presented to the user of the modifications applied to the method proposed in Keymantic. Internal modifications were performed to attribute further semantics to the mapping process, in order to generate a smaller but more relevant set of interpretations in view of user intent.
Precision and MRR metrics were used for this analysis. Using a company database, both systems were implemented and compared based on the results returned by each after running a set of keyword queries. Table 1 shows the set of keyword queries selected from a query pool and their user-intended semantics.

Table 1. Set of keyword queries

No.  Keyword query                           Intended semantics
1    department                              List of information regarding the company's departments
2    department employee                     List of each department's employees
3    department project                      List of each department's projects
4    department project newbenefits          Information of the department responsible for the Newbenefits project
5    employee dependent daughter             Information regarding employees who have a daughter as a dependent
6    project name employee John              Name of the project for which employee John works
7    employee address project 1              Address of employee who works in project 1
8    hours employee works project productz   Number of working hours of each employee in project ProductZ
9    project location employee salary        Location of project for which the employee who earns a salary works

Figure 2 compares the number of configurations and interpretations (SQL queries) generated by both systems. Values are shown for each of the queries, identified by the same numbers used in Table 1. The internal modifications, performed during the first three stages of Keymantic's mapping process, influenced the results obtained by our method, leading to lower values compared with Keymantic as regards the number of configurations and interpretations. This reduction has a major impact on the cost of the process and allows the user to deal with fewer results for his/her query.

(a) Number of configurations generated by both systems
(b) Number of interpretations generated by both systems
Figure 2. Number of configurations and interpretations generated by both systems

The graphs in Figure 3 cover queries with three or more keywords and present each system's relative accuracy, i.e. precision. This metric gives the relevant fraction of the obtained interpretations, hence its status as a quality indicator commonly described in the literature. As Subfigure 3(a) shows, the proposed method was more effective: despite the reduction in the number of interpretations generated, the resulting set is more relevant to the query. Subfigure 3(b) shows accuracy values as a function of query size, i.e. the number of keywords comprising the query. Accuracy decreases in both systems as the number of query keywords increases. However, the results reveal, once more, better accuracy values for the proposed method than for the original one.

(a) (b)
Figure 3. Accuracy metric for both systems

As regards the external stage Ranking of Results, Keymantic is known to show results according to the order in which they are generated, with no guarantee that they will be presented in descending score order.
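The two evaluation metrics named in this section, precision and Mean Reciprocal Rank (MRR), can be sketched in their standard textbook form. The representation is an assumption made for illustration: each query's result list is given as a list of booleans marking which interpretations were judged relevant, in ranked order.

```python
def precision(relevance_flags):
    """Fraction of returned interpretations that are relevant."""
    if not relevance_flags:
        return 0.0
    return sum(relevance_flags) / len(relevance_flags)


def reciprocal_rank(relevance_flags):
    """1 / rank of the first relevant result, or 0 if there is none."""
    for rank, is_relevant in enumerate(relevance_flags, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over the ranked result lists of several queries."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(flags) for flags in runs) / len(runs)
```

An MRR close to 1 means the first relevant interpretation is usually at or near the top of the ranking, which is the behaviour the Ranking of Results stage aims for.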

The Mean Reciprocal Rank (MRR) metric was used to assess the stage Ranking of Results. This metric measures how near the top the first relevant result ranks within the set of results. Figure 4 shows the MRR for both systems based on query length. In Keymantic, the longer the query, the farther from the top of the ranking the first relevant result appears. In the proposed method, even though the first relevant result moves away from the top as the query becomes longer, it remains one of the top-ranked results.

Figure 4. MRR for both systems (inspired by [Fakhraee and Fotouhi 2012])

Threats to Validity

Three limitations were identified with regard to the results obtained with the proposed method. The first limitation is related to the cardinality of the set of queries from which the results were obtained. A more extensive set of queries could lead to a higher confidence level when comparing results with Keymantic. The second limitation is related to how the queries were obtained: they were designed for the analysis of the proposed method rather than collected from real systems. The last limitation is related to the fact that the proposed method was compared solely with Keymantic. A stronger argument could be made in favor of the proposed approach if its performance were compared with other existing state-of-the-art tools that allow searching relational databases using keyword queries.

7. Conclusion

This paper proposed a method for querying relational databases with keywords to simplify access to these data, given that such queries use natural language words. The method considers factors such as query segmentation, aggregate functions and sorting, as well as the user's intended semantics when creating the query. The semantic analysis of keyword queries was based on the approach provided by the Keymantic tool.
Other features were added to Keymantic's original proposal, such as the possibility of dealing with queries using aggregate functions and values comprising more than one keyword. To offer new querying possibilities, the proposed method implemented new external functionalities and performed internal improvements to the tool.

During the experiments, it became clear that the internal modifications produced a smaller and more significant set of results for the keyword query submitted, whereas the external modifications allowed the specification of queries that had up until then not
been considered, such as those that employ aggregate functions, sorting, and values comprising more than one keyword.

In the course of this research, we identified some aspects that could be expanded upon to further the discussion on this topic. Among them is the use of ontologies. During the process of intrinsic weight computation of schema terms, Keymantic uses ontologies, in addition to the synonyms obtained by WordNet, hence adding greater semantics to these stages. Another aspect to be considered is more effective query segmentation. In the proposed method, composite values are identified via the use of single quotation marks, which requires the user's a priori knowledge to construct the query in a suitable way.

References

Aditya, B., Bhalotia, G., Chakrabarti, S., Hulgeri, A., Nakhe, C., Parag, and Sudarshan, S. (2002). Banks: Browsing and keyword searching in relational databases. In VLDB '02: Proceedings of the 28th Intl. Conference on Very Large Databases, pages Morgan Kaufmann, San Francisco.
Agrawal, S., Chaudhuri, S., and Das, G. (2002). Dbxplorer: a system for keyword-based search over relational databases. In Data Engineering, Proceedings. 18th Intl. Conference on, pages
Bergamaschi, S., Domnori, E., Guerra, F., Orsini, M., Lado, R. T., and Velegrakis, Y. (2010). Keymantic: semantic keyword-based searching in data integration systems. Proc. VLDB Endow., 3:
Fakhraee, S. and Fotouhi, F. (2012). Dbsemsxplorer: semantic-based keyword search system over relational databases for knowledge discovery. In Proceedings of the Third Intl. Workshop on Keyword Search on Structured Data, KEYS '12, pages 54-62, New York, NY, USA. ACM.
Ganti, V., He, Y., and Xin, D. (2010). Keyword++: a framework to improve keyword search over entity databases. Proc. VLDB Endow., 3:
Hristidis, V., Gravano, L., and Papakonstantinou, Y. (2003). Efficient ir-style keyword search over relational databases. In Proceedings of the 29th Intl.
conference on Very large data bases - Volume 29, VLDB 2003, pages
Hristidis, V. and Papakonstantinou, Y. (2002). Discover: Keyword search in relational databases. In Bernstein, P. A., Ioannidis, Y. E., Ramakrishnan, R., and Papadias, D., editors, VLDB '02: Proceedings of the 28th Intl. Conference on Very Large Databases, pages Morgan Kaufmann, San Francisco.
Luo, Y., Wang, W., and Lin, X. (2008). Spark: A keyword search engine on relational databases. In Data Engineering, ICDE IEEE 24th Intl. Conference on, pages
Mesquita, F., da Silva, A. S., de Moura, E. S., Calado, P., and Laender, A. H. F. (2007). Labrador: Efficiently publishing relational databases on the web by using keyword-based query interfaces. Inf. Process. Manage., 43(4):
Pu, K. and Yu, X. (2009). Frisk: Keyword query cleaning and processing in action. In Data Engineering, ICDE '09. IEEE 25th Intl. Conference on, pages


More information

Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

More information

Fast Contextual Preference Scoring of Database Tuples

Fast Contextual Preference Scoring of Database Tuples Fast Contextual Preference Scoring of Database Tuples Kostas Stefanidis Department of Computer Science, University of Ioannina, Greece Joint work with Evaggelia Pitoura http://dmod.cs.uoi.gr 2 Motivation

More information

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets

More information

Financial Trading System using Combination of Textual and Numerical Data

Financial Trading System using Combination of Textual and Numerical Data Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,

More information

Natural Language to Relational Query by Using Parsing Compiler

Natural Language to Relational Query by Using Parsing Compiler Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

Information Discovery on Electronic Medical Records 1

Information Discovery on Electronic Medical Records 1 Information Discovery on Electronic Medical Records 1 Vagelis Hristidis* Fernando Farfán* Redmond P. Burke + Anthony F. Rossi + Jeffrey A. White *School of Computing and Information Sciences, Florida International

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,

More information

IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD

IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD Journal homepage: www.mjret.in ISSN:2348-6953 IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD Deepak Ramchandara Lad 1, Soumitra S. Das 2 Computer Dept. 12 Dr. D. Y. Patil School of Engineering,(Affiliated

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Multi-Algorithm Ontology Mapping with Automatic Weight Assignment and Background Knowledge

Multi-Algorithm Ontology Mapping with Automatic Weight Assignment and Background Knowledge Multi-Algorithm Mapping with Automatic Weight Assignment and Background Knowledge Shailendra Singh and Yu-N Cheah School of Computer Sciences Universiti Sains Malaysia 11800 USM Penang, Malaysia shai14@gmail.com,

More information

Ontology-Based Meta-model for Storage and Retrieval of Software Components

Ontology-Based Meta-model for Storage and Retrieval of Software Components OntologyBased Metamodel for Storage and Retrieval of Software Components Cristiane A. Yaguinuma Department of Computer Science Federal University of São Carlos (UFSCar) P.O. Box 676 13565905 São Carlos

More information

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

More information

A Searching Strategy to Adopt Multi-Join Queries

A Searching Strategy to Adopt Multi-Join Queries A Searching Strategy to Adopt Multi-Join Queries Based on Top-K Query Model 1 M.Naveena, 2 S.Sangeetha, 1 M.E-CSE, 2 AP-CSE V.S.B. Engineering College, Karur, Tamilnadu, India. 1 naveenaskrn@gmail.com,

More information

Bisecting K-Means for Clustering Web Log data

Bisecting K-Means for Clustering Web Log data Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining

More information

A Novel Framework For Enhancing Keyword Query Search Over Database

A Novel Framework For Enhancing Keyword Query Search Over Database International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-4, Issue-4 E-ISSN: 2347-2693 A Novel Framework For Enhancing Keyword Query Search Over Database Priya Pujari

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

DATA QUALITY DATA BASE QUALITY INFORMATION SYSTEM QUALITY

DATA QUALITY DATA BASE QUALITY INFORMATION SYSTEM QUALITY DATA QUALITY DATA BASE QUALITY INFORMATION SYSTEM QUALITY The content of those documents are the exclusive property of REVER. The aim of those documents is to provide information and should, in no case,

More information

Using Provenance to Improve Workflow Design

Using Provenance to Improve Workflow Design Using Provenance to Improve Workflow Design Frederico T. de Oliveira, Leonardo Murta, Claudia Werner, Marta Mattoso COPPE/ Computer Science Department Federal University of Rio de Janeiro (UFRJ) {ftoliveira,

More information

Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment

Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment 2009 10th International Conference on Document Analysis and Recognition Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment Ahmad Abdulkader Matthew R. Casey Google Inc. ahmad@abdulkader.org

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

How To Use Neural Networks In Data Mining

How To Use Neural Networks In Data Mining International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

Determining Preferences from Semantic Metadata in OLAP Reporting Tool

Determining Preferences from Semantic Metadata in OLAP Reporting Tool Determining Preferences from Semantic Metadata in OLAP Reporting Tool Darja Solodovnikova, Natalija Kozmina Faculty of Computing, University of Latvia, Riga LV-586, Latvia {darja.solodovnikova, natalija.kozmina}@lu.lv

More information

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

More information

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria

More information

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013. Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing

More information

DYNAMIC QUERY FORMS WITH NoSQL

DYNAMIC QUERY FORMS WITH NoSQL IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 7, Jul 2014, 157-162 Impact Journals DYNAMIC QUERY FORMS WITH

More information

Query Recommendation employing Query Logs in Search Optimization

Query Recommendation employing Query Logs in Search Optimization 1917 Query Recommendation employing Query Logs in Search Optimization Neha Singh Department of Computer Science, Shri Siddhi Vinayak Group of Institutions, Bareilly Email: singh26.neha@gmail.com Dr Manish

More information

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision

Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Viktor PEKAR Bashkir State University Ufa, Russia, 450000 vpekar@ufanet.ru Steffen STAAB Institute AIFB,

More information

A QoS-Aware Web Service Selection Based on Clustering

A QoS-Aware Web Service Selection Based on Clustering International Journal of Scientific and Research Publications, Volume 4, Issue 2, February 2014 1 A QoS-Aware Web Service Selection Based on Clustering R.Karthiban PG scholar, Computer Science and Engineering,

More information

Efficient Query Optimizing System for Searching Using Data Mining Technique

Efficient Query Optimizing System for Searching Using Data Mining Technique Vol.1, Issue.2, pp-347-351 ISSN: 2249-6645 Efficient Query Optimizing System for Searching Using Data Mining Technique Velmurugan.N Vijayaraj.A Assistant Professor, Department of MCA, Associate Professor,

More information

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety

More information

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation.

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. Miguel Ruiz, Anne Diekema, Páraic Sheridan MNIS-TextWise Labs Dey Centennial Plaza 401 South Salina Street Syracuse, NY 13202 Abstract:

More information

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

American Journal of Engineering Research (AJER) 2013 American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-2, Issue-4, pp-39-43 www.ajer.us Research Paper Open Access

More information

Universal. Event. Product. Computer. 1 warehouse.

Universal. Event. Product. Computer. 1 warehouse. Dynamic multi-dimensional models for text warehouses Maria Zamr Bleyberg, Karthik Ganesh Computing and Information Sciences Department Kansas State University, Manhattan, KS, 66506 Abstract In this paper,

More information

RETRATOS: Requirement Traceability Tool Support

RETRATOS: Requirement Traceability Tool Support RETRATOS: Requirement Traceability Tool Support Gilberto Cysneiros Filho 1, Maria Lencastre 2, Adriana Rodrigues 2, Carla Schuenemann 3 1 Universidade Federal Rural de Pernambuco, Recife, Brazil g.cysneiros@gmail.com

More information

IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

More information

Efficient Integration of Data Mining Techniques in Database Management Systems

Efficient Integration of Data Mining Techniques in Database Management Systems Efficient Integration of Data Mining Techniques in Database Management Systems Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France

More information

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,

More information

Data Mining for Data Cloud and Compute Cloud

Data Mining for Data Cloud and Compute Cloud Data Mining for Data Cloud and Compute Cloud Prof. Uzma Ali 1, Prof. Punam Khandar 2 Assistant Professor, Dept. Of Computer Application, SRCOEM, Nagpur, India 1 Assistant Professor, Dept. Of Computer Application,

More information

A Search Engine to Categorize LOs

A Search Engine to Categorize LOs A Search Engine to Categorize LOs Gianni Fenu Abstract The main purpose of this paper is to analyze the state of the art of search engines in e-learning platforms, and to elaborate a new model that exploits

More information

DBease: Making Databases User friendly and Easily Accessible

DBease: Making Databases User friendly and Easily Accessible DBease: Making Databases User friendly and Easily Accessible Guoliang Li Ju Fan Hao Wu Jiannan Wang Jianhua Feng Department of Computer Science, Tsinghua University, Beijing 184, China {liguoliang, fengjh}@tsinghua.edu.cn;

More information

INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS

INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS INTEGRATION OF XML DATA IN PEER-TO-PEER E-COMMERCE APPLICATIONS Tadeusz Pankowski 1,2 1 Institute of Control and Information Engineering Poznan University of Technology Pl. M.S.-Curie 5, 60-965 Poznan

More information

Caravela: Semantic Content Management with Automatic Information Integration and Categorization (System Description)

Caravela: Semantic Content Management with Automatic Information Integration and Categorization (System Description) Caravela: Semantic Content Management with Automatic Information Integration and Categorization (System Description) David Aumueller, Erhard Rahm University of Leipzig {david, rahm}@informatik.uni-leipzig.de

More information

Search Result Optimization using Annotators

Search Result Optimization using Annotators Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

More information

Data Integration using Agent based Mediator-Wrapper Architecture. Tutorial Report For Agent Based Software Engineering (SENG 609.

Data Integration using Agent based Mediator-Wrapper Architecture. Tutorial Report For Agent Based Software Engineering (SENG 609. Data Integration using Agent based Mediator-Wrapper Architecture Tutorial Report For Agent Based Software Engineering (SENG 609.22) Presented by: George Shi Course Instructor: Dr. Behrouz H. Far December

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

More information

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS

More information

Optimization of Image Search from Photo Sharing Websites Using Personal Data

Optimization of Image Search from Photo Sharing Websites Using Personal Data Optimization of Image Search from Photo Sharing Websites Using Personal Data Mr. Naeem Naik Walchand Institute of Technology, Solapur, India Abstract The present research aims at optimizing the image search

More information

Search Engine Based Intelligent Help Desk System: iassist

Search Engine Based Intelligent Help Desk System: iassist Search Engine Based Intelligent Help Desk System: iassist Sahil K. Shah, Prof. Sheetal A. Takale Information Technology Department VPCOE, Baramati, Maharashtra, India sahilshahwnr@gmail.com, sheetaltakale@gmail.com

More information

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials ehealth Beyond the Horizon Get IT There S.K. Andersen et al. (Eds.) IOS Press, 2008 2008 Organizing Committee of MIE 2008. All rights reserved. 3 An Ontology Based Method to Solve Query Identifier Heterogeneity

More information

Investigating Clinical Care Pathways Correlated with Outcomes

Investigating Clinical Care Pathways Correlated with Outcomes Investigating Clinical Care Pathways Correlated with Outcomes Geetika T. Lakshmanan, Szabolcs Rozsnyai, Fei Wang IBM T. J. Watson Research Center, NY, USA August 2013 Outline Care Pathways Typical Challenges

More information

Social Media Mining. Network Measures

Social Media Mining. Network Measures Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users

More information

College information system research based on data mining

College information system research based on data mining 2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei

More information

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent

More information

Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy

Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy The Deep Web: Surfacing Hidden Value Michael K. Bergman Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy Presented by Mat Kelly CS895 Web-based Information Retrieval

More information

MEMBERSHIP LOCALIZATION WITHIN A WEB BASED JOIN FRAMEWORK

MEMBERSHIP LOCALIZATION WITHIN A WEB BASED JOIN FRAMEWORK MEMBERSHIP LOCALIZATION WITHIN A WEB BASED JOIN FRAMEWORK 1 K. LALITHA, 2 M. KEERTHANA, 3 G. KALPANA, 4 S.T. SHWETHA, 5 M. GEETHA 1 Assistant Professor, Information Technology, Panimalar Engineering College,

More information

DataXFormer: An Interactive Data Transformation Tool

DataXFormer: An Interactive Data Transformation Tool DataXFormer: An Interactive Data Transformation Tool John Morcos 1 Ziawasch Abedjan 2 Ihab F. Ilyas 1 Mourad Ouzzani 3 Paolo Papotti 3 Michael Stonebraker 2 1 University of Waterloo 2 MIT CSAIL 3 Qatar

More information

Deep Web Entity Monitoring

Deep Web Entity Monitoring Deep Web Entity Monitoring Mohammadreza Khelghati s.m.khelghati@utwente.nl Djoerd Hiemstra d.hiemstra@utwente.nl Categories and Subject Descriptors H3 [INFORMATION STORAGE AND RETRIEVAL]: [Information

More information

INTEROPERABILITY IN DATA WAREHOUSES

INTEROPERABILITY IN DATA WAREHOUSES INTEROPERABILITY IN DATA WAREHOUSES Riccardo Torlone Roma Tre University http://torlone.dia.uniroma3.it/ SYNONYMS Data warehouse integration DEFINITION The term refers to the ability of combining the content

More information

Content Delivery Network (CDN) and P2P Model

Content Delivery Network (CDN) and P2P Model A multi-agent algorithm to improve content management in CDN networks Agostino Forestiero, forestiero@icar.cnr.it Carlo Mastroianni, mastroianni@icar.cnr.it ICAR-CNR Institute for High Performance Computing

More information

Report on the Dagstuhl Seminar Data Quality on the Web

Report on the Dagstuhl Seminar Data Quality on the Web Report on the Dagstuhl Seminar Data Quality on the Web Michael Gertz M. Tamer Özsu Gunter Saake Kai-Uwe Sattler U of California at Davis, U.S.A. U of Waterloo, Canada U of Magdeburg, Germany TU Ilmenau,

More information

Using NLP and Ontologies for Notary Document Management Systems

Using NLP and Ontologies for Notary Document Management Systems Outline Using NLP and Ontologies for Notary Document Management Systems Flora Amato, Antonino Mazzeo, Antonio Penta and Antonio Picariello Dipartimento di Informatica e Sistemistica Universitá di Napoli

More information

Visual Structure Analysis of Flow Charts in Patent Images

Visual Structure Analysis of Flow Charts in Patent Images Visual Structure Analysis of Flow Charts in Patent Images Roland Mörzinger, René Schuster, András Horti, and Georg Thallinger JOANNEUM RESEARCH Forschungsgesellschaft mbh DIGITAL - Institute for Information

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Keyword Search in Graphs: Finding r-cliques

Keyword Search in Graphs: Finding r-cliques Keyword Search in Graphs: Finding r-cliques Mehdi Kargar and Aijun An Department of Computer Science and Engineering York University, Toronto, Canada {kargar,aan}@cse.yorku.ca ABSTRACT Keyword search over

More information

A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1

A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1 A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1 Yannis Stavrakas Vassilis Plachouras IMIS / RC ATHENA Athens, Greece {yannis, vplachouras}@imis.athena-innovation.gr Abstract.

More information

SPARQL Query Recommendations by Example

SPARQL Query Recommendations by Example SPARQL Query Recommendations by Example Carlo Allocca, Alessandro Adamou, Mathieu d Aquin, and Enrico Motta Knowledge Media Institute, The Open University, UK, {carlo.allocca,alessandro.adamou,mathieu.daquin,enrico.motta}@open.ac.uk

More information

Distributed Database for Environmental Data Integration

Distributed Database for Environmental Data Integration Distributed Database for Environmental Data Integration A. Amato', V. Di Lecce2, and V. Piuri 3 II Engineering Faculty of Politecnico di Bari - Italy 2 DIASS, Politecnico di Bari, Italy 3Dept Information

More information

The Ontological Approach for SIEM Data Repository

The Ontological Approach for SIEM Data Repository The Ontological Approach for SIEM Data Repository Igor Kotenko, Olga Polubelova, and Igor Saenko Laboratory of Computer Science Problems, Saint-Petersburg Institute for Information and Automation of Russian

More information

Lesson 8: Introduction to Databases E-R Data Modeling

Lesson 8: Introduction to Databases E-R Data Modeling Lesson 8: Introduction to Databases E-R Data Modeling Contents Introduction to Databases Abstraction, Schemas, and Views Data Models Database Management System (DBMS) Components Entity Relationship Data

More information

Effective Keyword-based Selection of Relational Databases

Effective Keyword-based Selection of Relational Databases Effective Keyword-based Selection of Relational Databases Bei Yu National University of Singapore Guoliang Li Tsinghua University Anthony K. H. Tung National University of Singapore Karen Sollins MIT ABSTRACT

More information

IFS-8000 V2.0 INFORMATION FUSION SYSTEM

IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 INFORMATION FUSION SYSTEM IFS-8000 V2.0 Overview IFS-8000 v2.0 is a flexible, scalable and modular IT system to support the processes of aggregation of information from intercepts to intelligence

More information

MULTI AGENT-BASED DISTRIBUTED DATA MINING

MULTI AGENT-BASED DISTRIBUTED DATA MINING MULTI AGENT-BASED DISTRIBUTED DATA MINING REECHA B. PRAJAPATI 1, SUMITRA MENARIA 2 Department of Computer Science and Engineering, Parul Institute of Technology, Gujarat Technology University Abstract:

More information

Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation

Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation Shaghayegh Sahebi and Peter Brusilovsky Intelligent Systems Program University

More information

PREPROCESSING OF WEB LOGS

PREPROCESSING OF WEB LOGS PREPROCESSING OF WEB LOGS Ms. Dipa Dixit Lecturer Fr.CRIT, Vashi Abstract-Today s real world databases are highly susceptible to noisy, missing and inconsistent data due to their typically huge size data

More information

An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them

An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them An Open Platform for Collecting Domain Specific Web Pages and Extracting Information from Them Vangelis Karkaletsis and Constantine D. Spyropoulos NCSR Demokritos, Institute of Informatics & Telecommunications,

More information

Supporting Ontology-based Keyword Search over Medical Databases

Supporting Ontology-based Keyword Search over Medical Databases Supporting Ontology-based Keyword Search over Medical Databases Anastasios Kementsietsidis, Ph.D. Lipyeow Lim, Ph.D. Min Wang, Ph.D. IBM T.J. Watson Research Center, Skyline Drive, Hawthorne, NY, USA.

More information

Email Spam Detection Using Customized SimHash Function

Email Spam Detection Using Customized SimHash Function International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email

More information

A Time Efficient Algorithm for Web Log Analysis

A Time Efficient Algorithm for Web Log Analysis A Time Efficient Algorithm for Web Log Analysis Santosh Shakya Anju Singh Divakar Singh Student [M.Tech.6 th sem (CSE)] Asst.Proff, Dept. of CSE BU HOD (CSE), BUIT, BUIT,BU Bhopal Barkatullah University,

More information

Automatic Annotation Wrapper Generation and Mining Web Database Search Result

Automatic Annotation Wrapper Generation and Mining Web Database Search Result Automatic Annotation Wrapper Generation and Mining Web Database Search Result V.Yogam 1, K.Umamaheswari 2 1 PG student, ME Software Engineering, Anna University (BIT campus), Trichy, Tamil nadu, India

More information

ONTOLOGY IN ASSOCIATION RULES PRE-PROCESSING AND POST-PROCESSING

ONTOLOGY IN ASSOCIATION RULES PRE-PROCESSING AND POST-PROCESSING ONTOLOGY IN ASSOCIATION RULES PRE-PROCESSING AND POST-PROCESSING Inhauma Neves Ferraz, Ana Cristina Bicharra Garcia Computer Science Department Universidade Federal Fluminense Rua Passo da Pátria 156 Bloco

More information