Searching and surfing the web using a semi-adaptive meta-engine
A. Castellucci, G. Ianni
DEIS, Università della Calabria, Rende (CS), Italy
tony73@writeme.com, ianni@deis.unical.it

D. Vasile
Pitagora S.p.A., Rende (CS), Italy
vasile@pitagora.it

S. Costa
CM Sistemi Sud S.r.l., Cosenza, Italy
sebastiano.costa@gruppocm.it

Abstract

Global Search¹ is a web agent that integrates and enhances many well-known search techniques in order to improve the quality of the information gathered from the usual web search engines. It features intelligent merging of relevant documents from different search engines, anticipated adaptive exploration and evaluation of links from the current result set, and automated derivation of refined queries based on user relevance feedback.

1. Introduction

The recent explosive growth of the World Wide Web has drawn the attention of a wide range of users to the difficulty of information retrieval over the Internet. The usual workaround is the adoption of huge web indexes that can be queried with keyword-based questions, such as the well-known Altavista, Lycos and Google. Unfortunately, no existing index can successfully track all existing web pages, in spite of many recent brute-force attempts such as the Inktomi indexing project [12]. Moreover, the document selection technique adopted by each search engine is often very arbitrary and heterogeneous [4]. Merging the documents found by different search engines enhances both the overall web coverage and the quality of the documents found: many techniques for automatically collecting results from different search engines are known, such as the ones shown in MetaCrawler, Profusion, Inquirus and SavvySearch [19, 8, 14, 4], and the one from the recent commercial experience of Copernic [3].
Moreover, many studies have considered a) the possibility of agent-based autonomous search, in order to pursue various purposes [16, 13], and b) the possibility of involving the user, in order to learn from his or her own preferences about the pages found [18].

¹ The design of this prototype (also called Good Stuff Agent) was fully sponsored by C.I.E.S., Centro di Ingegneria Economica e Sociale, P.te P. Bucci, Rende (CS), Italy.

Global Search Agent (GSA in the following) is a standalone application that is installed on the user's Internet-ready machine. From the application, the user specifies requests in the usual way, as a set of keywords. GSA queries a relevant set of search engines, then collects and ranks their results; the user can browse documents as soon as they are displayed, while the system searches for other results and browses and ranks the links adjacent to the initial ones. Moreover, the user can a) classify the queries made in a structured concept tree: the tree structure is employed by the agent to limit the retrieved documents to a restricted subset, namely those expected to fall within the tree; and b) express an opinion about each document found: these preferences are employed by GSA to find further keywords that can improve how well the retrieved documents match what the user means. Searches can be scheduled, configured with respect to many parameters (set of search engines queried, ranking technique, duration etc.), and delegated to a remote instance of the agent, which can push back the results found when the search is done.

2. Meta-searching

Many problems arise when we try to merge results from heterogeneous search sources successfully. First, the right way to query a search engine differs greatly from one engine to another. Second, results are sent back to the user in a semistructured form (usually an HTML page): an ad hoc parser is therefore needed for each different search source the agent may wish to query.
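As an illustration of the per-engine parsing problem, the following minimal sketch turns one HTML result page into engine-independent records, one at a time. It is not GSA's parser: the `Result` record, the `parse_results` function and the `result_prefix` rule (which stands in for real engine-specific extraction rules) are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Result:
    """Engine-independent description of one hit (hypothetical record)."""
    url: str
    title: str
    engine: str


class AnchorCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Accumulate text only while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_results(engine: str, html: str, result_prefix: str) -> Iterator[Result]:
    """Yield results one by one. Only links starting with result_prefix are
    treated as hits: an assumed stand-in for real per-engine rules."""
    collector = AnchorCollector()
    collector.feed(html)
    for href, text in collector.links:
        if href.startswith(result_prefix):
            yield Result(url=href, title=text, engine=engine)
```

Because `parse_results` is a generator, a caller can hand each record onward as soon as it is produced, without waiting for the whole page to be processed.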
Each parser acts as an independent entity and supplies the main application with a new result (given in an engine-independent form) whenever a new document is parsed. Unlike [8, 10] and [4], GSA does not attempt to merge results using heuristic techniques intended to deal with the unknown ranking functions of each search engine. In fact, this approach did not prove useful for providing a suitable sorting of the documents found; moreover, it
would force GSA to gather all the results before a single document could be displayed. Thus, parsers do not provide relevance values; the ranking and merging steps are deferred to the next stage.

3. Adaptive exploration

When the main agent is prompted with a new, potentially relevant result, a new entity, called a spider, is created. A spider retrieves the document on which it is started, establishes its ranking, and decides whether it is worth exploring the links that follow from the current document. If so, a new child spider is started for each link found. This approach sacrifices efficiency (each document must be retrieved) but provides effective removal of poorly ranked and/or unreachable documents. The two main questions arising here are a) how to rank a document, and b) how to automatically select interesting links.

In order to attribute a ranking value to a document, we chose a ranking function based on the one proposed by [14]. This function embeds three components: a) a presence component, whose value is proportional to the presence of at least one occurrence of a given term within the page text; b) a frequency component, which weights the overall number of occurrences of a given term within a page; and c) a distance component, which weights the overall distance between occurrences of the given terms. The ranking function takes the document to be scored and a set W of given keywords; its definition also uses the cardinality of W.

[equation not recoverable: definition of the ranking function]

The presence component is the sum of the presence values of the terms. The presence value of a term is the maximum similarity found for that term within the text considered. The similarity is introduced in order to consider the stem of each word: unlike [7], we chose a stemming algorithm independent of the language in which the text is supposed to be written.
The similarity of a term with another one reaches its extreme value when the two terms are identical: a pair of words with the same stem gives a similarity very near to that value (i.e. the first one can be considered a significant occurrence of the second one), whereas words with low similarity with respect to the set W are cut off. The frequency component is the total sum of significant occurrences of the words of W; each significant occurrence is weighted by the corresponding similarity value. The distance component depends on the number of words of W with a significant presence value and, for each pair of such words, on the minimum distance found between two significant occurrences of them. Two constants control the shape of the function, suitably chosen weights balance the three components, and a further parameter gives the maximum distance (in words) considered significant for two occurrences of terms in W. At the moment, these values can be set by the user.

Unlike [14], our ranking function a) directly embeds some stemming techniques, b) expresses distances in words and not in characters, and c) is naturally bounded within a given range (it ranges from a lower bound to an asymptotic upper value). This eliminates both the need to scale the rank values at the end of the search and the need to know a priori the number of documents retrieved, providing a sort of ideal document whose relevance value tends to the right edge of the allowed score interval.

As far as spider behaviour is concerned, the idea is close to the approach of Letizia and Webwatcher [16, 13], but the purposes are quite different. Each spider takes into account the list of ranking values scored by the last documents visited, and the concept subtree to which the originating query belongs. Each tree node carries a concept name and some concept keywords (which are decided by the user, at the moment). These values are employed to compute a happiness function based on the most recently visited documents.
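To make the three-component idea concrete, here is a minimal sketch of a bounded ranking function with presence, frequency and distance terms. It is not the authors' formula: the shared-prefix `similarity` (a crude language-independent stand-in for stemming), the `cutoff`, `max_dist`, `weights` and `damping` parameters, and the way the components are combined are all assumptions made for illustration.

```python
import re
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def similarity(a: str, b: str) -> float:
    """Shared-prefix ratio in [0, 1]; identical words give 1.0 (assumed measure)."""
    a, b = a.lower(), b.lower()
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b), 1)


def rank(text: str, keywords: Sequence[str], *, cutoff: float = 0.7,
         max_dist: int = 50, weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),
         damping: float = 5.0) -> float:
    """Score a text against keywords; the result stays below sum(weights)."""
    keys = list(dict.fromkeys(k.lower() for k in keywords))
    if not keys:
        return 0.0
    words = re.findall(r"\w+", text.lower())
    # Significant occurrences: (position, similarity) pairs above the cutoff.
    hits: Dict[str, List[Tuple[int, float]]] = {k: [] for k in keys}
    for pos, word in enumerate(words):
        for k in keys:
            s = similarity(k, word)
            if s >= cutoff:
                hits[k].append((pos, s))
    # Presence: best similarity per keyword, averaged over the keyword set.
    presence = sum(max((s for _, s in h), default=0.0) for h in hits.values()) / len(keys)
    # Frequency: similarity-weighted count, squashed so it only approaches 1.
    total = sum(s for h in hits.values() for _, s in h)
    frequency = total / (total + damping)
    # Distance: closeness of the nearest occurrences of each keyword pair, in words.
    present = [k for k in keys if hits[k]]
    closeness = []
    for a, b in combinations(present, 2):
        nearest = min(abs(p - q) for p, _ in hits[a] for q, _ in hits[b])
        closeness.append(1.0 - min(nearest, max_dist) / max_dist)
    distance = sum(closeness) / len(closeness) if closeness else 0.0
    wp, wf, wd = weights
    return wp * presence + wf * frequency + wd * distance
```

With the default weights the score lies in the half-open interval from 0 to 1, so no rescaling is needed once the search ends; a document in which the keywords occur close together scores higher than one in which they are far apart.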
In particular, happiness is computed from the combined scores received by the documents considered.

[equation not recoverable: definition of the happiness function]

Given a set of keywords representing a query over the web, and the sets of keywords representing the ancestor concepts of that query, the combined score of a given document is defined in terms of its ranking against these keyword sets.

[equation not recoverable: definition of the combined score]

Two fixed parameters appear in this definition. The happiness function is similar to the average score of the last documents visited: when a document scores too low a value, the spider tries to score it using the concept keywords of the antecedent node
of the originating query, and so on, until the root node is reached or a worthwhile score is obtained. However, these further ranking steps have a lower weight when the overall happiness is computed. When the happiness of a spider falls to the given threshold value, the spider dies; otherwise, if the maximum depth allowed has not yet been reached (i.e. the maximum number of documents a spider can explore independently of its happiness), the spider creates a child spider (which inherits the status of its parent) for each link within the current document. If a link points to an unknown and/or unwanted information source (such as binary files), it is automatically discarded. The search goes on as long as a spider is alive: the higher the happiness of a spider, the higher its execution priority.

[Figure: GSA architecture. Components shown: WWW, MetaSearch, Spider, Scheduler, FeedBack, Remote Control and User, exchanging unranked and ranked URLs.]

4. Learning from user preferences

Following [18], the user can express a boolean preference (e.g. hot document, cold document) on each document retrieved, or ignore some of them; the user can then ask GSA to take these preferences into account. GSA parses hot and cold documents and extracts a set of good terms and a set of bad terms (the latter is not considered in the current release). We chose not to adopt traditional Bayesian clustering methods [15, 18]: in order to be effective, such techniques burden the user by requiring the classification of very large sets of documents. Thus, we preferred a heuristic technique, which showed very interesting performance, mainly with smaller sets of documents. In order to extract a suitable set of good terms, GSA compiles a ranking of candidate terms and outputs the best-scoring ones. A set of stopwords [7], containing very common English and Italian terms, is excluded a priori from the ranking (obviously, this does not prevent the user from manually entering a stopword in a search).
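The term-ranking step just outlined can be sketched as follows. This is only an illustration under stated assumptions: the tiny `STOPWORDS` set, the `suggest_terms` name, the exclusion of the original query words, and the logarithmically damped per-document count are all hypothetical choices, not GSA's actual heuristic. Only good documents are used, in line with the remark that bad terms are not considered in the current release.

```python
import math
import re
from collections import Counter
from typing import Iterable, List, Set

# Tiny illustrative stopword list (English and Italian); a real list is far longer.
STOPWORDS: Set[str] = {"the", "and", "for", "with", "che", "per", "con", "una"}


def suggest_terms(good_docs: Iterable[str], query: Iterable[str],
                  top_n: int = 3, stopwords: Set[str] = STOPWORDS) -> List[str]:
    """Return the top_n candidate refinement terms found in the good documents."""
    skip = set(stopwords) | {q.lower() for q in query}
    scores: Counter = Counter()
    for doc in good_docs:
        # Words of three or more characters; shorter tokens are ignored.
        counts = Counter(re.findall(r"\w{3,}", doc.lower()))
        for term, n in counts.items():
            if term not in skip:
                # Assumed damping: repeats inside one document add little.
                scores[term] += 1.0 + math.log(n)
    # Highest score first; ties broken alphabetically for a stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]
```

A caller would pass the texts of the documents marked as hot together with the original keywords, and offer the returned terms to the user as possible additions to the query.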
Given the set of good documents, the set of bad documents and the set of words of the originating query, the score of each term is obtained by a relevance function.

[equation not recoverable: definition of the relevance function]

The function depends on the number of occurrences of the term within each document, and on the minimum distance (in words) between a significant occurrence of the term and a query term. Each term increments its score if it appears in a good document and decrements its score if it appears in a bad document. Further occurrences of a term in a document, beyond the first one, do not greatly alter its score.

Figure 1. The GSA executing environment

5. System Architecture

We describe the GSA architecture with an example. The system starts its activity when a query is entered by the user, by the scheduler (which manages a list of previously arranged queries), or by a remote instance of GSA prompting for a search. Assume the entered keywords are Luna Rossa. An additional starting URL can also be given to the system. GSA activates the Spider subsystem (SE in the following) and the Metasearch subsystem (ME in the following). The two subsystems work in parallel: in this case, the former will start a spider in order to explore and rank the starting URL, while the latter will query all the available search engines using the keywords Luna Rossa. ME extracts single results from the search engines as soon as they are available, and prompts SE to start a spider on each extracted document. SE manages spiders: each spider parses a URL, ranks it, decides whether the URL is relevant enough for the corresponding page to be displayed, and decides whether it is worth deploying further spiders on the neighbourhood of that URL. The search terminates by user intervention or when ME and SE have no further documents to analyze (i.e. no more spiders can be generated), but the user can analyze results while the system is still performing the search. The Feedback subsystem (FE in the following) works offline.
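The spider life cycle managed by SE can be sketched with a priority queue in which happier spiders run first. This is a single-threaded illustration, not GSA's implementation: `fetch_links` and `score` are hypothetical callables standing in for page retrieval and ranking, and happiness is assumed here to be the plain average of the last `window` scores, without the concept-tree fallback.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(order=True)
class Spider:
    priority: float                                   # negated happiness: heapq pops smallest
    url: str = field(compare=False)
    depth: int = field(compare=False, default=0)
    history: Tuple[float, ...] = field(compare=False, default=())


def explore(seeds: Iterable[str],
            fetch_links: Callable[[str], Iterable[str]],
            score: Callable[[str], float],
            *, threshold: float = 0.3, max_depth: int = 2,
            window: int = 3) -> List[Tuple[str, float]]:
    """Visit documents from the seeds onward; return (url, score) pairs, best first."""
    ranked: Dict[str, float] = {}
    queue = [Spider(0.0, url) for url in seeds]
    heapq.heapify(queue)
    while queue:
        spider = heapq.heappop(queue)
        if spider.url in ranked:
            continue                                  # already visited by another spider
        value = score(spider.url)
        ranked[spider.url] = value
        history = (spider.history + (value,))[-window:]
        happiness = sum(history) / len(history)
        if happiness <= threshold or spider.depth >= max_depth:
            continue                                  # the spider dies or hits the depth limit
        for link in fetch_links(spider.url):
            if link not in ranked:
                # Children inherit the parent's score history and its happiness as priority.
                heapq.heappush(queue, Spider(-happiness, link, spider.depth + 1, history))
    return sorted(ranked.items(), key=lambda kv: -kv[1])
```

In this sketch a low-scoring page still appears in the ranked output, but its links are never followed once the spider's happiness drops to the threshold, which is what prunes uninteresting regions of the link graph.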
The user can express an opinion on which documents are interesting and which are not by marking the entries of the document list accordingly. Once the user's opinion is given (even on a small subset of the overall set of documents retrieved), FE can be started. The output of FE is a set of relevant words, suggested to the user in order to refine the search. Suppose the user is interested in a competition involving the boat Luna Rossa; he or she then marks the documents found accordingly (e.g. marking all the documents not related to sailing as non-relevant). In this case, the words that GSA outputs include "america" and "cup".

6. Experimental results

Table 1 reports some results on a set of 20 short-term queries containing keywords pertaining to different domains. For each query we considered the ten most relevant documents reported by each search engine. The table indicates, for each search engine, the number of documents found, the number of documents actually considered relevant, and the percentage of relevant documents with respect to the documents found. Results from GSA were computed by halting evaluation after 30 seconds. The connection speed was about 2 Mbit/s. The constants chosen for the scoring function were [values not recoverable].

Table 1. The results of GSA against some other well-known search engines.

Engine      # Documents found   # Relevant documents   Relevance rate
GSA         ...                 ...                    ... %
Altavista   ...                 ...                    ... %
Excite      ...                 ...                    ... %
Google      ...                 ...                    ... %
Hotbot      ...                 ...                    ... %

Usually, GSA performs fast and very well against single search engines when short-duration searches are submitted; the overhead of directly retrieving each document is more than compensated when five or more search engines are queried simultaneously (tests were performed doing meta-search on Altavista, Excite, FastSearch, Google, Hotbot, Lycos, FastSearch, Yahoo, and Webcrawler [1, 5, 6, 9, 11, 6, 21, 20]). In addition, GSA proves very useful when long-duration (e.g. overnight) searches are planned.

Table 2 summarizes the typical relevance rate and average score of the first ten documents retrieved, when the same search is halted after 1 minute, after 5 minutes, or never (in the latter case GSA halts after a time that depends on the initial happiness of the spiders). The quality of the documents retrieved increases significantly on long-term queries, whereas the relevance rate reaches its maximum very soon.

Table 2. The results of GSA on long duration queries (best score=1000).

Query duration   Relevance rate   Average score
1 minute         70.0%            ...
5 minutes        100.0%           ...
...              ... %            878

7. Further Search issues

At the moment we are studying several improvements to the architecture of GSA, such as the automatic generation of engine-dependent parsers [17], and the improvement of spider behaviour through better happiness functions and some cooperative information exchange between spiders. Moreover, we think the system could be improved by introducing automated concept tree derivation as in [22], and by providing automated parameter tuning [2]. In addition, we should complete the method for learning user preferences with better stemmed parsing and by introducing some form of clustering of the terms found.

References

[1] Altavista web site.
[2] B. T. Bartell, G. W. Cottrell, and R. K. Belew. Optimizing parameters in a ranked retrieval system using multi-query relevance feedback. Proc. of the Symposium on Document Analysis and Information Retrieval, Las Vegas.
[3] Copernic web site.
[4] D. Dreilinger and A. E. Howe. Savvysearch: A metasearch engine that learns which search engines to query. AI Magazine, 18(2):19-25.
[5] Excite web site.
[6] Fastsearch web site.
[7] W. Frakes and R. Baeza-Yates. Information Retrieval: Data structures and algorithms. Prentice-Hall.
[8] S. Gauch, G. Wang, and M. Gomez. Profusion: intelligent fusion from multiple, distributed search engines. Journal of Universal Computer Science, 2(9).
[9] Google web site.
[10] L. Gravano and H. Garcia-Molina. Merging ranks from heterogeneous internet sources. Proc. of the 23rd VLDB Conference, Athens, Greece.
[11] Hotbot web site.
[12] Inktomi web site.
[13] T. Joachims, D. Freitag, and T. Mitchell.
Webwatcher: A tour guide for the world wide web. Proc. of the 15th Int. Joint Conf. on Artificial Intelligence, Nagoya, Japan, 1997.
[14] S. Lawrence and C. L. Giles. Inquirus, the NECI meta search engine. Proc. of the Seventh International World Wide Web Conference, Brisbane, Australia.
[15] E. J. Keogh and M. J. Pazzani. Learning augmented bayesian classifiers: a comparison of distribution-based and classification-based approaches. Uncertainty 99: The Seventh International Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale FL, USA.
[16] H. Lieberman. Letizia: An agent that assists web browsing. Proc. of the 14th Int. Joint Conf. on Artificial Intelligence, IJCAI 95, Montréal, Québec, Canada.
[17] S. Nestorov, S. Abiteboul, and R. Motwani. Inferring structure in semistructured data. SIGMOD Record, 26(1):54-66, March.
[18] M. Pazzani, J. Muramatsu, and D. Billsus. Syskill & Webert: Identifying interesting web sites. Proc. of the 13th Nat. Conf. on Artificial Intelligence, AAAI 96, pages 54-61.
[19] E. Selberg and O. Etzioni. The metacrawler architecture for resource aggregation on the web. IEEE Expert.
[20] Webcrawler web site.
[21] Yahoo web site.
[22] S. Yamada and Y. Osawa. Planning to guide concept understanding in the WWW. AAAI Workshop on AI and Information Integration, 1998.
More informationWeb Data Extraction: 1 o Semestre 2007/2008
Web Data : Given Slides baseados nos slides oficiais do livro Web Data Mining c Bing Liu, Springer, December, 2006. Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008
More informationQOS Based Web Service Ranking Using Fuzzy C-means Clusters
Research Journal of Applied Sciences, Engineering and Technology 10(9): 1045-1050, 2015 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scientific Organization, 2015 Submitted: March 19, 2015 Accepted: April
More informationA CLIENT-ORIENTATED DYNAMIC WEB SERVER. Cristina Hava Muntean, Jennifer McManis, John Murphy 1 and Liam Murphy 2. Abstract
A CLIENT-ORIENTATED DYNAMIC WEB SERVER Cristina Hava Muntean, Jennifer McManis, John Murphy 1 and Liam Murphy 2 Abstract The cost of computer systems has decreased continuously in recent years, leading
More informationA HYBRID RULE BASED FUZZY-NEURAL EXPERT SYSTEM FOR PASSIVE NETWORK MONITORING
A HYBRID RULE BASED FUZZY-NEURAL EXPERT SYSTEM FOR PASSIVE NETWORK MONITORING AZRUDDIN AHMAD, GOBITHASAN RUDRUSAMY, RAHMAT BUDIARTO, AZMAN SAMSUDIN, SURESRAWAN RAMADASS. Network Research Group School of
More informationIntegrating Pattern Mining in Relational Databases
Integrating Pattern Mining in Relational Databases Toon Calders, Bart Goethals, and Adriana Prado University of Antwerp, Belgium {toon.calders, bart.goethals, adriana.prado}@ua.ac.be Abstract. Almost a
More informationBest Practice Search Engine Optimisation
Best Practice Search Engine Optimisation October 2007 Lead Hitwise Analyst: Australia Heather Hopkins, Hitwise UK Search Marketing Services Contents 1 Introduction 1 2 Search Engines 101 2 2.1 2.2 2.3
More informationEfficient Scheduling Of On-line Services in Cloud Computing Based on Task Migration
Efficient Scheduling Of On-line Services in Cloud Computing Based on Task Migration 1 Harish H G, 2 Dr. R Girisha 1 PG Student, 2 Professor, Department of CSE, PESCE Mandya (An Autonomous Institution under
More informationTime: A Coordinate for Web Site Modelling
Time: A Coordinate for Web Site Modelling Paolo Atzeni Dipartimento di Informatica e Automazione Università di Roma Tre Via della Vasca Navale, 79 00146 Roma, Italy http://www.dia.uniroma3.it/~atzeni/
More informationHow To Use Data Mining For Loyalty Based Management
Data Mining for Loyalty Based Management Petra Hunziker, Andreas Maier, Alex Nippe, Markus Tresch, Douglas Weers, Peter Zemp Credit Suisse P.O. Box 100, CH - 8070 Zurich, Switzerland markus.tresch@credit-suisse.ch,
More informationFig (1) (a) Server-side scripting with PHP. (b) Client-side scripting with JavaScript.
Client-Side Dynamic Web Page Generation CGI, PHP, JSP, and ASP scripts solve the problem of handling forms and interactions with databases on the server. They can all accept incoming information from forms,
More informationIntegrating Heterogeneous Data Sources Using XML
Integrating Heterogeneous Data Sources Using XML 1 Yogesh R.Rochlani, 2 Prof. A.R. Itkikar 1 Department of Computer Science & Engineering Sipna COET, SGBAU, Amravati (MH), India 2 Department of Computer
More informationSite-Specific versus General Purpose Web Search Engines: A Comparative Evaluation
Panhellenic Conference on Informatics Site-Specific versus General Purpose Web Search Engines: A Comparative Evaluation G. Atsaros, D. Spinellis, P. Louridas Department of Management Science and Technology
More informationLoad Distribution in Large Scale Network Monitoring Infrastructures
Load Distribution in Large Scale Network Monitoring Infrastructures Josep Sanjuàs-Cuxart, Pere Barlet-Ros, Gianluca Iannaccone, and Josep Solé-Pareta Universitat Politècnica de Catalunya (UPC) {jsanjuas,pbarlet,pareta}@ac.upc.edu
More informationA UPS Framework for Providing Privacy Protection in Personalized Web Search
A UPS Framework for Providing Privacy Protection in Personalized Web Search V. Sai kumar 1, P.N.V.S. Pavan Kumar 2 PG Scholar, Dept. of CSE, G Pulla Reddy Engineering College, Kurnool, Andhra Pradesh,
More informationHigh-performance XML Storage/Retrieval System
UDC 00.5:68.3 High-performance XML Storage/Retrieval System VYasuo Yamane VNobuyuki Igata VIsao Namba (Manuscript received August 8, 000) This paper describes a system that integrates full-text searching
More informationLow Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment
2009 10th International Conference on Document Analysis and Recognition Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment Ahmad Abdulkader Matthew R. Casey Google Inc. ahmad@abdulkader.org
More informationSEARCH AND CLASSIFICATION OF "INTERESTING" BUSINESS APPLICATIONS IN THE WORLD WIDE WEB USING A NEURAL NETWORK APPROACH
SEARCH AND CLASSIFICATION OF "INTERESTING" BUSINESS APPLICATIONS IN THE WORLD WIDE WEB USING A NEURAL NETWORK APPROACH Abstract Karl Kurbel, Kirti Singh, Frank Teuteberg Europe University Viadrina Frankfurt
More informationCategorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
More informationFinancial Trading System using Combination of Textual and Numerical Data
Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,
More informationEmail Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
More informationDomain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
More informationCS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team
CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team Lecture Summary In this lecture, we learned about the ADT Priority Queue. A
More informationCLASSIFICATION AND CLUSTERING METHODS IN THE DECREASING OF THE INTERNET COGNITIVE LOAD
Acta Electrotechnica et Informatica No. 2, Vol. 6, 2006 1 CLASSIFICATION AND CLUSTERING METHODS IN THE DECREASING OF THE INTERNET COGNITIVE LOAD Kristína MACHOVÁ, Ivan KLIMKO Department of Cybernetics
More informationBuilding A Smart Academic Advising System Using Association Rule Mining
Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 raedamin@just.edu.jo Qutaibah Althebyan +962796536277 qaalthebyan@just.edu.jo Baraq Ghalib & Mohammed
More informationUniversität Augsburg. Institut für Informatik D-86135 Augsburg. Learning Scrutable User Models: Inducing Conceptual Descriptions. Martin E.
Universität Augsburg Learning Scrutable User Models: Inducing Conceptual Descriptions Martin E. Müller Report 2002-07 März 2002 Institut für Informatik D-86135 Augsburg Copyright c Martin E. Müller Institut
More informationData Integration using Agent based Mediator-Wrapper Architecture. Tutorial Report For Agent Based Software Engineering (SENG 609.
Data Integration using Agent based Mediator-Wrapper Architecture Tutorial Report For Agent Based Software Engineering (SENG 609.22) Presented by: George Shi Course Instructor: Dr. Behrouz H. Far December
More informationCollecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
More informationInteractive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
More informationIntroduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A
Introduction to IR Systems: Supporting Boolean Text Search Chapter 27, Part A Database Management Systems, R. Ramakrishnan 1 Information Retrieval A research field traditionally separate from Databases
More informationCONFIGURATION MANAGEMENT TECHNOLOGY FOR LARGE-SCALE SIMULATIONS
SCS M&S Magazine. Vol 3. Issue 3. A. Sekiguchi, K. Shimada, Y. Wada, A. Ooba, R. Yoshimi, and A. Matsumoto. CONFIGURATION MANAGEMENT TECHNOLOGY FOR LARGE-SCALE SIMULATIONS Atsuji Sekiguchi, Kuniaki Shimada,
More informationSearch engine ranking
Proceedings of the 7 th International Conference on Applied Informatics Eger, Hungary, January 28 31, 2007. Vol. 2. pp. 417 422. Search engine ranking Mária Princz Faculty of Technical Engineering, University
More informationIEEE IoT IoT Scenario & Use Cases: Social Sensors
IEEE IoT IoT Scenario & Use Cases: Social Sensors Service Description More and more, people have the possibility to monitor important parameters in their home or in their surrounding environment. As an
More informationPartJoin: An Efficient Storage and Query Execution for Data Warehouses
PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE ladjel@imerir.com 2
More informationEMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One
More informationCompetitive Analysis of On line Randomized Call Control in Cellular Networks
Competitive Analysis of On line Randomized Call Control in Cellular Networks Ioannis Caragiannis Christos Kaklamanis Evi Papaioannou Abstract In this paper we address an important communication issue arising
More informationMining various patterns in sequential data in an SQL-like manner *
Mining various patterns in sequential data in an SQL-like manner * Marek Wojciechowski Poznan University of Technology, Institute of Computing Science, ul. Piotrowo 3a, 60-965 Poznan, Poland Marek.Wojciechowski@cs.put.poznan.pl
More informationHolland s GA Schema Theorem
Holland s GA Schema Theorem v Objective provide a formal model for the effectiveness of the GA search process. v In the following we will first approach the problem through the framework formalized by
More informationExtension of Decision Tree Algorithm for Stream Data Mining Using Real Data
Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream
More informationUSING COMPLEX EVENT PROCESSING TO MANAGE PATTERNS IN DISTRIBUTION NETWORKS
USING COMPLEX EVENT PROCESSING TO MANAGE PATTERNS IN DISTRIBUTION NETWORKS Foued BAROUNI Eaton Canada FouedBarouni@eaton.com Bernard MOULIN Laval University Canada Bernard.Moulin@ift.ulaval.ca ABSTRACT
More informationOn the use of the multimodal clues in observed human behavior for the modeling of agent cooperative behavior
From: AAAI Technical Report WS-02-03. Compilation copyright 2002, AAAI (www.aaai.org). All rights reserved. On the use of the multimodal clues in observed human behavior for the modeling of agent cooperative
More informationText Classification Using Symbolic Data Analysis
Text Classification Using Symbolic Data Analysis Sangeetha N 1 Lecturer, Dept. of Computer Science and Applications, St Aloysius College (Autonomous), Mangalore, Karnataka, India. 1 ABSTRACT: In the real
More informationDevelopment of a personal agenda and a distributed meeting scheduler based on JADE agents
Development of a personal agenda and a distributed meeting scheduler based on JADE agents Miguel Ángel Sánchez Álvaro Rayón Alonso Grupo de Sistemas Inteligentes Departamento de Ingeniería Telemática Universidad
More informationKEYWORD SEARCH IN RELATIONAL DATABASES
KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to
More informationElectronic Document Management Using Inverted Files System
EPJ Web of Conferences 68, 0 00 04 (2014) DOI: 10.1051/ epjconf/ 20146800004 C Owned by the authors, published by EDP Sciences, 2014 Electronic Document Management Using Inverted Files System Derwin Suhartono,
More information