An Information Retrieval using weighted Index Terms in Natural Language document collections


Internet and Information Technology in Modern Organizations: Challenges & Answers 635

Ahmed A. A. Radwan, Minia University, Minia, Egypt
* Bahgat A. Abdel Latef, Minia University, Minia, Egypt
Abdel Mgeid A. Ali, Minia University, Minia, Egypt
Osman A. Sadek, Minia University, Minia, Egypt

Abstract

Indexing a document is a method of describing its content so that it can be retrieved more easily from a document store. This paper describes the implementation of automatic indexing with various term weighting schemes in an IR (Information Retrieval) system, using the CISI document collection, which consists of abstracts of information retrieval papers, and the NPL collection, which consists of abstracts of electronic engineering documents. The system starts with a simple form of text representation: it extracts keywords and represents each document as a vector of weights, where each weight reflects the importance of a keyword in a document of the collection. It then evaluates and compares the retrieval effectiveness of various search models based on automatic text-word indexing, and presents experimental results from a study of the improvements in text retrieval effectiveness obtained by successively applying these approaches.

1. Introduction

Recently, people have started dealing with an increasing number of electronic documents in information networks. Finding the specific documents that users need among all available documents is an important issue. Information storage and retrieval systems make large volumes of text accessible to people with information needs [2], [6]. The user provides an outline of the requirement: perhaps a list of keywords relating to the topic, a question, or even an example document. The system searches its database for documents that are related to the user's query and presents those that are most relevant.
Most document retrieval systems use keywords to retrieve documents. These systems first extract keywords from documents and then assign weights to the keywords by using different approaches. Such systems face two major problems: how to extract keywords precisely [15], [1], [9], and how to decide the weight of each keyword [14], [3]. Gerard Salton was long an advocate of term weighting and a pioneer in developing term weighting schemes. He and Christopher Buckley summarized the results of the previous 20 years in their paper [4], which was reprinted in [11].

The remainder of this paper is organized as follows. Section 2 describes the document collections used in our system. Section 3 presents the system model: the IR architecture, various indexing schemes, cosine similarity, and the recall and precision measures. Section 4 presents our evaluation methodology and compares the retrieval effectiveness of the approaches used to index and retrieve documents. Section 5 gives the main findings of our study. Finally, Section 6 outlines future work, in which we will apply a genetic algorithm to improve the performance of the information retrieval system.

2. Documents Collections

At its first conference in 1992, TREC (Text REtrieval Conference) used a collection containing 2 GB of text together with over 50 queries (called topics in TREC jargon). In general, newswire articles are used as full-text documents. IR collections are composed of two textual materials: a set of documents and a set of queries. Each query is associated with a list of relevant documents. The list can be flat (all documents provided are assumed to be equally relevant to the given query) or ordered (each relevant document is provided with a relevance level). Because they are obtained by automatic methods (pooling), the lists for large document collections are often flat. Examples of a document and a query from the Wall Street Journal sub-collection can be found in the TREC collection.

* Corresponding Author

The TREC collection is large, ranging from 2 GB up to 20 GB (7.5 million documents) for the very large corpus track introduced in TREC-6, and it requires time-consuming preparation before experiments can be carried out effectively at a local site. An alternative is to use a smaller collection; in this case, the minimal evaluation suggested by Hersh [17] should use at least 1000 documents and test a minimum of 50 queries.

The set of relevant documents for each example information request (topic) is obtained from a pool of possibly relevant documents. This pool is created by taking the top K documents (usually K = 100) in the rankings generated by the various participating retrieval systems. The documents in the pool are then shown to human assessors, who ultimately decide on the relevance of each document. TREC uses a working, user-centered definition of relevance [7].

We study two document collections, CISI (Centre for Inventions and Scientific Information) and NPL (National Physical Laboratory), each of which covers a specific topic or field of real-life documents. For example, the CISI collection is concerned with information retrieval. Table (1) lists these and related test collections with some of their properties.

Table (1). Test collections related to the CISI and NPL collections

Collection   Subject                 No. Docs   No. Queries
ADI          Information Science
CACM         Computer Science
CISI         Information Retrieval
CRAN         Aeronautics
LISA         Library Science
MEDLARS      Biomedicine
NLM          Biomedicine
NPL          Elec. Engineering       11,
TIME         General Articles

3. Information Retrieval System Model

An Information Retrieval (IR) system manages its text resources by processing their words to extract and assign content-descriptive index terms to documents or queries [12]. In naturally spoken or written language, words appear in many morphological variants even when they refer to a common concept. Therefore, words are often pre-processed by stemming [16], [13]. Stemming allows words that appear in different morphological variants to be grouped together as carrying a similar meaning. Most stemming algorithms reduce word forms to an approximation of a common morphological root, called the stem. The objective is to eliminate the variation that arises from different grammatical forms of the same word; e.g., retrieve, retrieved, retrieves and retrieval should all be recognized as forms of the same word. Hence, a user who formulates a query need not specify every possible form of a word that may occur in the documents being searched.

Another common form of preprocessing is the elimination of common words that have little power to discriminate relevant from non-relevant documents, e.g., the, a, it and similar words. Hence, IR engines are usually provided with a stop list of such noise words.

The remaining set of terms defines a space in which each distinct term represents one dimension. Since we represent each document as a set of terms, we can view this space as a document space [4], [6]. We can then assign a numeric weight to each term in a given document, representing an estimate (usually, but not necessarily, statistical) of the usefulness of the term as a descriptor of that document.
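The preprocessing described above (word identification, stop-word removal, suffix deletion) can be illustrated with a minimal sketch. The stop list, the suffix list and the function names `crude_stem` and `index_terms` are hypothetical choices made for this illustration; the paper does not specify them, and the crude suffix stripper below is not the Porter algorithm cited in [13].

```python
import re
from collections import Counter

# Hypothetical stop list; the paper does not give its actual list.
STOP_WORDS = {"the", "a", "an", "it", "and", "of", "or", "but", "in", "to", "is", "for"}

# Hypothetical suffix list for a crude suffix-deletion stemmer.
# This only illustrates the idea and is NOT the Porter algorithm [13].
SUFFIXES = ("ing", "al", "ed", "es", "s", "e")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text: str) -> Counter:
    """Tokenize, drop stop words, stem, and count the remaining stems."""
    words = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(w) for w in words if w not in STOP_WORDS]
    return Counter(stems)


# The four forms from the text collapse to one stem.
print({w: crude_stem(w) for w in ("retrieve", "retrieved", "retrieves", "retrieval")})
```

The resulting `Counter` holds the raw term frequencies of one document; the weighting schemes are then applied to these counts.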
This means that the weight of a term estimates its usefulness for distinguishing the given document from other documents in the same collection. Note that a given term may receive a different weight in each document in which it occurs; a term may be a better descriptor of one document than of another.

The following procedure is applied to each local document:

i) Identify the individual text words.
ii) Remove special function (negative) words contained in a list of excluded words (and, of, or, but, etc.).
iii) Reduce the remaining words to word-stem form by applying a suffix deletion method.
iv) Assign a term weight to each remaining word stem based on its frequency in the individual local document, the overall inverse document frequency of the stem in the collection, and the local document length.
v) Represent each document as a vector of term weights in the document space.
vi) Classify each document into one or more categories under a threshold.
vii) Apply the previous steps to the queries to build query vectors.
viii) Retrieve the top 30 relevant documents for each given query, according to the cosine similarity measure.
ix) Compute the Recall-Precision for each query, and then the Average Recall-Precision over the given queries.
x) Draw a graph that represents the Average Recall-Precision relationship.

Figure 1. An example of an Information Retrieval System architecture.

Various Indexing Schemes

The simplest way to decide the weight of each term is to let the weight be the term frequency (tf), that is, the number of occurrences of the term in the given document, and to apply this over the entire collection. In large documents, however, terms occur more frequently. Thus, to allow for variation in document size, the weight is usually "normalized". Two kinds of normalization are described in [8]. In the first, the term frequency tf is divided by tf_max, the maximum term frequency in the document; this is called mn (Maximum Normalization). The second, antf (Augmented Normalized Term Frequency), is given by equation (1), which makes the normalized tf vary between 0.5 and 1:

\[ W_{ij} = 0.5 + 0.5 \, \frac{tf_{ij}}{tf_{\max}} \qquad (1) \]

where W_{ij} is the weight of term i in document j, tf_{ij} is the frequency of term i in document j, and tf_{\max} is the maximum term frequency in document j. The purpose of term frequency normalization (in either form) is that the weight (the importance) of a term in a given document should depend on its frequency of occurrence relative to other terms in the same document, not on its absolute frequency of occurrence.
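As a small worked illustration of the two normalizations, the sketch below computes mn and antf weights for one document. It assumes that equation (1) has the usual augmented form 0.5 + 0.5 * tf / tf_max, which matches the stated range of 0.5 to 1; the function names and the sample counts are hypothetical.

```python
from collections import Counter


def mn_weights(tf: Counter) -> dict:
    """Maximum normalization: tf divided by the largest tf in the document."""
    if not tf:
        return {}
    tf_max = max(tf.values())
    return {term: count / tf_max for term, count in tf.items()}


def antf_weights(tf: Counter) -> dict:
    """Augmented normalized tf (assumed form): 0.5 + 0.5 * tf / tf_max."""
    if not tf:
        return {}
    tf_max = max(tf.values())
    return {term: 0.5 + 0.5 * count / tf_max for term, count in tf.items()}


# Hypothetical raw term frequencies for one document.
doc = Counter({"retriev": 4, "index": 2, "weight": 1})
print(mn_weights(doc))    # {'retriev': 1.0, 'index': 0.5, 'weight': 0.25}
print(antf_weights(doc))  # {'retriev': 1.0, 'index': 0.75, 'weight': 0.625}
```

In both schemes the most frequent term receives weight 1; under antf no term present in the document falls below 0.5, whereas under mn a rare term can approach 0.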
Weighting a term by absolute frequency would obviously tend to favor longer documents over shorter ones. However, mn has a potential flaw: the normalization factor for a given document depends only on the frequency of the most frequent term(s) in that document. The same problem arises with antf, but to a less extreme degree: the most frequent term still receives a weight of one, as with mn, but it cannot drag the weights of the other terms below 0.5.

A commonly used alternative for normalizing the term frequency is to take its natural logarithm plus a constant, e.g., "log(tf) + 1". This technique, called ltf (Logarithmic Term Frequency), does not explicitly take document length or maximum term frequency into account, but it does reduce the importance of raw term frequency in collections with widely varying document lengths. It also reduces the effect of a term with an unusually high frequency within a given document. In general, it dampens the effect of all variation in term frequency.

Another normalization method, introduced in [10], is the Inverse Document Frequency (idf) measure, defined as follows:

\[ idf_i = \log \frac{N}{n_i} \]

where N is the number of documents in the collection and n_i is the number of documents containing term i. Several methods have been proposed to combine tf with the idf measure. The most successful and widely used scheme for automatic generation of weights is tf * idf. Another approach, proposed in [4], is given as follows:

\[ W_{ij} = \left( 0.5 + 0.5 \, \frac{freq_{ij}}{maxfreq_j} \right) \cdot \log \frac{N}{n_i} \]

where W_{ij} is the weight, freq_{ij} is the frequency of term i in document j, and maxfreq_j is the maximum frequency of any term in document j. However, frequency and the other approaches can be used separately or together to obtain appropriate index weights.

Similarity and Recall-Precision measures

The proposed system is based on the vector space model [6], in which both documents and queries are represented as vectors. The components of each vector are the weights of keywords extracted from the documents or queries; we use the ltf * idf, antf and antf * idf weighting schemes. Cosine similarity is then used to measure the relevance between a document and a query:

\[ sim(Q, D) = \frac{\sum_i q_i \, d_i}{\sqrt{\sum_i q_i^2} \, \sqrt{\sum_i d_i^2}} \]

where Q is the query vector with components q_i and D is the vector of document d with components d_i. We then rank the documents according to their similarity to each query, and measure the effectiveness of the system in retrieving relevant documents with the Recall and Precision measures:

\[ Recall = \frac{\text{number of relevant documents retrieved}}{\text{number of relevant documents in the collection}} \]

\[ Precision = \frac{\text{number of relevant documents retrieved}}{\text{number of documents retrieved}} \]

4. Experimental Studies

We studied a Vector Space Model system with different weighting schemes applied to the CISI and NPL collections for 100 queries, computing the average recall-precision for each weighting scheme on each collection. The results for some of these schemes are given in Table 2 and Table 3 for the CISI and NPL collections respectively, and the corresponding recall-precision curves are shown in Figure 2 and Figure 3.

Table 2. Average Recall-Precision for 100 queries applied on the CISI collection
Columns: recall; average precision for 100 test queries under normalization and normalization*idf.

Table 3. Average Recall-Precision for 100 queries applied on the NPL collection
Columns: recall; average precision for 100 test queries under normalization and normalization*idf.
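The ranking and evaluation steps behind these experiments can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: it assumes idf has the common form log(N / n_i), uses the antf * idf scheme with antf taken as 0.5 + 0.5 * tf / tf_max, and all function names and the toy collection are hypothetical.

```python
import math
from collections import Counter


def idf(docs: list) -> dict:
    """Assumed form idf_i = log(N / n_i); n_i = documents containing term i."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / n) for term, n in df.items()}


def antf_idf(tf: Counter, idf_table: dict) -> dict:
    """antf * idf weights; terms unseen in the collection get idf 0."""
    if not tf:
        return {}
    tf_max = max(tf.values())
    return {t: (0.5 + 0.5 * c / tf_max) * idf_table.get(t, 0.0) for t, c in tf.items()}


def cosine(q: dict, d: dict) -> float:
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm = math.sqrt(sum(w * w for w in q.values())) * math.sqrt(sum(w * w for w in d.values()))
    return dot / norm if norm else 0.0


def rank(query: dict, doc_vectors: dict, top_k: int = 30) -> list:
    """Document ids sorted by decreasing cosine similarity, cut at top_k."""
    ordered = sorted(doc_vectors, key=lambda doc_id: cosine(query, doc_vectors[doc_id]), reverse=True)
    return ordered[:top_k]


def recall_precision(retrieved: list, relevant: set) -> tuple:
    """Recall and precision of one retrieved list against the relevant set."""
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Toy collection of already stemmed term counts (hypothetical data).
raw = {
    "d1": Counter({"retriev": 2, "index": 1}),
    "d2": Counter({"circuit": 3, "index": 1}),
    "d3": Counter({"retriev": 1, "weight": 2}),
}
table = idf(list(raw.values()))
vectors = {doc_id: antf_idf(tf, table) for doc_id, tf in raw.items()}
query = antf_idf(Counter({"retriev": 1, "weight": 1}), table)

top = rank(query, vectors, top_k=2)
print(top)                            # ['d3', 'd1']
print(recall_precision(top, {"d3"}))  # (1.0, 0.5)
```

Averaging the precision values at fixed recall levels over all test queries would give the kind of average recall-precision figures the tables report; the cut-off of 30 documents follows step viii of the procedure.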

Figure 2. Average Recall-Precision for 100 queries applied on the CISI collection (curves: Augmented normalization, normalization*idf).

Figure 3. Average Recall-Precision for 100 queries applied on the NPL collection (curves: Augmented normalization, normalization*idf).

5. Conclusion

From the comparison among the schemes shown in Table 2 and Table 3, we conclude that antf is more effective than antf * idf, which in turn is more effective than ltf * idf, in this study of 100 queries applied to the CISI and NPL collections. We also note that the idf of a given term is a statistical measure that characterizes the term relative to a given collection, not relative to a query. Maintaining idf-based weights is therefore inefficient: whenever new documents are added to the collection (or old documents are removed), idf must be recomputed for each descriptor term in the affected collection, and with it the weight of such a term in every document in which it occurs.

6. Future Work

In future work, we will apply Genetic Algorithms (GAs) to the relevance feedback problem to improve the effectiveness of IR systems based on the vector space model, and compare that technique with one of the best traditional relevance feedback methods, the Ide dec-hi method [5]. In that work, we will use the CISI and NPL collections. The experimental scheme for implementing relevance feedback with the different methods (the GAs and the Ide dec-hi method) is as follows. For each collection, each query is compared with all the documents using the cosine similarity measure. This yields a list giving the similarity of each query to every document of the collection. This list is ranked in decreasing order of similarity.
The normalized document vectors corresponding to the top 15 documents of the list (the ones used as feedback), together with their relevance scores and the normalized query vector, are provided as input to the query optimization algorithm.

7. References

[1] A. Chen, J. He, L. Xu, F. C. Gey and J. Meggs, Chinese text retrieval without using a dictionary. ACM SIGIR'97, Philadelphia, PA, USA, pp. 42-49,
[2] C. J. Van Rijsbergen, Information Retrieval. Butterworths, London, second edition,
[3] D. Lewis, R. Shapire, J. P. Callan and R. Papka, Training algorithms for linear text classifiers. ACM SIGIR'96, Zurich, Switzerland, pp ,
[4] G. Salton and C. Buckley, Term weighting approaches in automatic text retrieval. Information Processing and Management, pp ,
[5] G. Salton and C. Buckley, Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, Vol. 41, No. 4, pp ,
[6] G. Salton and M. J. McGill, Introduction to modern information retrieval. Englewood Cliffs, NJ: Prentice-Hall, 1983a.
[7] H. Voorhees and D. Harman, Proceedings of the sixth text retrieval conference, TREC-6, 1997.

[8] J. H. Lee, Combining multiple evidence from different properties of weighting schemes. Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp ,
[9] K. L. Kwok, Comparing representations in Chinese information retrieval. ACM SIGIR '97, Philadelphia, PA, USA, pp. 34-41,
[10] K. S. Jones, A statistical interpretation of term specificity and its application in retrieval. J. Documentation, pp. 11-20,
[11] K. Sparck Jones and P. Willett (Eds.), Readings in information retrieval. San Francisco: Morgan Kaufmann, pp ,
[12] M. Bacchin, N. Ferro and M. Melucci, A probabilistic model for stemmer generation. Italy, Information Processing and Management, Vol. 41, pp ,
[13] M. F. Porter, An algorithm for suffix stripping. Program, Vol. 14, No. 3,
[14] M. Gordon, Probabilistic and genetic algorithms in document retrieval. Communications of the ACM, Vol. 31, No. 10, pp ,
[15] R. A. Baeza-Yates, Text retrieval: theory and practice. In International federation for information processing congress, Vol. 1, Madrid, Spain, pp ,
[16] W. B. Frakes and R. Baeza-Yates. In W. B. Frakes & B. Y. Ricardo (Eds.), Information retrieval: data structures & algorithms. Englewood Cliffs, NJ: Prentice-Hall,
[17] W. Hersh, Information Retrieval: a Healthcare Perspective. Springer, 1996.


More information

A COMBINED TEXT MINING METHOD TO IMPROVE DOCUMENT MANAGEMENT IN CONSTRUCTION PROJECTS

A COMBINED TEXT MINING METHOD TO IMPROVE DOCUMENT MANAGEMENT IN CONSTRUCTION PROJECTS A COMBINED TEXT MINING METHOD TO IMPROVE DOCUMENT MANAGEMENT IN CONSTRUCTION PROJECTS Caldas, Carlos H. 1 and Soibelman, L. 2 ABSTRACT Information is an important element of project delivery processes.

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

Inverted files and dynamic signature files for optimisation of Web directories

Inverted files and dynamic signature files for optimisation of Web directories s and dynamic signature files for optimisation of Web directories Fidel Cacheda, Angel Viña Department of Information and Communication Technologies Facultad de Informática, University of A Coruña Campus

More information

Document Image Retrieval using Signatures as Queries

Document Image Retrieval using Signatures as Queries Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and

More information

Information Retrieval System Assigning Context to Documents by Relevance Feedback

Information Retrieval System Assigning Context to Documents by Relevance Feedback Information Retrieval System Assigning Context to Documents by Relevance Feedback Narina Thakur Department of CSE Bharati Vidyapeeth College Of Engineering New Delhi, India Deepti Mehrotra ASCS Amity University,

More information

Modeling Concept and Context to Improve Performance in ediscovery

Modeling Concept and Context to Improve Performance in ediscovery By: H. S. Hyman, ABD, University of South Florida Warren Fridy III, MS, Fridy Enterprises Abstract One condition of ediscovery making it unique from other, more routine forms of IR is that all documents

More information

Linear Algebra Methods for Data Mining

Linear Algebra Methods for Data Mining Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 Text mining & Information Retrieval Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki

More information

Investigation of Latent Semantic Analysis for Clustering of Czech News Articles

Investigation of Latent Semantic Analysis for Clustering of Czech News Articles Investigation of Latent Semantic Analysis for Clustering of Czech News Articles Michal Rott, Petr Cerva Institute of Information Technology and Electronics Technical University of Liberec Studentska 2,

More information

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,

More information

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted

More information

ISCL wintersemester 2007 IR Midterm exam. Exercise 2 : Characteristics of a collection and its index

ISCL wintersemester 2007 IR Midterm exam. Exercise 2 : Characteristics of a collection and its index ISCL wintersemester 2007 IR Midterm exam 17 December 2007 SOLUTIONS Non-electronic documents and calculators are authorized. Name : Semester : Exercise 1 : Definitions Define the following terms : tokenization

More information

Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs

Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Ryosuke Tsuchiya 1, Hironori Washizaki 1, Yoshiaki Fukazawa 1, Keishi Oshima 2, and Ryota Mibe

More information

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

More information

end for replace M chromosomes by offsprings end for return optimal chromosome

end for replace M chromosomes by offsprings end for return optimal chromosome Web-Document Retrieval by Genetic Learning of Importance Factors for HTML Tags Sun Kim and Byoung-Tak Zhang Artificial Intelligence Lab (SCAI) School of Computer Science and Engineering Seoul National

More information

Paper Classification for Recommendation on Research Support System Papits

Paper Classification for Recommendation on Research Support System Papits IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.5A, May 006 17 Paper Classification for Recommendation on Research Support System Papits Tadachika Ozono, and Toramatsu Shintani,

More information

Query Performance Prediction

Query Performance Prediction Query Performance Prediction Ben He a Iadh Ounis a a Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, United Kingdom ben, ounis@dcs.gla.ac.uk Abstract The prediction of query performance

More information

Introduction to Information Retrieval http://informationretrieval.org

Introduction to Information Retrieval http://informationretrieval.org Introduction to Information Retrieval http://informationretrieval.org IIR 6&7: Vector Space Model Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 2011-08-29 Schütze:

More information

Information Retrieval Support Systems

Information Retrieval Support Systems 1 Information Retrieval Support Systems Y.Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 E-mail: yyao@cs.uregina.ca Abstract - Information retrieval support

More information

The Scope of IR. IR systems are part of a family that shares many principles (Figure 1).

The Scope of IR. IR systems are part of a family that shares many principles (Figure 1). Information Retrieval Information retrieval systems are everywhere: Web search engines, library catalogs, store catalogs, cookbook indexes, and so on. Information retrieval (IR), also called information

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

TEMPER : A Temporal Relevance Feedback Method

TEMPER : A Temporal Relevance Feedback Method TEMPER : A Temporal Relevance Feedback Method Mostafa Keikha, Shima Gerani and Fabio Crestani {mostafa.keikha, shima.gerani, fabio.crestani}@usi.ch University of Lugano, Lugano, Switzerland Abstract. The

More information

Comparison of Standard and Zipf-Based Document Retrieval Heuristics

Comparison of Standard and Zipf-Based Document Retrieval Heuristics Comparison of Standard and Zipf-Based Document Retrieval Heuristics Benjamin Hoffmann Universität Stuttgart, Institut für Formale Methoden der Informatik Universitätsstr. 38, D-70569 Stuttgart, Germany

More information

A Web Page Classification Algorithm Based on Feature Selection

A Web Page Classification Algorithm Based on Feature Selection Journal of Information & Computational Science 12:4 (2015) 1549 1556 March 1, 2015 Available at http://www.joics.com A Web Page Classification Algorithm Based on Feature Selection Hongfang Zhou a,, Jie

More information

Latent Semantic Indexing with Selective Query Expansion Abstract Introduction

Latent Semantic Indexing with Selective Query Expansion Abstract Introduction Latent Semantic Indexing with Selective Query Expansion Andy Garron April Kontostathis Department of Mathematics and Computer Science Ursinus College Collegeville PA 19426 Abstract This article describes

More information

Query by Rhythm An Approach for Song Retrieval in Music Databases*

Query by Rhythm An Approach for Song Retrieval in Music Databases* Query by Rhythm An Approach for Song Retrieval in Music Databases* James C. C. Chen and Arbee L.P. Chen Department of Computer Science National Tsing Hua University Hsinchu, Taiwan 300, R.O.C. Email :

More information

Term extraction for user profiling: evaluation by the user

Term extraction for user profiling: evaluation by the user Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,

More information

Implicit Ambiguity Resolution Using Incremental Clustering in Korean-to-English Cross-Language Information Retrieval

Implicit Ambiguity Resolution Using Incremental Clustering in Korean-to-English Cross-Language Information Retrieval Implicit Ambiguity Resolution Using Incremental Clustering in Korean-to-English Cross-Language Information Retrieval Kyung-Soon Lee 1, Kyo Kageura 1, Key-Sun Choi 2 1 NII (National Institute of Informatics)

More information

Efficient visual search of local features. Cordelia Schmid

Efficient visual search of local features. Cordelia Schmid Efficient visual search of local features Cordelia Schmid Visual search change in viewing angle Matches 22 correct matches Image search system for large datasets Large image dataset (one million images

More information

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

PDF hosted at the Radboud Repository of the Radboud University Nijmegen PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is an author's version which may differ from the publisher's version. For additional information about this

More information

Studying the Impact of Text Summarization on Contextual Advertising

Studying the Impact of Text Summarization on Contextual Advertising Studying the Impact of Text Summarization on Contextual Advertising Giuliano Armano, Alessandro Giuliani and Eloisa Vargiu Dept. of Electric and Electronic Engineering University of Cagliari Cagliari,

More information

Web Search Engines: Solutions

Web Search Engines: Solutions Web Search Engines: Solutions Problem 1: A. How can the owner of a web site design a spider trap? Answer: He can set up his web server so that, whenever a client requests a URL in a particular directory

More information

Using Interdocument Similarity Information in Document Retrieval Systems

Using Interdocument Similarity Information in Document Retrieval Systems Using Interdocument Similarity Information in Document Retrieval Systems Alan Griffiths, H. Claire Luckhurst, and Peter Willett* Department of Information Studies, University of Sheffield, Western Bank,

More information

Site-Specific versus General Purpose Web Search Engines: A Comparative Evaluation 1

Site-Specific versus General Purpose Web Search Engines: A Comparative Evaluation 1 Site-Specific versus General Purpose Web Search Engines: A Comparative Evaluation 1 G. Atsaros, D. Spinellis, P. Louridas Department of Management Science and Technology Athens University of Economics

More information

A Distributed Multi-Agent System for Collaborative Information Management and Sharing

A Distributed Multi-Agent System for Collaborative Information Management and Sharing A Distributed Multi-Agent System for Collaborative Information Management and Sharing James R. Chen & Shawn R. Wolfe NASA Ames Research Center Mail Stop 269-2 Moffett Field, CA 94035-1000 {jchen, shawn}@ptolemy.arc.nasa.gov

More information

PERSONALIZATION OF SEARCH ENGINE SERVICES FOR EFFECTIVE RETRIEVAL AND KNOWLEDGE MANAGEMENT

PERSONALIZATION OF SEARCH ENGINE SERVICES FOR EFFECTIVE RETRIEVAL AND KNOWLEDGE MANAGEMENT PERSONALIZATION OF SEARCH ENGINE SERVICES FOR EFFECTIVE RETRIEVAL AND KNOWLEDGE MANAGEMENT Weiguo Fan Michael D. Gordon University of Michigan Business School U.S.A. Praveen Pathak School of Management

More information

WE DEFINE spam as an e-mail message that is unwanted basically

WE DEFINE spam as an e-mail message that is unwanted basically 1048 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 5, SEPTEMBER 1999 Support Vector Machines for Spam Categorization Harris Drucker, Senior Member, IEEE, Donghui Wu, Student Member, IEEE, and Vladimir

More information

Manual Query Modification and Data Fusion for Medical Image Retrieval

Manual Query Modification and Data Fusion for Medical Image Retrieval Manual Query Modification and Data Fusion for Medical Image Retrieval Jeffery R. Jensen and William R. Hersh Department of Medical Informatics & Clinical Epidemiology Oregon Health & Science University

More information

Figure 1: Network architecture and distributed (shared-nothing) memory.

Figure 1: Network architecture and distributed (shared-nothing) memory. Query Performance for Tightly Coupled Distributed Digital Libraries Berthier A. Ribeiro-Neto Ramurti A. Barbosa Computer Science Department Federal University of Minas Gerais Brazil berthier,ramurti @dcc.ufmg.br

More information

Automatic Web Page Classification

Automatic Web Page Classification Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

More information

Boosting Bookmark Category Web Page Classification Accuracy using Multiple Clustering Approaches

Boosting Bookmark Category Web Page Classification Accuracy using Multiple Clustering Approaches Boosting Bookmark Category Web Page Classification Accuracy using Multiple Clustering Approaches Chris Staff Department of Artificial Intelligence University of Malta Email: chris.staff@um.edu.mt Abstract

More information

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction

More information

Hypertext is fundamentally a database system that provides a

Hypertext is fundamentally a database system that provides a INTERNET SEARCH TOWARD A QUALITATIVE SEARCH ENGINE Traditional search engines do not consider document quality in ranking search results. The YANHONG LI GARI Software/IDD Information Services Hyperlink

More information

Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects

Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects Mohammad Farahmand, Abu Bakar MD Sultan, Masrah Azrifah Azmi Murad, Fatimah Sidi me@shahroozfarahmand.com

More information

Information Retrieval Features for Personality Traits

Information Retrieval Features for Personality Traits Information Retrieval Features for Personality Traits Edson Roberto Duarte Weren vveren@gmail.com Abstract. This paper describes the methods employed to solve the Author Profiling task at PAN-2015. The

More information

Exam in course TDT4215 Web Intelligence - Solutions and guidelines -

Exam in course TDT4215 Web Intelligence - Solutions and guidelines - English Student no:... Page 1 of 12 Contact during the exam: Geir Solskinnsbakk Phone: 94218 Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Friday May 21, 2010 Time: 0900-1300 Allowed

More information

Predicting Query Performance in Intranet Search

Predicting Query Performance in Intranet Search Predicting Query Performance in Intranet Search Craig Macdonald University of Glasgow Glasgow, G12 8QQ, U.K. craigm@dcs.gla.ac.uk Ben He University of Glasgow Glasgow, G12 8QQ, U.K. ben@dcs.gla.ac.uk Iadh

More information

TREC 2007 ciqa Task: University of Maryland

TREC 2007 ciqa Task: University of Maryland TREC 2007 ciqa Task: University of Maryland Nitin Madnani, Jimmy Lin, and Bonnie Dorr University of Maryland College Park, Maryland, USA nmadnani,jimmylin,bonnie@umiacs.umd.edu 1 The ciqa Task Information

More information

A Proposed Algorithm for Spam Filtering Emails by Hash Table Approach

A Proposed Algorithm for Spam Filtering Emails by Hash Table Approach International Research Journal of Applied and Basic Sciences 2013 Available online at www.irjabs.com ISSN 2251-838X / Vol, 4 (9): 2436-2441 Science Explorer Publications A Proposed Algorithm for Spam Filtering

More information

Lightweight Document Matching for Help-Desk Applications

Lightweight Document Matching for Help-Desk Applications D A T A M I N I N G Lightweight Document Matching for Help-Desk Applications Sholom M. Weiss, Brian F. White, Chidanand V. Apte, and Fredrick J. Damerau T.J Watson Research Center, IBM Research Division

More information

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India sumit_13@yahoo.com 2 School of Computer

More information

The Enron Corpus: A New Dataset for Email Classification Research

The Enron Corpus: A New Dataset for Email Classification Research The Enron Corpus: A New Dataset for Email Classification Research Bryan Klimt and Yiming Yang Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213-8213, USA {bklimt,yiming}@cs.cmu.edu

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt

TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt TF-IDF David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt Administrative Homework 3 available soon Assignment 2 available soon Popular media article

More information

MULTI LAYER PERCEPTRON FOR WEB PAGE CLASSIFICATION BASED ON TDF/IDF ONTOLOGY BASED FEATURES AND GENETIC ALGORITHMS

MULTI LAYER PERCEPTRON FOR WEB PAGE CLASSIFICATION BASED ON TDF/IDF ONTOLOGY BASED FEATURES AND GENETIC ALGORITHMS MULTI LAYER PERCEPTRON FOR WEB PAGE CLASSIFICATION BASED ON TDF/IDF ONTOLOGY BASED FEATURES AND GENETIC ALGORITHMS N.VANJULAVALLI 1, DR.A.KOVALAN 2 1. Research Scholar, Department of Computer Science and

More information