An Information Retrieval using weighted Index Terms in Natural Language document collections


Internet and Information Technology in Modern Organizations: Challenges & Answers 635

Ahmed A. A. Radwan, Minia University, Minia, Egypt
* Bahgat A. Abdel Latef, Minia University, Minia, Egypt
Abdel Mgeid A. Ali, Minia University, Minia, Egypt
Osman A. Sadek, Minia University, Minia, Egypt
* Corresponding Author

Abstract

Indexing a document is a method of describing its content so that it can later be retrieved easily from a document store. This paper describes the implementation of automatic indexing with various term weighting schemes in an IR (Information Retrieval) system, using the CISI document collection, which consists of abstracts of information retrieval papers, and the NPL collection, which consists of abstracts of electronic engineering documents. The system starts from a simple text representation: it extracts keywords and represents each document as a vector of weights that express the importance of those keywords in the documents of the collection. It then evaluates and compares the retrieval effectiveness of various search models based on automatic text-word indexing, and presents experimental results from a study of the improvements in text retrieval effectiveness obtained by successively applying these approaches.

1. Introduction

Recently, people have started dealing with an increasing number of electronic documents in information networks. Finding the specific documents that users need among all available documents is an important issue. Information Storage and Retrieval Systems make large volumes of text accessible to people with information needs [2], [6]. The user provides an outline of the requirement: perhaps a list of keywords relating to the topic, a question, or even an example document. The system searches its database for documents that are related to the user's query and presents those that are most relevant.

Most document retrieval systems use keywords to retrieve documents. These systems first extract keywords from documents and then assign weights to the keywords using different approaches. Such systems face two major problems. One is how to extract keywords precisely [15], [1], [9], and the other is how to decide the weight of each keyword [14], [3]. Gerard Salton was long an advocate of term weighting approaches and was himself a pioneer in developing term weighting schemes. He and Christopher Buckley summarize the results of the previous 20 years in their paper [4], which was reprinted in [11].

The remainder of this paper is organized as follows. Section 2 presents the document collections used in our system. Section 3 presents the system model, which describes the IR architecture, the various indexing schemes, cosine similarity and the recall-precision measures. Section 4 presents our evaluation methodology and compares the retrieval effectiveness of the various approaches used to index and retrieve documents. Section 5 concludes with the main findings of our study. Finally, Section 6 outlines future work, in which we will apply a genetic algorithm to improve the performance of the information retrieval system.

2. Documents Collections

At its first conference in 1992, TREC (Text REtrieval Conference) used a collection containing 2 GB of text with over 50 queries (called topics in the TREC jargon). In general, newswire articles are used as full-text documents. An IR collection is composed of two kinds of textual material: a set of documents and a set of queries. Each query is associated with a list of relevant documents. The list can be flat (all documents provided are assumed to be similarly relevant to the given query) or ordered (each relevant document is provided with a relevance level). Because

they are obtained by automatic methods (pooling), large document collections are often flat. Examples of a document and of a query can be found in the Wall Street Journal sub-collection of the TREC collection. The TREC collection is large, ranging from 2 GB up to 20 GB (7.5 million documents) for the very large corpus track introduced in TREC-6, and it requires time-consuming preparation before experiments can be carried out effectively at a local site. An alternative approach is to use a smaller collection: in this case, minimal evaluations, as suggested by Hersh [17], must be done with at least 1000 documents, and a minimum of 50 queries should be tested.

The set of relevant documents for each example information request (topic) is obtained from a pool of possibly relevant documents. This pool is created by taking the top K documents (usually K = 100) in the rankings generated by the various participating retrieval systems. The documents in the pool are then shown to human assessors, who ultimately decide on the relevance of each document. TREC uses a working, user-centered definition of relevance [7].

We study two document collections, CISI (Comités Interministériels pour la Société de l'Information) and NPL (National Physical Laboratory), each of which covers a specific topic or field of real-life documents. For example, the CISI collection is concerned with the topic of information retrieval. Table (1) shows some of these document collections with some of their properties.

Table (1). Test collections related to the CISI and NPL collections

Collection   Subject                  No. Docs   No. Queries
ADI          Information Science
CACM         Computer Science
CISI         Information Retrieval
CRAN         Aeronautics
LISA         Library Science
MEDLARS      Biomedicine
NLM          Biomedicine
NPL          Elec. Engineering        11,
TIME         General Articles

3. Information Retrieval System Model

An Information Retrieval (IR) system manages its text resources by processing their words to extract content-descriptive index terms and assign them to documents or queries [12]. In naturally spoken or written language, words appear in many morphological variants, even when they refer to a common concept. Therefore, the words often undergo pre-processing: they are stemmed [16], [13]. Stemming allows words that appear in different morphological variants to be grouped together, indicating a similar meaning. Most stemming algorithms reduce word forms to an approximation of a common morphological root, called the stem. The objective is to eliminate the variation that arises from the occurrence of different grammatical forms of the same word, e.g., retrieve, retrieved, retrieves and retrieval should all be recognized as forms of the same word. Hence, the user who formulates a query should not need to specify every possible form of a word that may occur in the documents being searched for.

Another common form of preprocessing is the elimination of common words that have little power to discriminate relevant from nonrelevant documents, e.g., the, a, it and similar words. Hence, IR engines are usually provided with a stop list of such noise words.

The remaining set of terms defines a space in which each distinct term represents one dimension. Since each document is represented as a set of terms, we can view this space as a document space [4], [6]. We can then assign a numeric weight to each term in a given document, representing an estimate (usually, but not necessarily, statistical) of the usefulness of the given term as a descriptor of the given document.
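The preprocessing described above (identifying words, dropping stop words, reducing words to stems) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system used in the paper: the stop list, the suffix list and the names `tokenize`, `crude_stem` and `index_terms` are hypothetical, and the suffix stripping is only a crude stand-in for a real stemmer such as the suffix-stripping algorithm of [13].

```python
import re
from collections import Counter

# Hypothetical, tiny stop list; a real IR engine would use a much longer one.
STOP_WORDS = {"the", "a", "it", "and", "of", "or", "but", "in", "to", "is"}

# Hypothetical suffix list, checked in this order. This is NOT Porter's algorithm.
SUFFIXES = ("ations", "ation", "ing", "als", "al", "ed", "es", "s", "e")


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Strip the first matching suffix, keeping at least 4 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Return a Counter mapping each stem to its raw frequency in text."""
    stems = (crude_stem(w) for w in tokenize(text) if w not in STOP_WORDS)
    return Counter(stems)
```

With this sketch, retrieve, retrieved, retrieves and retrieval all reduce to the same stem, so a query using one form matches documents using another. The resulting counts are the raw term frequencies that a weighting scheme would then turn into weights.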
That is, the weight of a given term estimates its usefulness for distinguishing the given document from the other documents in the same collection. It should be pointed out that a given term might receive a different weight in each document in which it occurs; a term may be a better descriptor of one document than of another. The following system procedure is applied to each local document:

i) Identify the individual text words.
ii) Remove special function (negative) words contained on a list of excluded words (and, of, or, but, etc.).
iii) Reduce the remaining words to word stem form by applying a suffix deletion method.
iv) Assign a term weight to the remaining word stems based on the word stem frequency in the individual local document, the overall inverse document frequency of the stem in the collection, and the local document length.
v) Represent each document as a vector of term weights in the document space.
vi) Classify each document into one or more categories under a threshold.
vii) Apply the previous steps to the queries to build the query vectors.
viii) Get the top 30 relevant documents for each given query, according to the cosine similarity measure.
ix) Compute the Recall-Precision for each query, and then get the Average Recall-Precision for the given queries.
x) Draw a graph that represents the Average Recall-Precision relationship.

Figure 1. An example of an Information Retrieval System architecture.

Various Indexing Schemes

The simplest way to decide the weight of each term is to let the weight be the term frequency (tf), that is, the number of occurrences of that term in the given document, and to apply this over the entire collection. In large documents, terms tend to occur more frequently. Thus, to allow for variation in document size, the weight is usually "normalized". Two kinds of normalization are described in [8]. In the first, the term frequency tf is divided by tf max (Maximum Term Frequency). This kind of normalization is called mn (Maximum Normalization). The second kind, antf (Augmented Normalized Term Frequency), is given by equation (1), which causes the normalized tf to vary between 0.5 and 1:

\[ W_{ij} = 0.5 + 0.5 \, \frac{tf_{ij}}{tf_{\max}} \qquad (1) \]

where W_{ij} is the weight of term i in document j, tf_{ij} is the frequency of term i in document j, and tf_{\max} is the maximum term frequency in document j. The purpose of term frequency normalization (in either form) is that the weight (the importance) of a term in a given document should depend on its frequency of occurrence relative to other terms in the same document, not on its absolute frequency of occurrence.
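The two normalizations from [8] can be sketched as follows. The sketch assumes that antf takes the usual augmented form 0.5 + 0.5 * tf / tf_max, which is consistent with the stated range of 0.5 to 1; the function names and the example counts are hypothetical.

```python
from collections import Counter


def max_normalized_tf(term_counts):
    """mn: divide each raw frequency by the largest frequency in the document."""
    tf_max = max(term_counts.values())  # raises ValueError for an empty document
    return {term: tf / tf_max for term, tf in term_counts.items()}


def augmented_normalized_tf(term_counts):
    """antf: 0.5 + 0.5 * tf / tf_max, so every term that occurs lies in (0.5, 1]."""
    tf_max = max(term_counts.values())
    return {term: 0.5 + 0.5 * tf / tf_max for term, tf in term_counts.items()}


# Hypothetical stem counts for one document.
doc = Counter({"retriev": 4, "index": 2, "weight": 1})
mn = max_normalized_tf(doc)          # retriev 1.0, index 0.5, weight 0.25
antf = augmented_normalized_tf(doc)  # retriev 1.0, index 0.75, weight 0.625
```

In both cases the most frequent term gets weight 1. Under mn the least frequent term here drops to 0.25, while under antf no term that occurs in the document can fall below 0.5.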
Weighting a term by absolute frequency would obviously tend to favor longer documents over shorter ones. However, there is a potential flaw in mn: the normalization factor for a given document depends only on the frequency of the most frequent term(s) in the document. The same problem arises with antf, but to a less extreme degree, since the highest-frequency term will have a weight of one, as with mn, but it cannot drag the weights of the other terms below 0.5.

A commonly used alternative for normalizing the term frequency is to take its natural logarithm plus a constant, e.g., "log(tf) + 1". This technique, called ltf (Logarithmic Term Frequency), does not explicitly take document length or term frequency into account, but it does reduce the importance of raw term frequency in collections with widely varying document lengths. It also reduces the effect of a term with an unusually high term frequency within a given document. In general, it reduces the effect of all variation in term frequency.

Another measure, known as Inverse Document Frequency (idf), is introduced in [10] as follows:

\[ idf_i = \log \frac{N}{n_i} \]

where N is the number of documents and n_i is the total number of documents containing the term i. Several methods have been presented to combine tf with the idf measure. The most successful and widely used scheme for automatic generation of weights is tf * idf. Another approach, proposed by [4], is given as follows:

\[ W_{ij} = \left( 0.5 + 0.5 \, \frac{freq_{ij}}{maxfreq_j} \right) \log \frac{N}{n_i} \]

where W_{ij} is the weight, freq_{ij} is the frequency of the term i in the document j, and maxfreq_j is the maximum frequency of any term in the document j. However, term frequency and the other approaches can be used separately or together to obtain appropriate index weights.

Similarity and Recall-Precision measures

The proposed system is based on a vector space model [6], in which both documents and queries are represented as vectors. The components of a vector are the weights of the keywords extracted from the document or query. We use the weighting schemes ltf * idf, antf and antf * idf. The cosine similarity is then used to measure the relevance between a document and a query, according to the following formula:

\[ \cos(Q, D) = \frac{Q \cdot D}{\lVert Q \rVert \, \lVert D \rVert} \]

where Q is the query's vector and D is the document vector of document d. We then rank the documents according to their similarity with the queries for the retrieval mechanism, and measure the effectiveness of the system in retrieving relevant documents according to the Recall-Precision measures, which are:

\[ \text{Recall} = \frac{\text{number of relevant documents retrieved}}{\text{number of relevant documents}}, \qquad \text{Precision} = \frac{\text{number of relevant documents retrieved}}{\text{number of documents retrieved}} \]

4. Experimental Studies

We studied the Vector Space Model system with different weighting schemes applied to the CISI and NPL collections for 100 queries, computing the average recall-precision for each weighting scheme on each collection. The results for some of these schemes are given in Table 2 and Table 3 for the CISI and NPL collections respectively, and the corresponding Recall-Precision curves are shown in Figure 2 and Figure 3.

Table 2. Average Recall-Precision for 100 queries applied on the CISI collection
Columns: Recall; Average Precision for 100 test queries (normalization, normalization*idf)

Table 3. Average Recall-Precision for 100 queries applied on the NPL collection
Columns: Recall; Average Precision for 100 test queries (normalization, normalization*idf)
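To make the evaluation procedure concrete, the following minimal sketch computes idf, ranks documents against a query by cosine similarity, and computes recall and precision for the retrieved list. It assumes sparse vectors stored as dictionaries of term weights; the function names, the toy weights and the relevance judgments are hypothetical and are not taken from the CISI or NPL experiments.

```python
import math


def idf(term, documents):
    """Inverse document frequency log(N / n_i) over a list of term dicts."""
    n_i = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / n_i) if n_i else 0.0


def cosine(q, d):
    """Cosine of the angle between two sparse weight vectors (dicts)."""
    dot = sum(w * d.get(term, 0.0) for term, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_q * norm_d)


def rank(query, doc_vectors, top_k=30):
    """Return up to top_k document ids by decreasing cosine similarity."""
    scored = sorted(
        doc_vectors,
        key=lambda doc_id: cosine(query, doc_vectors[doc_id]),
        reverse=True,
    )
    return scored[:top_k]


def recall_precision(ranked_ids, relevant_ids):
    """Recall and precision of a retrieved list against the relevant set."""
    hits = sum(1 for doc_id in ranked_ids if doc_id in relevant_ids)
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    precision = hits / len(ranked_ids) if ranked_ids else 0.0
    return recall, precision


# Hypothetical weighted vectors for three documents and one query.
docs = {
    "d1": {"retriev": 1.0, "index": 0.75},
    "d2": {"engin": 1.0, "circuit": 0.5},
    "d3": {"retriev": 0.625, "weight": 1.0},
}
query = {"retriev": 1.0, "index": 1.0}

top = rank(query, docs, top_k=2)                # ["d1", "d3"]
recall, precision = recall_precision(top, {"d1"})  # (1.0, 0.5)
```

Averaging the precision obtained at fixed recall levels over all queries would give the kind of average recall-precision figures that the tables report. The default `top_k=30` mirrors the top 30 documents retrieved per query in the system procedure.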

Figure 2. Average Recall-Precision for 100 queries applied on the CISI collection (curves: augmented normalization, normalization*idf).

Figure 3. Average Recall-Precision for 100 queries applied on the NPL collection (curves: augmented normalization, normalization*idf).

5. Conclusion

From the comparison among the schemes shown in Table 2 and Table 3, we conclude that, in this study of 100 queries applied to the CISI and NPL collections, antf is more effective than antf * idf, which in turn is more effective than ltf * idf. We also note that the idf of a given term is a statistical measure that characterizes the term relative to a given collection, not relative to a query. It would therefore be inefficient to recompute the weight of such a term in every document in which it occurs whenever new documents are added to the collection (or old documents are removed), since idf must be recomputed for each descriptor term in the affected collection.

6. Future Work

In future work, we will apply a Genetic Algorithm system (GAs) to the Relevance Feedback problem to improve the effectiveness of IR systems based on the vector space model, and compare that technique with one of the best traditional Relevance Feedback methods, the Ide dec-hi method [5]. In that work, we will use the CISI and NPL collections, and the experimental scheme for implementing Relevance Feedback with the different methods (the GAs and the Ide dec-hi method) is as follows. For each collection, each query is compared with all the documents, using the cosine similarity measure. This yields a list giving the similarities of each query with all the documents of the collection. This list is ranked in decreasing order of degree of similarity.
The normalized document vectors corresponding to the top 15 documents of the list (which will be those used as feedback), with their relevance scores and the normalized query vector, are provided as input to the query optimization algorithm.

7. References

[1] A. Chen, J. He, L. Xu, F. C. Gey and J. Meggs, Chinese text retrieval without using a dictionary. ACM SIGIR'97, Philadelphia, PA, USA, pp. 42-49.
[2] C. J. Van Rijsbergen, Information Retrieval. Butterworths, London, second edition.
[3] D. Lewis, R. Schapire, J. P. Callan and R. Papka, Training algorithms for linear text classifiers. ACM SIGIR'96, Zurich, Switzerland.
[4] G. Salton and C. Buckley, Term weighting approaches in automatic text retrieval. Information Processing and Management.
[5] G. Salton and C. Buckley, Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, Vol. 41, No. 4.
[6] G. Salton and M. J. McGill, Introduction to modern information retrieval. Englewood Cliffs, NJ: Prentice-Hall, 1983.
[7] H. Voorhees and D. Harman, Proceedings of the sixth text retrieval conference, TREC-6, 1997.
[8] J. H. Lee, Combining multiple evidence from different properties of weighting schemes. Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[9] K. L. Kwok, Comparing representations in Chinese information retrieval. ACM SIGIR'97, Philadelphia, PA, USA, pp. 34-41.
[10] K. S. Jones, A statistical interpretation of term specificity and its application in retrieval. J. Documentation, pp. 11-20.
[11] K. Sparck Jones and P. Willett (Eds.), Readings in information retrieval. San Francisco: Morgan Kaufmann.
[12] M. Bacchin, N. Ferro and M. Melucci, A probabilistic model for stemmer generation. Information Processing and Management, Vol. 41.
[13] M. F. Porter, An algorithm for suffix stripping. Program, Vol. 14, No. 3.
[14] M. Gordon, Probabilistic and genetic algorithms in document retrieval. Communications of the ACM, Vol. 31, No. 10.
[15] R. A. Baeza-Yates, Text retrieval: theory and practice. In International Federation for Information Processing Congress, Vol. 1, Madrid, Spain.
[16] W. B. Frakes and R. Baeza-Yates (Eds.), Information retrieval: data structures & algorithms. Englewood Cliffs, NJ: Prentice-Hall.
[17] W. Hersh, Information Retrieval: a Healthcare Perspective. Springer, 1996.


Comparison of Standard and Zipf-Based Document Retrieval Heuristics Comparison of Standard and Zipf-Based Document Retrieval Heuristics Benjamin Hoffmann Universität Stuttgart, Institut für Formale Methoden der Informatik Universitätsstr. 38, D-70569 Stuttgart, Germany

More information

Information Retrieval System Assigning Context to Documents by Relevance Feedback

Information Retrieval System Assigning Context to Documents by Relevance Feedback Information Retrieval System Assigning Context to Documents by Relevance Feedback Narina Thakur Department of CSE Bharati Vidyapeeth College Of Engineering New Delhi, India Deepti Mehrotra ASCS Amity University,

More information

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted

More information

Document Image Retrieval using Signatures as Queries

Document Image Retrieval using Signatures as Queries Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and

More information

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

More information

Linear Algebra Methods for Data Mining

Linear Algebra Methods for Data Mining Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 Text mining & Information Retrieval Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki

More information

Figure 1: Network architecture and distributed (shared-nothing) memory.

Figure 1: Network architecture and distributed (shared-nothing) memory. Query Performance for Tightly Coupled Distributed Digital Libraries Berthier A. Ribeiro-Neto Ramurti A. Barbosa Computer Science Department Federal University of Minas Gerais Brazil berthier,ramurti @dcc.ufmg.br

More information

Introduction to Information Retrieval http://informationretrieval.org

Introduction to Information Retrieval http://informationretrieval.org Introduction to Information Retrieval http://informationretrieval.org IIR 6&7: Vector Space Model Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 2011-08-29 Schütze:

More information

Modeling Concept and Context to Improve Performance in ediscovery

Modeling Concept and Context to Improve Performance in ediscovery By: H. S. Hyman, ABD, University of South Florida Warren Fridy III, MS, Fridy Enterprises Abstract One condition of ediscovery making it unique from other, more routine forms of IR is that all documents

More information

Using Interdocument Similarity Information in Document Retrieval Systems

Using Interdocument Similarity Information in Document Retrieval Systems Using Interdocument Similarity Information in Document Retrieval Systems Alan Griffiths, H. Claire Luckhurst, and Peter Willett* Department of Information Studies, University of Sheffield, Western Bank,

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

Latent Semantic Indexing with Selective Query Expansion Abstract Introduction

Latent Semantic Indexing with Selective Query Expansion Abstract Introduction Latent Semantic Indexing with Selective Query Expansion Andy Garron April Kontostathis Department of Mathematics and Computer Science Ursinus College Collegeville PA 19426 Abstract This article describes

More information

TEMPER : A Temporal Relevance Feedback Method

TEMPER : A Temporal Relevance Feedback Method TEMPER : A Temporal Relevance Feedback Method Mostafa Keikha, Shima Gerani and Fabio Crestani {mostafa.keikha, shima.gerani, fabio.crestani}@usi.ch University of Lugano, Lugano, Switzerland Abstract. The

More information

Term extraction for user profiling: evaluation by the user

Term extraction for user profiling: evaluation by the user Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,

More information

JERIBI Lobna, RUMPLER Beatrice, PINON Jean Marie

JERIBI Lobna, RUMPLER Beatrice, PINON Jean Marie From: FLAIRS-02 Proceedings. Copyright 2002, AAAI (www.aaai.org). All rights reserved. User Modeling and Instance Reuse for Information Retrieval Study Case : Visually Disabled Users Access to Scientific

More information

Finding Advertising Keywords on Web Pages. Contextual Ads 101

Finding Advertising Keywords on Web Pages. Contextual Ads 101 Finding Advertising Keywords on Web Pages Scott Wen-tau Yih Joshua Goodman Microsoft Research Vitor R. Carvalho Carnegie Mellon University Contextual Ads 101 Publisher s website Digital Camera Review The

More information

Studying the Impact of Text Summarization on Contextual Advertising

Studying the Impact of Text Summarization on Contextual Advertising Studying the Impact of Text Summarization on Contextual Advertising Giuliano Armano, Alessandro Giuliani and Eloisa Vargiu Dept. of Electric and Electronic Engineering University of Cagliari Cagliari,

More information

Predicting Query Performance in Intranet Search

Predicting Query Performance in Intranet Search Predicting Query Performance in Intranet Search Craig Macdonald University of Glasgow Glasgow, G12 8QQ, U.K. craigm@dcs.gla.ac.uk Ben He University of Glasgow Glasgow, G12 8QQ, U.K. ben@dcs.gla.ac.uk Iadh

More information

Lightweight Document Matching for Help-Desk Applications

Lightweight Document Matching for Help-Desk Applications D A T A M I N I N G Lightweight Document Matching for Help-Desk Applications Sholom M. Weiss, Brian F. White, Chidanand V. Apte, and Fredrick J. Damerau T.J Watson Research Center, IBM Research Division

More information

Information Retrieval Elasticsearch

Information Retrieval Elasticsearch Information Retrieval Elasticsearch IR Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches

More information

Movie Classification Using k-means and Hierarchical Clustering

Movie Classification Using k-means and Hierarchical Clustering Movie Classification Using k-means and Hierarchical Clustering An analysis of clustering algorithms on movie scripts Dharak Shah DA-IICT, Gandhinagar Gujarat, India dharak_shah@daiict.ac.in Saheb Motiani

More information

Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects

Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects Enhancing the relativity between Content, Title and Meta Tags Based on Term Frequency in Lexical and Semantic Aspects Mohammad Farahmand, Abu Bakar MD Sultan, Masrah Azrifah Azmi Murad, Fatimah Sidi me@shahroozfarahmand.com

More information

Personal Computer Network Marketing System (DIAMS)

Personal Computer Network Marketing System (DIAMS) A Distributed Multi-Agent System for Collaborative Information Management and Sharing James R. Chen & Shawn R. Wolfe NASA Ames Research Center Mail Stop 269-2 Moffett Field, CA 94035-1000 {jchen, shawn}@ptolemy.arc.nasa.gov

More information

Bagged Ensemble Classifiers for Sentiment Classification of Movie Reviews

Bagged Ensemble Classifiers for Sentiment Classification of Movie Reviews www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 2 February, 2014 Page No. 3951-3961 Bagged Ensemble Classifiers for Sentiment Classification of Movie

More information

Expert Finding Using Social Networking

Expert Finding Using Social Networking San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 1-1-2009 Expert Finding Using Social Networking Parin Shah San Jose State University Follow this and

More information

Using Wikipedia to Translate OOV Terms on MLIR

Using Wikipedia to Translate OOV Terms on MLIR Using to Translate OOV Terms on MLIR Chen-Yu Su, Tien-Chien Lin and Shih-Hung Wu* Department of Computer Science and Information Engineering Chaoyang University of Technology Taichung County 41349, TAIWAN

More information

TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt

TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt TF-IDF David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt Administrative Homework 3 available soon Assignment 2 available soon Popular media article

More information

Examples of Functions

Examples of Functions Examples of Functions In this document is provided examples of a variety of functions. The purpose is to convince the beginning student that functions are something quite different than polynomial equations.

More information

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India sumit_13@yahoo.com 2 School of Computer

More information

Exam in course TDT4215 Web Intelligence - Solutions and guidelines -

Exam in course TDT4215 Web Intelligence - Solutions and guidelines - English Student no:... Page 1 of 12 Contact during the exam: Geir Solskinnsbakk Phone: 94218 Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Friday May 21, 2010 Time: 0900-1300 Allowed

More information

A Proposed Algorithm for Spam Filtering Emails by Hash Table Approach

A Proposed Algorithm for Spam Filtering Emails by Hash Table Approach International Research Journal of Applied and Basic Sciences 2013 Available online at www.irjabs.com ISSN 2251-838X / Vol, 4 (9): 2436-2441 Science Explorer Publications A Proposed Algorithm for Spam Filtering

More information

Remote support for lab activities in educational institutions

Remote support for lab activities in educational institutions Remote support for lab activities in educational institutions Marco Mari 1, Agostino Poggi 1, Michele Tomaiuolo 1 1 Università di Parma, Dipartimento di Ingegneria dell'informazione 43100 Parma Italy {poggi,mari,tomamic}@ce.unipr.it,

More information

An Experimental Study of the Performance of Histogram Equalization for Image Enhancement

An Experimental Study of the Performance of Histogram Equalization for Image Enhancement International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-4, Special Issue-2, April 216 E-ISSN: 2347-2693 An Experimental Study of the Performance of Histogram Equalization

More information

Clinical Decision Support with the SPUD Language Model

Clinical Decision Support with the SPUD Language Model Clinical Decision Support with the SPUD Language Model Ronan Cummins The Computer Laboratory, University of Cambridge, UK ronan.cummins@cl.cam.ac.uk Abstract. In this paper we present the systems and techniques

More information

Information Retrieval Models

Information Retrieval Models Information Retrieval Models Djoerd Hiemstra University of Twente 1 Introduction author version Many applications that handle information on the internet would be completely inadequate without the support

More information

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015 W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction

More information

WE DEFINE spam as an e-mail message that is unwanted basically

WE DEFINE spam as an e-mail message that is unwanted basically 1048 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 5, SEPTEMBER 1999 Support Vector Machines for Spam Categorization Harris Drucker, Senior Member, IEEE, Donghui Wu, Student Member, IEEE, and Vladimir

More information

Considering Learning Styles in Learning Management Systems: Investigating the Behavior of Students in an Online Course*

Considering Learning Styles in Learning Management Systems: Investigating the Behavior of Students in an Online Course* Considering Learning Styles in Learning Management Systems: Investigating the Behavior of Students in an Online Course* Sabine Graf Vienna University of Technology Women's Postgraduate College for Internet

More information

DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENCY ANALYSIS

DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENCY ANALYSIS DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENCY ANALYSIS Rakhi Chakraborty Department of Computer Science & Engineering, Global Institute Of Management and Technology, Nadia,

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service THE SELECTION OF RETURNS FOR AUDIT BY THE IRS John P. Hiniker, Internal Revenue Service BACKGROUND The Internal Revenue Service, hereafter referred to as the IRS, is responsible for administering the Internal

More information

New Metrics for Reputation Management in P2P Networks

New Metrics for Reputation Management in P2P Networks New for Reputation in P2P Networks D. Donato, M. Paniccia 2, M. Selis 2, C. Castillo, G. Cortesi 3, S. Leonardi 2. Yahoo!Research Barcelona Catalunya, Spain 2. Università di Roma La Sapienza Rome, Italy

More information

SIGIR 2004 Workshop: RIA and "Where can IR go from here?"

SIGIR 2004 Workshop: RIA and Where can IR go from here? SIGIR 2004 Workshop: RIA and "Where can IR go from here?" Donna Harman National Institute of Standards and Technology Gaithersburg, Maryland, 20899 donna.harman@nist.gov Chris Buckley Sabir Research, Inc.

More information

1 Solving LPs: The Simplex Algorithm of George Dantzig

1 Solving LPs: The Simplex Algorithm of George Dantzig Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.

More information

8 Evaluating Search Engines

8 Evaluating Search Engines 8 Evaluating Search Engines Evaluation, Mr. Spock Captain Kirk, Star Trek: e Motion Picture 8.1 Why Evaluate? Evaluation is the key to making progress in building better search engines. It is also essential

More information

Discovering suffixes: A Case Study for Marathi Language

Discovering suffixes: A Case Study for Marathi Language Discovering suffixes: A Case Study for Marathi Language Mudassar M. Majgaonker Comviva Technologies Limited Gurgaon, India Abstract Suffix stripping is a pre-processing step required in a number of natural

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction

More information

Fusion of Information Retrieval Engines (FIRE)

Fusion of Information Retrieval Engines (FIRE) Fusion of Information Retrieval Engines (FIRE) S.Alaoui Mounir, N. Goharian, M. Mahoney, A. Salem, O. Frieder Computer Science Department Florida Institute of Technology Melbourne, FL 32901 Abstract We

More information

Theme-based Retrieval of Web News

Theme-based Retrieval of Web News Theme-based Retrieval of Web Nuno Maria, Mário J. Silva DI/FCUL Faculdade de Ciências Universidade de Lisboa Campo Grande, Lisboa Portugal {nmsm, mjs}@di.fc.ul.pt ABSTRACT We introduce an information system

More information

In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.

In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data. MATHEMATICS: THE LEVEL DESCRIPTIONS In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data. Attainment target

More information

UMass at TREC 2008 Blog Distillation Task

UMass at TREC 2008 Blog Distillation Task UMass at TREC 2008 Blog Distillation Task Jangwon Seo and W. Bruce Croft Center for Intelligent Information Retrieval University of Massachusetts, Amherst Abstract This paper presents the work done for

More information

Efficient Recruitment and User-System Performance

Efficient Recruitment and User-System Performance Monitoring User-System Performance in Interactive Retrieval Tasks Liudmila Boldareva Arjen P. de Vries Djoerd Hiemstra University of Twente CWI Dept. of Computer Science INS1 PO Box 217 PO Box 94079 7500

More information

DEPARTMENT OF COMPUTER SCIENCE CORNELL UNIVERSITY

DEPARTMENT OF COMPUTER SCIENCE CORNELL UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE CORNELL UNIVERSITY INFORMATION STORAGE AND RETRIEVAL Scientific Report No. ISR-11 to The National Science Foundation Ithaca, New York June 1966 Gerard Salton Project Director

More information

Disambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets

Disambiguating Implicit Temporal Queries by Clustering Top Relevant Dates in Web Snippets Disambiguating Implicit Temporal Queries by Clustering Top Ricardo Campos 1, 4, 6, Alípio Jorge 3, 4, Gaël Dias 2, 6, Célia Nunes 5, 6 1 Tomar Polytechnic Institute, Tomar, Portugal 2 HULTEC/GREYC, University

More information

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on

More information