Effective Information Retrieval System

Vidya Maurya 1, Preeti Pandey 2, L.S. Maurya 3
1 Student, 2 Assistant Professor, 3 Associate Professor, CS/IT Deptt. & SRMSWCET Bareilly, India

Abstract-- This paper provides some perspective on the effectiveness of information retrieval, a field that had its beginnings long before the creation of the Internet, and offers some enlightened predictions on possible future directions of the field. The field of Information Retrieval (IR) was born in the 1950s out of the need to find useful information in large stored collections, and over the last forty years it has matured considerably. This paper outlines information seeking and searching in effective information retrieval systems, along with other aspects of information conflicting, and shows how communication and information extraction in general relate to information seeking and information searching in IR systems. It is also suggested that, within both information seeking research and information searching research, alternative approaches to information elicitation address similar issues in related ways, and that the systems are complementary rather than conflicting. Finally, an alternative, problem-solving model is presented, which, it is suggested, provides a basis for relating design issues to appropriate research strategies. The paper traces how we have learned to handle text, up to the early adoption of computers to search for items that are relevant to a user's query. The advances achieved by information retrieval researchers from the 1950s to the present day are then detailed, focusing on the process of locating relevant information. The paper closes with speculation on where the future of information retrieval lies.

Keywords-- Enlightened predictions, Information conflicting, information elicitation, process of locating, speculation.

I. INTRODUCTION

For thousands of years people have realized the importance of archiving and finding information.
With the advent of computers, it became possible to store large amounts of information, and finding useful information in such collections became a necessity. The field of Information Retrieval (IR) was born in the 1950s out of this necessity [7]. Information retrieval is both an art and a science, and a system is an organized relationship among functional units or components. Information retrieval, then, is the science of locating, within a large document collection, those documents that fulfill a specified information need. In other words, IR is finding material of an unstructured nature that satisfies an information need from within large collections.

So what exactly is information retrieval? An information retrieval process begins when a user enters a query into the system; queries are formal statements of information needs. In information retrieval, a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevance.

One of the most influential methods was described by H.P. Luhn in 1957: put simply, he proposed using words as indexing units for documents and measuring word overlap as a criterion for retrieval. In the context of IR, information in the technical sense of Shannon's theory of communication is not readily measured (Shannon and Weaver). In Lancaster's words, 'An information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request.' Much of information retrieval research is concerned with proposing and testing methods for deciding which documents relate to a request. To perform such tests, it is necessary to make assumptions about the behaviour of users and the properties of text. Additionally, the methodology and tools for information gathering require training and experience that the analyst is expected to have.
This means that information gathering is neither easy nor routine; much preparation, experience and training are required. Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Automated information retrieval systems are used to reduce what has been called information overload. It is only in the last decade and a half of the IEEE's 100 years that web search engines have become pervasive and search has become integrated into the fabric of desktop and mobile operating systems. The development of such systems also reflects a rapid progression away from manual, library-based approaches to acquiring, indexing and searching information, toward increasingly automated methods.

Algorithms for stemming have been studied in computer science since the 1960s. The first published stemmer was written by Julie Beth Lovins in 1968. Porter's original stemming algorithm paper was written in 1979 in the Computer Laboratory, Cambridge (England), as part of a larger IR project. There are two basic assumptions of information retrieval: a collection (a fixed set of documents) and a goal (retrieving the documents that are relevant to the user's information need).

IR systems are applied in many settings. For example, many universities and public libraries use IR systems to provide access to books, journals, research papers and other documents. Other examples are the World Wide Web (WWW), which is searched through web search engines, and the World Wide Web Worm (WWWW) [9], which consists of a resource locator and a search engine. Most IR systems compute a numeric score for how well each object in the database matches the query and rank the objects according to this value. The top-ranking objects are then shown to the user, and the process may be iterated if the user wishes to refine the query.

Language and text and their impact on IR are considered first, followed by an examination of the interaction of users, their environment and relevance. A conclusion follows.

II. MOTIVATION

Information retrieval (IR) deals with the representation, storage and organization of, and access to, information items. The representation and organization of the information items should provide the user with easy access to the information in which he is interested. Unfortunately, characterizing the user's information need is not a simple problem. Consider, for instance, the following hypothetical user information need in the context of the World Wide Web (or just the Web). Clearly, this full description of the user's information need cannot be used directly to request information using the current interfaces of Web search engines. Instead, the user must first translate this information need into a query that can be processed by the search engine (or IR system). In its most common form, this translation yields a set of keywords (or index terms) that summarizes the description of the user's information need.

III. STOPWORDS

In computing, stopwords are words that are filtered out before or after the processing of natural language data (text). There is no single definitive list of stopwords that all tools use [5], and some tools specifically avoid removing them in order to support phrase search. Hans Peter Luhn, one of the pioneers in IR, is credited with coining the term and using the concept in his design. Most search engines do not consider extremely common words, in order to save disk space or to speed up search results; these filtered words are known as stopwords. Eliminating stopwords reduces index size and processing time. Stopwords are useless for retrieval: they occur in 80% of documents.
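The following sketch ties together three ideas described so far: stopword removal, Luhn's word overlap as a retrieval criterion, and ranking objects by a numeric score. It is a minimal illustration under stated assumptions, not the authors' method: the stoplist `STOPWORDS` is a tiny hypothetical sample, tokenization is a simple regular expression, and the score is just the number of distinct query words found in a document.

```python
import re

# Hypothetical, deliberately tiny stoplist; as noted above, there is no
# single definitive list of stopwords.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_terms(text: str) -> set[str]:
    """Distinct words of the text, with stopwords removed."""
    return {tok for tok in tokenize(text) if tok not in STOPWORDS}


def overlap_score(query: str, document: str) -> int:
    """Luhn-style word overlap: how many query words also occur in the document."""
    return len(index_terms(query) & index_terms(document))


def rank(query: str, documents: dict[str, str]) -> list[tuple[str, int]]:
    """Score every document and return (doc_id, score) pairs, best first.

    Documents with no overlap are dropped; ties are broken by doc_id.
    """
    scored = [(doc_id, overlap_score(query, text)) for doc_id, text in documents.items()]
    return sorted((pair for pair in scored if pair[1] > 0), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    docs = {
        "d1": "The history of information retrieval research",
        "d2": "Stemming algorithms for information retrieval",
        "d3": "A guide to the web",
    }
    print(rank("information retrieval algorithms", docs))  # [('d2', 3), ('d1', 2)]
```

Here d2 shares three query words and d1 shares two, while d3 shares none once "a", "to" and "the" are dropped, so it is not returned at all. Real systems replace the raw overlap count with weighted scores, but the rank-by-score structure is the same.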
Configuring and managing stopwords and stoplists for full-text search: to prevent a full-text index from becoming bloated, SQL Server has a mechanism that discards commonly occurring strings that do not help the search. These discarded strings are called stopwords. During index creation, the Full-Text Engine omits stopwords from the full-text index, which means that full-text queries will not search on stopwords. Stopwords are words that, if indexed, could potentially return every document in the database if used in a search statement.

IV. STEMMING

In linguistic morphology and information retrieval, stemming is the process of reducing inflected words to their stem, base or root form, generally a written word form. Many search engines treat words with the same stem as synonyms, as a kind of query broadening, a process called conflation. Stemming is also used to determine domain vocabularies in domain analysis. Stemming can be evaluated by counting the number of two kinds of error that occur during stemming: under-stemming and over-stemming.

Under-stemming: words that should be grouped together by stemming are not. This causes a single concept to be spread over several different stems, which will tend to decrease recall in an IR search.

Over-stemming: words that should not be grouped together by stemming are. This causes the meaning of the stems to be diluted, which will reduce the precision of IR.

Table 1. Improve Retrieval Performance (columns: Techniques; No. of initial words; No. of removed words; No. of stopwords; No. of common words; No. of noncommon words. Rows: Domain based; Corpus based; TOTAL.)

Some Stemming Algorithms

1. Porter Stemming Algorithm

The Porter Stemmer is a conflation stemmer developed by Martin Porter at the University of Cambridge. The Porter stemming algorithm (or 'Porter stemmer') is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalization process that is usually done when setting up information retrieval systems. The stemmer is based on the idea that the suffixes in the English language (approximately 1200) are mostly made up of combinations of smaller and simpler suffixes. It is a linear step stemmer.

Specifically, it has five steps, applying rules within each step. Within each step, if a suffix rule matches a word, the conditions attached to that rule are tested on what would be the resulting stem if that suffix were removed in the way defined by the rule. For example, such a condition may require that the number of vowel sequences followed by a consonant sequence in the stem (the stem's measure) be greater than one for the rule to be applied. The Porter Stemmer is very widely used and available, and appears in many applications. Implementations of this stemmer are available at a website maintained by Porter himself, in Java, C and Perl; the website also includes a copy of the paper defining the algorithm. Other implementations of the algorithm are available on the Web. Porter's algorithm is probably the stemmer most widely used in IR research. Once a rule passes its conditions and is accepted, the rule fires, the suffix is removed, and control moves to the next step. If the rule is not accepted, the next rule in the step is tested, until either a rule from that step fires and control passes to the next step, or there are no more rules in that step, whence control moves to the next step. This process continues for all steps, and the resultant stem is returned by the stemmer after control has passed through all of them (see Figure 1).

Figure 1. Stemming process (pre-processing step: relevant documents/sentences -> remove stopwords -> stem words -> novelty detection -> novel documents/sentences)

2. Lovins Stemming Algorithm

The Lovins Stemmer is a single-pass, context-sensitive, longest-match stemmer developed by Julie Beth Lovins of the Massachusetts Institute of Technology in 1968. This early stemmer was targeted at both the IR and the computational linguistics areas of stemming. Because it is a single-pass algorithm, the Lovins Stemmer removes at most one suffix from a word.
It uses a list of about 250 different suffixes and removes the longest suffix attached to the word, ensuring that the stem left after the suffix has been removed is always at least 3 characters long. The ending of the stem may then be reformed (e.g., by un-doubling a final consonant if applicable) by referring to a list of recoding transformations (J.B. Lovins, 1968: "Development of a stemming algorithm," Mechanical Translation and Computational Linguistics). This stemmer, though innovative for its time, has the problematic task of trying to please two masters (IR and linguistics) and cannot excel at either. It does not excel linguistically, as it is not complex enough to stem the many suffixes that are missing from its rule list. This is interesting, as Lovins's rule list was derived by processing and studying a word sample; perhaps if this process were repeated with a much larger sample, a more satisfactory rule list could be derived. There are also known problems with the reformation of words. This process uses the recoding rules to reform the stems into words, to ensure that they match the stems of other words with similar meanings. The main problem with this process is that it has been found to be highly unreliable, frequently failing to form words from the stems or to match the stems of words with like meanings. Nor does the stemmer excel from the IR viewpoint, as its large rule set and its recoding stage slow its execution.

3. Dawson's Stemming Algorithm

The Dawson Stemmer was developed by J.L. Dawson of the Literary and Linguistic Computing Centre at Cambridge University. It is a complex, linguistically targeted stemmer that is strongly based on the Lovins Stemmer, extending the suffix rule list to approximately 1200 suffixes.
It keeps the longest-match and single-pass nature of Lovins, and replaces the recoding rules, which were found to be unreliable, with an extension of the partial-matching procedure also defined in the Lovins paper (J.L. Dawson, 1974: "Suffix removal for word conflation," Bulletin of the Association for Literary & Linguistic Computing). The main objective of the stemming process is to remove all possible affixes and thus reduce the word to its stem (Dawson 1974). Many contemporary search engines use stemming to associate words carrying prefixes and suffixes with their word stem. This broadens the search, helping to ensure that the greatest number of relevant matches is included in the search results. Stemming also has applications in machine translation, document summarization (Orasan, Pekar & Hasler 2004; Dalianis 2000) and text classification (Gaustad & Bouma 2002).
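Several mechanisms described in this section can be made concrete with a short sketch. Under stated assumptions, it shows (a) Porter's measure m, following his definition of a stem as [C](VC)^m[V], where each vowel run that is followed by a consonant counts once; (b) single-pass, longest-match suffix removal in the spirit of Lovins, with the minimum stem length of 3 described above; and (c) counting under- and over-stemming errors over word pairs. The suffix list `SUFFIXES`, the single un-doubling recoding rule and the concept labels are hypothetical and far smaller than the real rule sets; this is not an implementation of the Porter, Lovins or Dawson stemmers.

```python
from collections.abc import Callable
from itertools import combinations


# (a) Porter's measure m ---------------------------------------------------
def is_consonant(word: str, i: int) -> bool:
    """Porter's rule: a consonant is any letter other than a, e, i, o, u,
    and other than a 'y' that is preceded by a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Return m for a stem of the form [C](VC)^m[V]."""
    m = 0
    prev_was_vowel = False
    for i in range(len(stem)):
        consonant = is_consonant(stem, i)
        if consonant and prev_was_vowel:
            m += 1  # a vowel run has just been followed by a consonant
        prev_was_vowel = not consonant
    return m


# (b) Lovins-style single-pass, longest-match suffix removal --------------
# Hypothetical sample; Lovins's real list has about 250 suffixes.
SUFFIXES = sorted(["ations", "ation", "ness", "ings", "ing", "ed", "es", "ly", "s"],
                  key=len, reverse=True)
# One illustrative recoding rule: un-double these final consonants.
UNDOUBLE = set("bdglmnprst")


def strip_longest_suffix(word: str, min_stem: int = 3) -> str:
    """Remove at most one suffix: the longest that leaves at least min_stem letters."""
    for suffix in SUFFIXES:  # longest suffixes are tried first
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            stem = word[: -len(suffix)]
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] in UNDOUBLE:
                stem = stem[:-1]  # e.g. "runn" -> "run"
            return stem
    return word


# (c) Counting under- and over-stemming errors -----------------------------
def stemming_errors(concepts: dict[str, str], stem: Callable[[str], str]) -> tuple[int, int]:
    """Check every word pair; `concepts` maps each word to a concept label."""
    under = over = 0
    for w1, w2 in combinations(sorted(concepts), 2):
        same_concept = concepts[w1] == concepts[w2]
        same_stem = stem(w1) == stem(w2)
        if same_concept and not same_stem:
            under += 1  # should have been conflated, but was not
        elif same_stem and not same_concept:
            over += 1  # conflated although the concepts differ
    return under, over


if __name__ == "__main__":
    print(measure("tree"), measure("trouble"), measure("private"))  # 0 1 2
    print(strip_longest_suffix("running"), strip_longest_suffix("nations"))  # run nation
    labels = {"connect": "connect", "connected": "connect", "connection": "connect",
              "new": "new", "news": "news"}
    print(stemming_errors(labels, strip_longest_suffix))  # (2, 1)
```

"nations" shows the minimum-stem check: removing "ations" would leave only "n", so the shorter suffix "s" is removed instead. In the error count, "connection" is under-stemmed (it keeps its full form while "connected" reduces to "connect"), and "news" is over-stemmed into "new", giving (2, 1).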

A thesaurus assembles groups of related specific terms under more general, higher-level class indicators. Methods for generating complex index terms or term phrases automatically may be categorized as statistical, probabilistic or linguistic.

Figure 2. Suffix removal (conflation methods: manual or automatic; automatic methods comprise affix removal, by longest match or simple removal, successor variety, table lookup and n-gram)

V. INDEXING WEB DOCUMENTS

IR systems include two types of terms: objective and nonobjective. Objective terms are extrinsic to semantic content, and there is generally no disagreement about how to assign them; examples include author name, document URL and date of publication. Nonobjective terms, on the other hand, are intended to reflect the information manifested in the document, and there is no agreement about the choice or degree of applicability of these terms. Thus, they are also known as content terms. Indexing in general is concerned with assigning nonobjective terms to documents. The assignment may optionally include a weight indicating the extent to which the term represents or reflects the information content.

The effectiveness of an indexing system is controlled by two main parameters. Indexing exhaustivity reflects the degree to which all the subject matter manifested in a document is actually recognized by the indexing system. When the indexing system is exhaustive, it generates a large number of terms to reflect all aspects of the subject matter present in the document; when it is nonexhaustive, it generates fewer terms, corresponding to the major subjects in the document. Term specificity refers to the breadth of the terms used for indexing [2]. Broad terms retrieve many useful documents along with a significant number of irrelevant ones; narrow terms retrieve fewer documents and may miss some relevant items. The effect of indexing exhaustivity and term specificity on retrieval effectiveness can be explained by two parameters that have long been used in IR: recall and precision.
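One common way to store the term assignments and weights described above is an inverted index, which maps each term to the documents it has been assigned to. The sketch below is illustrative only: it assumes lower-cased whitespace tokenization with no stopword removal or stemming, and it uses tf-idf weighting (term frequency multiplied by the natural logarithm of N over document frequency), a widely used scheme that the paper itself does not specify. The function name `build_index` is hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(documents: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build an inverted index mapping each term to {doc_id: tf-idf weight}.

    Assumptions for illustration: lower-cased whitespace tokenization, no
    stopword removal or stemming, and weight = tf * ln(N / df).
    """
    # Term frequencies per document.
    term_counts = {doc_id: Counter(text.lower().split()) for doc_id, text in documents.items()}
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    n_docs = len(documents)

    index: dict[str, dict[str, float]] = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        for term, tf in counts.items():
            index[term][doc_id] = tf * math.log(n_docs / doc_freq[term])
    return dict(index)


if __name__ == "__main__":
    docs = {
        "d1": "web search engines index web pages",
        "d2": "library catalogues index books",
    }
    index = build_index(docs)
    print(index["web"])    # {'d1': 1.3862943611198906}
    print(index["index"])  # {'d1': 0.0, 'd2': 0.0}
```

A term that occurs in every document, such as "index" in the example, receives weight 0. This mirrors the earlier observation about stopwords: very common words do little to discriminate between documents.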
Multi-term or phrase indexing: single terms are less than ideal for an indexing scheme because their meanings out of context are often ambiguous. Term phrases, on the other hand, carry more specific meaning and thus have more discriminating power. Phrase generation is intended to improve precision; thesaurus-group generation is expected to improve recall.

VI. EVALUATION

Objective evaluation of search effectiveness has been a cornerstone of IR [4]. Progress in the field critically depends on experimenting with new ideas and evaluating their effects, especially given the experimental nature of the field. Since the early years, it was evident to researchers in the community that objective evaluation of search techniques would play a key role in the field. The Cranfield tests, conducted in the 1960s, established the desired set of characteristics for a retrieval system [1]. Even though there has been some debate over the years, the two desired properties that have been accepted by the research community for measuring search effectiveness are recall, the proportion of relevant documents retrieved by the system, and precision, the proportion of retrieved documents that are relevant.

VII. RETRIEVAL EFFECTIVENESS ASSESSMENT

The formal precision and recall measures used to quantify the retrieval effectiveness of IR systems are based on evaluation experiments conducted under controlled conditions [10]. This requires a test bed comprising a fixed number of documents, a standard set of queries, and judgments identifying the relevant and irrelevant documents in the test bed for each query. Realizing such experimental conditions in the Web context is extremely difficult: search engines operate on different indexes, and the indexes differ in their coverage of Web documents. We must therefore compare retrieval effectiveness in terms of qualitative statements and the number of documents retrieved.
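Under the controlled conditions described above, where relevance judgments are available for every query, the two measures defined in Section VI reduce to simple set arithmetic. This minimal sketch assumes hypothetical document identifiers and returns 0.0 when a denominator is empty, which is a convention chosen here rather than part of the definitions.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a single query.

    precision = relevant documents retrieved / documents retrieved
    recall    = relevant documents retrieved / relevant documents in the test bed
    Empty denominators yield 0.0 (a convention assumed here).
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical judgments: d1, d3 and d7 are relevant; the system returned four documents.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d3", "d7"})
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67). On the Web, the denominator of recall is exactly what cannot be known, which is why the assessment above falls back on qualitative comparisons.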
We evaluated various search tools and services using two queries: "latex software" and "multi agent system architecture". The first query was intended to find both public-domain sources and commercial vendors for obtaining LaTeX software, whereas the second was intended to locate relevant research publications on multi-agent system architecture. Table 1 presents results for the first query. The second column indicates the number of documents retrieved by interpreting the query as a disjunction of the query terms.

VIII. IMPROVING RETRIEVAL EFFECTIVENESS

The design and development of current-generation Web search tools have focused on query-processing speed and database size.

This is largely a response to the lack of features in the original HyperText Markup Language for representing document content to search tools [5, 6] (not surprising, given HTML's original purpose: to render documents on a wide array of output devices without concern for the computer to which the device was connected). HTML version 3 introduced the META tag, which allows authors to specify indexing information. We expect this trend to continue, establishing standardized tags for Web document content. Meanwhile, the query representation itself can be improved. The first technique, modification of term weights, involves adjusting the query term weights by adding document vectors in the positive feedback set to the query vector. Optionally, negative feedback can be used to subtract document vectors in the negative feedback set from the query vector. The reformulated query should retrieve additional relevant documents similar to the documents in the positive feedback set. This process can be carried out iteratively until the user is satisfied with the quality and number of relevant documents in the query output.

IX. TECHNIQUES FOR IMPROVING IR EFFECTIVENESS

Interaction with the user (relevance feedback):
- Keywords cover only part of the contents.
- The user can help by indicating relevant and irrelevant documents.

Relevance feedback is used to improve the query expression:

$$Q_{\mathrm{new}} = \alpha\, Q_{\mathrm{old}} + \beta\, \mathrm{Rel}_d - \gamma\, \mathrm{NRel}_d$$

where $\mathrm{Rel}_d$ is the centroid of the relevant documents, $\mathrm{NRel}_d$ is the centroid of the non-relevant documents, and $\alpha$, $\beta$ and $\gamma$ are weights controlling the relative influence of the original query, the relevant documents and the non-relevant documents.
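The relevance feedback update, Q_new = alpha * Q_old + beta * Rel_d - gamma * NRel_d (the Rocchio form of the update), can be sketched directly on term-weight vectors. This is an illustration under assumptions: queries and documents are plain lists of weights over a shared, hypothetical vocabulary, and the default weights alpha = 1.0, beta = 0.75 and gamma = 0.15 are values commonly quoted in textbooks, not values given in this paper.

```python
def centroid(vectors: list[list[float]], dim: int) -> list[float]:
    """Component-wise mean of the vectors; the zero vector when none are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(q_old: list[float],
            relevant: list[list[float]],
            non_relevant: list[list[float]],
            alpha: float = 1.0,
            beta: float = 0.75,
            gamma: float = 0.15) -> list[float]:
    """Q_new = alpha * Q_old + beta * Rel_d - gamma * NRel_d, term by term."""
    rel_d = centroid(relevant, len(q_old))       # centroid of relevant documents
    nrel_d = centroid(non_relevant, len(q_old))  # centroid of non-relevant documents
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(q_old, rel_d, nrel_d)]


if __name__ == "__main__":
    # Hypothetical 3-term vocabulary: ["retrieval", "stemming", "sports"].
    q_new = rocchio([1.0, 0.0, 0.0],
                    relevant=[[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
                    non_relevant=[[0.0, 0.0, 1.0]])
    print(q_new)  # [1.75, 0.375, -0.15]
```

The reformulated query gains weight on "stemming", which appears in a relevant document, and receives a negative weight on "sports", which appears only in the non-relevant one. Many implementations clip such negative weights to zero; this sketch deliberately keeps the formula exactly as written.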
X. INFORMATION VERSUS DATA RETRIEVAL

Data retrieval, in the context of an IR system, consists mainly of determining which documents of a collection contain the keywords in the user query, which, most frequently, is not enough to satisfy the user's information need. In fact, the user of an IR system is more concerned with retrieving information about a subject than with retrieving data that satisfies a given query [3]. A data retrieval language aims at retrieving all objects that satisfy clearly defined conditions, such as those in a regular expression or a relational algebra expression. Thus, for a data retrieval system, a single erroneous object among a thousand retrieved objects means total failure. For an information retrieval system, however, the retrieved objects might be inaccurate, and small errors are likely to go unnoticed. The main reason for this difference is that information retrieval usually deals with natural language text, which is not always well structured and could be semantically ambiguous. A data retrieval system (such as a relational database), on the other hand, deals with data that has a well-defined structure and semantics.

One may want to criticise this dichotomy on the grounds that the boundary between the two is a vague one. And so it is, but it is a useful one in that it illustrates the range of complexity associated with each mode of retrieval. Let us now take each item in Table 1.1 in turn and look at it more closely. In data retrieval we are normally looking for an exact match, that is, we are checking whether an item is or is not present in the file. In information retrieval this may sometimes be of interest, but more generally we want to find those items that partially match the request and then select from those a few of the best-matching ones.
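The contrast between exact matching in data retrieval and partial or best matching in information retrieval can be shown with a toy example. Both functions (`exact_match`, `best_match`), the fraction-of-query-terms score and the cutoff `k` are hypothetical choices for illustration; documents are represented simply as sets of terms, and the query reuses the terms of the paper's second evaluation query plus one extra word, "design".

```python
def exact_match(query: set[str], documents: dict[str, set[str]]) -> list[str]:
    """Data-retrieval style: return only documents that contain every query term."""
    return sorted(doc_id for doc_id, terms in documents.items() if query <= terms)


def best_match(query: set[str], documents: dict[str, set[str]], k: int = 3) -> list[tuple[str, float]]:
    """IR style: rank documents by the fraction of query terms they contain."""
    if not query:
        return []
    scored = [(doc_id, len(query & terms) / len(query)) for doc_id, terms in documents.items()]
    # Keep partial matches only, best first; ties broken by doc_id.
    ranked = sorted((pair for pair in scored if pair[1] > 0), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


if __name__ == "__main__":
    docs = {
        "d1": {"multi", "agent", "system", "architecture"},
        "d2": {"agent", "architecture"},
        "d3": {"latex", "software"},
    }
    query = {"multi", "agent", "system", "architecture", "design"}
    print(exact_match(query, docs))  # []
    print(best_match(query, docs))   # [('d1', 0.8), ('d2', 0.4)]
```

Because one query term ("design") appears in no document, the exact-match search returns nothing at all, whereas the best-match search still ranks d1 and d2 by how much of the request they satisfy. This mirrors the "Error response" row of Table 1.1: data retrieval is sensitive to such mismatches, information retrieval is not.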

The inference used in data retrieval is of the simple deductive kind, that is, if aRb and bRc, then aRc. In information retrieval it is far more common to use inductive inference; relations are only specified with a degree of certainty or uncertainty, and hence our confidence in the inference is variable. This distinction leads one to describe data retrieval as deterministic but information retrieval as probabilistic [12]. Frequently Bayes' Theorem is invoked to carry out inferences in IR, but in DR probabilities do not enter into the processing.

Table 1.1 DIFFERENCE BETWEEN DATA RETRIEVAL AND INFORMATION RETRIEVAL

S.No | Characteristics     | Data Retrieval (DR) | Information Retrieval (IR)
1    | Matching            | Exact match         | Partial match, best match
2    | Inference           | Deduction           | Induction
3    | Model               | Deterministic       | Probabilistic
4    | Classification      | Monothetic          | Polythetic
5    | Query language      | Artificial          | Natural
6    | Query specification | Complete            | Incomplete
7    | Items wanted        | Matching            | Relevant
8    | Error response      | Sensitive           | Insensitive

XI. OTHER TECHNIQUES AND APPLICATIONS

Many other techniques have been developed over the years, with varying success. The cluster hypothesis states that documents that cluster together (are very similar to each other) will have a similar relevance profile for a given query. Document clustering techniques were (and still are) an active area of research. Even though the usefulness of document clustering for improving search effectiveness (or efficiency) has been very limited, document clustering has enabled several developments in IR, e.g., for browsing and search interfaces. Natural Language Processing (NLP) has also been proposed as a tool to enhance retrieval effectiveness [8], but has had very limited success. Even though document ranking is a critical application of IR [2], it is definitely not the only one.
The field has developed techniques to attack many different problems, such as information filtering, topic detection and tracking (TDT), speech retrieval, cross-language retrieval, question answering, and many more.

XII. CONCLUSION

The field of information retrieval has come a long way in the last sixty-eight years and has enabled easier and faster information discovery. In the early years, many doubts were raised regarding the simple statistical techniques used in the field. However, for the task of finding information, these statistical techniques have indeed proven to be the most effective ones so far. Techniques developed in the field have been used in many other areas and have yielded many new technologies that people use every day, e.g., web search engines, junk-mail filters and news clipping services. Going forward, the field is attacking many critical problems that users face in today's information-ridden world. With exponential growth in the amount of information available, information retrieval will play an increasingly important role in the future. Significant quality improvements nevertheless remain a tedious and difficult task that requires more research and close cooperation.

Acknowledgment

The authors are grateful to the anonymous referees whose suggestions have significantly improved the clarity and content of this article. We would like to express our sincere gratitude to our Chairman Shri Dev Murti, Principal T. D. Bhist, and Miss Manvi Mishra (Head of Deptt.), without whom this manuscript would not have been possible; their cooperation made it successful. We are also thankful to Assistant Prof. Preeti Pandey, who helped throughout this research paper. Last but not least, we are heartily thankful to our institution, which gave us this opportunity.
REFERENCES

[1] LANCASTER, F.W., Information Retrieval Systems: Characteristics and Evaluation, Wiley, New York (1968).
[2] BAR-HILLEL, Y., Language and Information: Selected Essays on their Theory and Application, Addison-Wesley, Reading, Massachusetts (1964).
[3] BARBER, A.S., BARRACLOUGH, E.D. and GRAY, W.A., 'Online information retrieval', Information Storage and Retrieval, 9 (1973).
[4] CLEVERDON, C.W., 'Evaluation of information retrieval systems', Journal of Documentation, 26, 55-67 (1970).
[5] Stopwords algorithm: webconfs.com/stopwords.php.
[6] Dawson's algorithm:
[7] VAN RIJSBERGEN, C.J., Information Retrieval, Butterworths, London.
[8] STRZALKOWSKI, T., GUTHRIE, L., KARLGREN, J., LEISTENSNIDER, J., LIN, F., PEREZ-CARBALLO, J., STRASZHEIM, T., WANG, J. and WILDING, J., 'Natural language information retrieval: TREC-5 report', in Proceedings of the Fifth Text Retrieval Conference (TREC-5).
[9] GUDIVADA, V.N. (Dow Jones Markets) and RAGHAVAN, V.V. (University of Southwestern Louisiana), 'Information retrieval on the World Wide Web', IEEE Internet Computing (1997).
[10] GROSKY, W.I. (Wayne State University) and KASANAGOTTU, R. (University of Missouri), Effective search and retrieval are enabling technologies for realizing the full potential of the Web.
[11] For figures, images, diagrams and tables:
[12] VAN RIJSBERGEN, C.J., Information Retrieval (1979).


More information

Web Content Mining. Search Engine Mining Improves on the content search of other tools like search engines.

Web Content Mining. Search Engine Mining Improves on the content search of other tools like search engines. Web Content Mining Web Content Mining Pre-processing data before web content mining: feature selection Post-processing data can reduce ambiguous searching results Web Page Content Mining Mines the contents

More information

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India

More information

Information Need Assessment in Information Retrieval

Information Need Assessment in Information Retrieval Information Need Assessment in Information Retrieval Beyond Lists and Queries Frank Wissbrock Department of Computer Science Paderborn University, Germany frankw@upb.de Abstract. The goal of every information

More information

Automatic Web Page Classification

Automatic Web Page Classification Automatic Web Page Classification Yasser Ganjisaffar 84802416 yganjisa@uci.edu 1 Introduction To facilitate user browsing of Web, some websites such as Yahoo! (http://dir.yahoo.com) and Open Directory

More information

Medical Information-Retrieval Systems. Dong Peng Medical Informatics Group

Medical Information-Retrieval Systems. Dong Peng Medical Informatics Group Medical Information-Retrieval Systems Dong Peng Medical Informatics Group Outline Evolution of medical Information-Retrieval (IR). The information retrieval process. The trend of medical information retrieval

More information

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,

More information

A statistical interpretation of term specificity and its application in retrieval

A statistical interpretation of term specificity and its application in retrieval Reprinted from Journal of Documentation Volume 60 Number 5 2004 pp. 493-502 Copyright MCB University Press ISSN 0022-0418 and previously from Journal of Documentation Volume 28 Number 1 1972 pp. 11-21

More information

Incorporating Window-Based Passage-Level Evidence in Document Retrieval

Incorporating Window-Based Passage-Level Evidence in Document Retrieval Incorporating -Based Passage-Level Evidence in Document Retrieval Wensi Xi, Richard Xu-Rong, Christopher S.G. Khoo Center for Advanced Information Systems School of Applied Science Nanyang Technological

More information

Information Retrieval. Lecture 3: Evaluation methodology

Information Retrieval. Lecture 3: Evaluation methodology Information Retrieval Lecture 3: Evaluation methodology Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group sht25@cl.cam.ac.uk Today 2. General concepts

More information

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS

More information

Natural Language to Relational Query by Using Parsing Compiler

Natural Language to Relational Query by Using Parsing Compiler Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

Information Retrieval System Assigning Context to Documents by Relevance Feedback

Information Retrieval System Assigning Context to Documents by Relevance Feedback Information Retrieval System Assigning Context to Documents by Relevance Feedback Narina Thakur Department of CSE Bharati Vidyapeeth College Of Engineering New Delhi, India Deepti Mehrotra ASCS Amity University,

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation.

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. Miguel Ruiz, Anne Diekema, Páraic Sheridan MNIS-TextWise Labs Dey Centennial Plaza 401 South Salina Street Syracuse, NY 13202 Abstract:

More information

I. INTRODUCTION NOESIS ONTOLOGIES SEMANTICS AND ANNOTATION

I. INTRODUCTION NOESIS ONTOLOGIES SEMANTICS AND ANNOTATION Noesis: A Semantic Search Engine and Resource Aggregator for Atmospheric Science Sunil Movva, Rahul Ramachandran, Xiang Li, Phani Cherukuri, Sara Graves Information Technology and Systems Center University

More information

Phrases. Topics for Today. Phrases. POS Tagging. ! Text transformation. ! Text processing issues

Phrases. Topics for Today. Phrases. POS Tagging. ! Text transformation. ! Text processing issues Topics for Today! Text transformation Word occurrence statistics Tokenizing Stopping and stemming Phrases Document structure Link analysis Information extraction Internationalization Phrases! Many queries

More information

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks Text Analytics World, Boston, 2013 Lars Hard, CTO Agenda Difficult text analytics tasks Feature extraction Bio-inspired

More information

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction Chapter-1 : Introduction 1 CHAPTER - 1 Introduction This thesis presents design of a new Model of the Meta-Search Engine for getting optimized search results. The focus is on new dimension of internet

More information

HELP DESK SYSTEMS. Using CaseBased Reasoning

HELP DESK SYSTEMS. Using CaseBased Reasoning HELP DESK SYSTEMS Using CaseBased Reasoning Topics Covered Today What is Help-Desk? Components of HelpDesk Systems Types Of HelpDesk Systems Used Need for CBR in HelpDesk Systems GE Helpdesk using ReMind

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Terminology Retrieval: towards a synergy between thesaurus and free text searching

Terminology Retrieval: towards a synergy between thesaurus and free text searching Terminology Retrieval: towards a synergy between thesaurus and free text searching Anselmo Peñas, Felisa Verdejo and Julio Gonzalo Dpto. Lenguajes y Sistemas Informáticos, UNED {anselmo,felisa,julio}@lsi.uned.es

More information

Flattening Enterprise Knowledge

Flattening Enterprise Knowledge Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it

More information

Automated classification of A/E/C web content

Automated classification of A/E/C web content Automated classification of A/E/C web content R. Amor & K. Xu Department of Computer Science, University of Auckland, Auckland, New Zealand ABSTRACT: The amount of useful information available on the web

More information

Study and Analysis of Data Mining Concepts

Study and Analysis of Data Mining Concepts Study and Analysis of Data Mining Concepts M.Parvathi Head/Department of Computer Applications Senthamarai college of Arts and Science,Madurai,TamilNadu,India/ Dr. S.Thabasu Kannan Principal Pannai College

More information

Mavuno: A Scalable and Effective Hadoop-Based Paraphrase Acquisition System

Mavuno: A Scalable and Effective Hadoop-Based Paraphrase Acquisition System Mavuno: A Scalable and Effective Hadoop-Based Paraphrase Acquisition System Donald Metzler and Eduard Hovy Information Sciences Institute University of Southern California Overview Mavuno Paraphrases 101

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

Search Result Optimization using Annotators

Search Result Optimization using Annotators Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Binary and Ranked Retrieval

Binary and Ranked Retrieval Binary and Ranked Retrieval Binary Retrieval RSV(d i,q j ) {0,1} Does not allow the user to control the magnitude of the output. In fact, for a given query, the system may return under-dimensioned output

More information

INFORMATION RETRIEVAL

INFORMATION RETRIEVAL INFORMATION RETRIEVAL C. J. van RIJSBERGEN B.Sc., Ph.D., M.B.C.S. Department of Computing Science University of Glasgow PREFACE TO THE SECOND EDITION The major change in the second edition of this book

More information

I. The SMART Project - Status Report and Plans. G. Salton. The SMART document retrieval system has been operating on a 709^

I. The SMART Project - Status Report and Plans. G. Salton. The SMART document retrieval system has been operating on a 709^ 1-1 I. The SMART Project - Status Report and Plans G. Salton 1. Introduction The SMART document retrieval system has been operating on a 709^ computer since the end of 1964. The system takes documents

More information

Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I

Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I Data is Important because it: Helps in Corporate Aims Basis of Business Decisions Engineering Decisions Energy

More information

Optimization Based Data Mining in Business Research

Optimization Based Data Mining in Business Research Optimization Based Data Mining in Business Research Praveen Gujjar J 1, Dr. Nagaraja R 2 Lecturer, Department of ISE, PESITM, Shivamogga, Karnataka, India 1,2 ABSTRACT: Business research is a process of

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

A Web Prefetching Model Based on Content Analysis

A Web Prefetching Model Based on Content Analysis A Web Prefetching Model Based on Content Analysis O Kit Hong, Fiona Robert P. Biuk-Aghai Faculty of Science and Technology University of Macau {csb.fiona fst.robert}@umac.mo Abstract Web-accessible resources

More information

EFFICIENT K-MEANS CLUSTERING ALGORITHM USING RANKING METHOD IN DATA MINING

EFFICIENT K-MEANS CLUSTERING ALGORITHM USING RANKING METHOD IN DATA MINING EFFICIENT K-MEANS CLUSTERING ALGORITHM USING RANKING METHOD IN DATA MINING Navjot Kaur, Jaspreet Kaur Sahiwal, Navneet Kaur Lovely Professional University Phagwara- Punjab Abstract Clustering is an essential

More information

Ontology driven Pre and Post Ranking based Information Retrieval in Web Search Engines

Ontology driven Pre and Post Ranking based Information Retrieval in Web Search Engines Ontology driven Pre and Post Ranking based Information Retrieval in Web Search Engines Parul Gupta Department of Computer Science and Engineering Y.M.C.A. University of Science and Technology Faridabad

More information

Brunswick, NJ: Ometeca Institute, Inc., 2005. 617 pp. $35.00. (0-9763547-0-5).

Brunswick, NJ: Ometeca Institute, Inc., 2005. 617 pp. $35.00. (0-9763547-0-5). Information Retrieval Design. James D. Anderson and Jose Perez-Carballo. East Brunswick, NJ: Ometeca Institute, Inc., 2005. 617 pp. $35.00. (0-9763547-0-5). Information Retrieval Design is a textbook that

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

Project 2: Term Clouds (HOF) Implementation Report. Members: Nicole Sparks (project leader), Charlie Greenbacker

Project 2: Term Clouds (HOF) Implementation Report. Members: Nicole Sparks (project leader), Charlie Greenbacker CS-889 Spring 2011 Project 2: Term Clouds (HOF) Implementation Report Members: Nicole Sparks (project leader), Charlie Greenbacker Abstract: This report describes the methods used in our implementation

More information

Data Discovery on the Information Highway

Data Discovery on the Information Highway Data Discovery on the Information Highway Susan Gauch Introduction Information overload on the Web Many possible search engines Need intelligent help to select best information sources customize results

More information

Introduction. Chapter 1. 1.1 Introduction. 1.2 Background

Introduction. Chapter 1. 1.1 Introduction. 1.2 Background Chapter 1 Introduction 1.1 Introduction This report is the fifth in the series describing experiments with the British Library Research & Development Department's (BLR&DD) Okapi bibliographic information

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Strategic Online Advertising: Modeling Internet User Behavior with

Strategic Online Advertising: Modeling Internet User Behavior with 2 Strategic Online Advertising: Modeling Internet User Behavior with Patrick Johnston, Nicholas Kristoff, Heather McGinness, Phuong Vu, Nathaniel Wong, Jason Wright with William T. Scherer and Matthew

More information

SEO AND CONTENT MANAGEMENT SYSTEM

SEO AND CONTENT MANAGEMENT SYSTEM International Journal of Electronics and Computer Science Engineering 953 Available Online at www.ijecse.org ISSN- 2277-1956 SEO AND CONTENT MANAGEMENT SYSTEM Savan K. Patel 1, Jigna B.Prajapati 2, Ravi.S.Patel

More information

The University of Lisbon at CLEF 2006 Ad-Hoc Task

The University of Lisbon at CLEF 2006 Ad-Hoc Task The University of Lisbon at CLEF 2006 Ad-Hoc Task Nuno Cardoso, Mário J. Silva and Bruno Martins Faculty of Sciences, University of Lisbon {ncardoso,mjs,bmartins}@xldb.di.fc.ul.pt Abstract This paper reports

More information

Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

More information

Fourth generation techniques (4GT)

Fourth generation techniques (4GT) Fourth generation techniques (4GT) The term fourth generation techniques (4GT) encompasses a broad array of software tools that have one thing in common. Each enables the software engineer to specify some

More information

CLUSTERING FOR FORENSIC ANALYSIS

CLUSTERING FOR FORENSIC ANALYSIS IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 4, Apr 2014, 129-136 Impact Journals CLUSTERING FOR FORENSIC ANALYSIS

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Folksonomies versus Automatic Keyword Extraction: An Empirical Study

Folksonomies versus Automatic Keyword Extraction: An Empirical Study Folksonomies versus Automatic Keyword Extraction: An Empirical Study Hend S. Al-Khalifa and Hugh C. Davis Learning Technology Research Group, ECS, University of Southampton, Southampton, SO17 1BJ, UK {hsak04r/hcd}@ecs.soton.ac.uk

More information

Building well-balanced CDN 1

Building well-balanced CDN 1 Proceedings of the Federated Conference on Computer Science and Information Systems pp. 679 683 ISBN 978-83-60810-51-4 Building well-balanced CDN 1 Piotr Stapp, Piotr Zgadzaj Warsaw University of Technology

More information

Introduction to Manual Annotation

Introduction to Manual Annotation Introduction to Manual Annotation This document introduces the concept of annotations, their uses and the common types of manual annotation projects. This is a supplement to project-specific guidelines

More information

Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A

Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A Introduction to IR Systems: Supporting Boolean Text Search Chapter 27, Part A Database Management Systems, R. Ramakrishnan 1 Information Retrieval A research field traditionally separate from Databases

More information

Automatic Word Lookup Service and Client Tool for SAIKAM Online Dictionary

Automatic Word Lookup Service and Client Tool for SAIKAM Online Dictionary NII Journal No.1 (2000.12) 研 究 論 文 Automatic Word Lookup Service and Client Tool for SAIKAM Online Dictionary Vuthichai AMPORNARAMVETH National Institute of Informatics Akiko AIZAWA National Institute

More information

Fast and Easy Delivery of Data Mining Insights to Reporting Systems

Fast and Easy Delivery of Data Mining Insights to Reporting Systems Fast and Easy Delivery of Data Mining Insights to Reporting Systems Ruben Pulido, Christoph Sieb rpulido@de.ibm.com, christoph.sieb@de.ibm.com Abstract: During the last decade data mining and predictive

More information

Projektgruppe. Categorization of text documents via classification

Projektgruppe. Categorization of text documents via classification Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction

More information

Creating Synthetic Temporal Document Collections for Web Archive Benchmarking

Creating Synthetic Temporal Document Collections for Web Archive Benchmarking Creating Synthetic Temporal Document Collections for Web Archive Benchmarking Kjetil Nørvåg and Albert Overskeid Nybø Norwegian University of Science and Technology 7491 Trondheim, Norway Abstract. In

More information

DATA ANALYTICS USING R

DATA ANALYTICS USING R DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

CONCEPTCLASSIFIER FOR SHAREPOINT

CONCEPTCLASSIFIER FOR SHAREPOINT CONCEPTCLASSIFIER FOR SHAREPOINT PRODUCT OVERVIEW The only SharePoint 2007 and 2010 solution that delivers automatic conceptual metadata generation, auto-classification and powerful taxonomy tools running

More information

Spam Filtering with Naive Bayesian Classification

Spam Filtering with Naive Bayesian Classification Spam Filtering with Naive Bayesian Classification Khuong An Nguyen Queens College University of Cambridge L101: Machine Learning for Language Processing MPhil in Advanced Computer Science 09-April-2011

More information

A Quagmire of Terminology: Verification & Validation, Testing, and Evaluation*

A Quagmire of Terminology: Verification & Validation, Testing, and Evaluation* From: FLAIRS-01 Proceedings. Copyright 2001, AAAI (www.aaai.org). All rights reserved. A Quagmire of Terminology: Verification & Validation, Testing, and Evaluation* Valerie Barr Department of Computer

More information

INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal

INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal Research Article ISSN 2277 9140 ABSTRACT Web page categorization based

More information

Optimizing Search Engines using Clickthrough Data

Optimizing Search Engines using Clickthrough Data Optimizing Search Engines using Clickthrough Data Presented by - Kajal Miyan Seminar Series, 891 Michigan state University *Slides adopted from presentations of Thorsten Joachims (author) and Shui-Lung

More information

Performance evaluation of Web Information Retrieval Systems and its application to e-business

Performance evaluation of Web Information Retrieval Systems and its application to e-business Performance evaluation of Web Information Retrieval Systems and its application to e-business Fidel Cacheda, Angel Viña Departament of Information and Comunications Technologies Facultad de Informática,

More information

Discovering suffixes: A Case Study for Marathi Language

Discovering suffixes: A Case Study for Marathi Language Discovering suffixes: A Case Study for Marathi Language Mudassar M. Majgaonker Comviva Technologies Limited Gurgaon, India Abstract Suffix stripping is a pre-processing step required in a number of natural

More information

Administrator s Guide

Administrator s Guide SEO Toolkit 1.3.0 for Sitecore CMS 6.5 Administrator s Guide Rev: 2011-06-07 SEO Toolkit 1.3.0 for Sitecore CMS 6.5 Administrator s Guide How to use the Search Engine Optimization Toolkit to optimize your

More information

Modeling Concept and Context to Improve Performance in ediscovery

Modeling Concept and Context to Improve Performance in ediscovery By: H. S. Hyman, ABD, University of South Florida Warren Fridy III, MS, Fridy Enterprises Abstract One condition of ediscovery making it unique from other, more routine forms of IR is that all documents

More information

INTRUSION PREVENTION AND EXPERT SYSTEMS

INTRUSION PREVENTION AND EXPERT SYSTEMS INTRUSION PREVENTION AND EXPERT SYSTEMS By Avi Chesla avic@v-secure.com Introduction Over the past few years, the market has developed new expectations from the security industry, especially from the intrusion

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

types of information systems computer-based information systems

types of information systems computer-based information systems topics: what is information systems? what is information? knowledge representation information retrieval cis20.2 design and implementation of software applications II spring 2008 session # II.1 information

More information

A QoS-Aware Web Service Selection Based on Clustering

A QoS-Aware Web Service Selection Based on Clustering International Journal of Scientific and Research Publications, Volume 4, Issue 2, February 2014 1 A QoS-Aware Web Service Selection Based on Clustering R.Karthiban PG scholar, Computer Science and Engineering,

More information

Facilitating Business Process Discovery using Email Analysis

Facilitating Business Process Discovery using Email Analysis Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process

More information

Collecting Polish German Parallel Corpora in the Internet

Collecting Polish German Parallel Corpora in the Internet Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska

More information

Precision and Relative Recall of Search Engines: A Comparative Study of Google and Yahoo

Precision and Relative Recall of Search Engines: A Comparative Study of Google and Yahoo and Relative Recall of Engines: A Comparative Study of Google and Yahoo B.T. Sampath Kumar J.N. Prakash Kuvempu University Abstract This paper compared the retrieval effectiveness of the Google and Yahoo.

More information

Research and Implementation of Real-time Automatic Web Page Classification System

Research and Implementation of Real-time Automatic Web Page Classification System 3rd International Conference on Material, Mechanical and Manufacturing Engineering (IC3ME 2015) Research and Implementation of Real-time Automatic Web Page Classification System Weihong Han 1, a *, Weihui

More information

Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources

Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources Investigating Automated Sentiment Analysis of Feedback Tags in a Programming Course Stephen Cummins, Liz Burd, Andrew

More information

Embedded Systems Programming in a Private Cloud- A prototype for Embedded Cloud Computing

Embedded Systems Programming in a Private Cloud- A prototype for Embedded Cloud Computing International Journal of Information Science and Intelligent System, Vol. 2, No.4, 2013 Embedded Systems Programming in a Private Cloud- A prototype for Embedded Cloud Computing Achin Mishra 1 1 Department

More information

Healthcare, transportation,

Healthcare, transportation, Smart IT Argus456 Dreamstime.com From Data to Decisions: A Value Chain for Big Data H. Gilbert Miller and Peter Mork, Noblis Healthcare, transportation, finance, energy and resource conservation, environmental

More information

Automatic Text Analysis Using Drupal

Automatic Text Analysis Using Drupal Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing

More information

CFSD 21 ST CENTURY SKILL RUBRIC CRITICAL & CREATIVE THINKING

CFSD 21 ST CENTURY SKILL RUBRIC CRITICAL & CREATIVE THINKING Critical and creative thinking (higher order thinking) refer to a set of cognitive skills or strategies that increases the probability of a desired outcome. In an information- rich society, the quality

More information

Intelligent Log Analyzer. André Restivo <andre.restivo@portugalmail.pt>

Intelligent Log Analyzer. André Restivo <andre.restivo@portugalmail.pt> Intelligent Log Analyzer André Restivo 9th January 2003 Abstract Server Administrators often have to analyze server logs to find if something is wrong with their machines.

More information

A Content-Based Image Meta-Search Engine using Relevance Feedback

A Content-Based Image Meta-Search Engine using Relevance Feedback A Content-Based Image Meta-Search Engine using Relevance Feedback Ana B. Benitez, Mandis Beigi, and Shih-Fu Chang Department of Electrical Engineering & New Media Technology Center Columbia University

More information

Expert Finding Using Social Networking

Expert Finding Using Social Networking San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 1-1-2009 Expert Finding Using Social Networking Parin Shah San Jose State University Follow this and

More information

Exam in course TDT4215 Web Intelligence - Solutions and guidelines -

Exam in course TDT4215 Web Intelligence - Solutions and guidelines - English Student no:... Page 1 of 12 Contact during the exam: Geir Solskinnsbakk Phone: 94218 Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Friday May 21, 2010 Time: 0900-1300 Allowed

More information