Effective Information Retrieval System

Size: px
Start display at page:

Download "Effective Information Retrieval System"

Transcription

1 Effective Information Retrieval System Vidya Maurya 1, Preeti Pandey 2, L.S. Maurya 3 1 Student, 2 Assistant Professor, 3 Associate Professor, CS/IT Deptt. & SRMSWCET Bareilly, India Abstract-- This paper provides some perspective on the effectiveness of information retrieval that had its beginnings long before the creation of the Internet and provides some enlightened predictions on possible future directions of the field. The field of Information Retrieval (IR) was born in the 1950s out of this necessity. Over the last forty years, the field has matured considerably. This paper presents an outline of Effective Information Retrieval Systems seeking and searching, other aspects of information conflicting, showing the relationship between communication and information extraction in general with information seeking and information searching in information retrieval systems. It is also suggested that, within both information seeking research and information searching research, alternative information eliciting address similar issues in related ways and that the systems are complementary rather than conflicting. Finally, an alternative, problem-solving issues is presented, which, it is suggested, provides a basis for relating the design issues in appropriate research strategies. As we have learned how to handle text, through to the early adoption of computers to search for items that are relevant to a user s query. The advances achieved by information retrieval researchers from the 1950s through to the present day are detailed next, focusing on the process of locating relevant information. This paper closes with speculation on where the future of information retrieval lies. Keywords-- Enlightened predictions, Information conflicting, information elicitation, process of locating, speculation. I. INTRODUCTION For thousands of years people have realized the importance of archiving and finding information. With the advent of computers, it became possible to store large amounts of information; and finding useful information from such collections became a necessity. The field of Information Retrieval (IR) was born in the 1950s out of this necessity [7]. Information is an art and science, and the term system is an organized relationship among function in units or components. Now then information system is the science of locating, from a large documents collection, those documents that fulfill a specified information. Information Retrieval (IR) is finding material of an unstructured nature that satisfied a information need form within large collections. Actually, what is Information Retrieval? An information retrieval process begins when a user enters a query into the system, query are formal statements of information needs. In information retrieval a query does not uniquely identify a single object in the collection. Instead several objects may match the query, perhaps with different degrees of relevancy. 787 One of the most influential methods was described by H.P. Luhn in 1957, in which (put simply) he proposed using words as indexing units for documents and measuring word overlap as a criterion for retrieval. In the context of information retrieval (IR), information, in the technical meaning given in Shannon's theory of communication, is not readily measured (Shannon and Weaver). An information retrieval system does not informs (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and where about of documents relating to his request.' Much of information retrieval research is concerned with proposing and testing a methodology intended to perform this test. To perform such test it is necessary to make assumptions about the behaviour of users and the properties of text. Additionally methodology and tools for information gathering, require training and experience that the analyst is expected to have. This means that information gathering is neither easy nor routine. Much preparation, experience and training are required. Information Retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. An automated information retrieval system is used to reduce what has been called Information Overload. It is only in the last decade and a half of the IEEE s 100 years that web search engines have become pervasive and search has become integrated into the fabric of desktop and mobile operating systems. The development of such systems also reflects a rapid progression away from manual library-based approaches of acquiring, indexing, and searching information to increasingly automated methods. Algorithm for stemming have been studied in computer science since The first ever published stemmer was written by Julie Beth Lovins in The original stemming algorithm paper was written in 1979 in the computer laboratory, Cambridge (England) as a part of larger. There are two basic assumptions of Information Retrieval: Collection and Goal. This system applied many where such as many universities and public library use IR system to provide access to books, journals, research paper and other documents, E.g. world wide web(www):consists web search engine, and world wide web worm*(wwww) [9]:consists a resource locator and search engine. Most IR system compute a numeric score on how well each object in the database matches the query and rank the object, according to this value. The top ranking objects are then shown to the user the process may then be iterated.

2 If the user wishes to refine the query. Language and text and their impact on IR are considered first, then examination of the interaction of users, their environment, and relevance. After everything is presented, a conclusion follows. II. MOTIVATION Information retrieval (IR) deals with the representation, storage, organization of data and access to information items. The representation and organization of the information items should provide the user with easy access to the information in which he is interested. Unfortunately, characterization of the user information need is not a simple problem. Consider, for instance, the following hypothetical user information need in the context of the World Wide Web (or just the Web) Clearly, this full description of the user information need cannot be used directly to request information using the current interfaces of Web search engines. Instead, the user must first translate this information need into a query which can be processed by the search engine (or IR system). In its most common form, this translation yields a set of keywords (or index terms) which summarizes the description of the user information need. III. STOPWORDS In computing, stop words are words which are filtered out prior to, or after, processing of natural language data (text). There is not one definite list of stopwords which all tools use [5], if even used, some tools specifically avoid removing them to support phrase search. Hans Peter Luhn, one of the pioneers in IR, iscredited with coining the phrase and using the concept in his design. Most search engine do not consider extremely common words in order to save disk space or to speed up search results. These filtered words are known as stopwords. Elimination of stopwords Reduce indexing size and processing time. Useless for retrieval. Occur in 80% documents. Configure and manage stop words and stop list for full text search----- To prevent a full text-index from becoming bloated, SQL server has a mechanism that discards commonly occurring strings that do not help the search. These discarded strings that do not help the search. These discarded strings are called, stopwords. During index creation, the full text engine omits stopwords. from the full- text index. This means that fulltext queries will not search on stopwords. Stopwords are words that, if indexed could potentially return every document in the database if the word was used in a search state space. IV. STEMMING In linguistic morphology and information retrieval, stemming is the process for reducing inflected words to their stem, base or root from-generally a written word form. Many search engines treats word with the same stem as synonyms as a kind of query boardenning, a process called conflation. Stemming is used to determine domain vocabularies in domain analysis. I t possible to evaluate stemming by counting the numbers of two kinds of error that occur during stemming, namely: Under Stemming & Over Stemming. Under Stemming This refers to the words that should be grouped together by stemming, but aren t. This cause a single concepts to be spread over various different systems, which will tend to decrease the recall in an IR search. Over Stemming This refers to the words that shouldn t be grouped together by stemming, but are. This cause the meaning of the system to be diluted, which will effects precision of IR. Techniqu es No. of initia l Wor ds 1,29 1 Domain based Corpus based TOTAL 2,02 8 Table 1. Improve Retrieval Performance No.of remov ed words No. of stopwor ds No.of comm on words No. of noncomm on words Some Stemming Algorithm Porter Stemming Algorithm: 860 1, = 922 The Porter Stemmer is a conflation Stemmer developed by Martin Porter at the University of Cambridge in The Porter stemming algorithm (or Porter stemmer ) is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalization process that is usually done when setting up Information Retrieval systems. The Stemmer is based on the idea that the suffixes in the English language (approximately 1200) are mostly made up of a combination of smaller and simpler suffixes. This Stemmer is a linear step Stemmer. 788

3 Specifically it has five steps applying rules within each step. Within each step, if a suffix rule matched to a word, then the conditions attached to that rule are tested on what would be the resulting stem, if that The suffix was removed, in the way defined by the rule. For example such a condition may be, the number of vowel characters, which are followed be a consonant character in the stem (Measure), must be greater than one for the rule to be applied. The Porter Stemmer is a very widely used and available Stemmer, and is used in many applications. Implementations of this Stemmer are available at a website by Porter himself, with implementations in Java, C and PERL; the website also includes a copy of the paper defining the Algorithm. Other implementations of this algorithm are available from the Web. Porter's algorithm is probably the stemmer most widely used in IR research. Once a Rule passes its conditions and is accepted the rule fires and the suffix is removed and control moves to the next step. If the rule is not accepted then the next rule in the step is tested, until either a rule from that step fires and control passes to the next step or there are no more rules in that step whence control moves to the next step. This process continues for all steps, the resultant stem being returned by the Stemmer after control has been passed from steps, See figure1. Pre - processing step 2. Lovins Stemming Algorithm: Relevent document/sentences Remove stopwords Stem words Novelty detection Novel documents/sentences Figure 1. Stemming process The Lovins Stemmer is a single pass, contextsensitive, longest-match Stemmer developed by Julie Beth Lovins of Massachusetts Institute of Technology in This early stemmer was targeted at both the IR and Computational Linguistics areas of stemming. The Lovins Stemmer removes a maximum of one suffix from a word, due to its nature as single pass algorithm. It uses a list of about 250 different suffixes, and removes the longest suffix attached to the word, ensuring that the stem after the suffix has been removed is always at least 3 characters long. Then the ending of the stem may be reformed (e.g., by un-doubling a final consonant if applicable), by referring to a list of recoding transformations. J.B. Lovins, 1968: "Development of a stemming algorithm," Mechanical Translation and Computational Linguistics. This stemmer, though innovative for its time, has the problematic task of trying to please two masters (IR and Linguistics) and cannot excel at either. The approach does not excel with linguistics, as it is not complex enough to stem many suffixes due to their not being present in the rule list. This is interesting as Lovins rule list was derived by, processing and studying a word sample. Perhaps if this process was repeated with a much larger sample a more satisfactory rule list could be derived. There are also known to be problems regarding the reformation of words. This process uses the recoding rules to reform the stems into words to ensure they match stems of other similar meaning words. The main problem with this process is that it has been found to be highly unreliable and frequently fails to form words from the stems, or match the stems of like meaning words. The Stemmer does not excel from the IR viewpoint either, as its large rule set, and its recoding stage, affect its speed of execution. 3. Dawson s Stemming Algorithm The Dawson Stemmer was developed by J.L. Dawson of the Literary and Linguistics Computing Centre at Cambridge University. It is a complex linguistically targeted Stemmer that is strongly based upon the Lovins Stemmer, extending the suffix rule list to approximately 1200 suffixes. It keeps the longest match and single pass nature of Lovins, and replaces the recoding rules, which were found to be unreliable, using instead an extension of the partial matching procedure also defined within the Lovins Paper. J.L. Dawson, 1974: "Suffix removal for word conflation," Bulletin of the Association for Literary & Linguistic Computing The main objective of the stemming process is to remove all possible affixes and thus reduce the word to its stem (Dawson 1974). Using Stemming, many contemporary search engines associate words with prefixes and suffixes to their word stem, to make the search broader in the meaning that it can ensure that the greatest number of relevant matches is included in search results. Stemming has also applications in machine translation, document summarization (Orasan, Pekar & Hasler 2004, Dalianis 2000), and text classification (Gaustad & Bouma 2002). 789

4 Manual Conflation Method Automatic A thesaurus assembles groups of related specific terms under more general, higher level class indicators. Methods for generating complex index terms or term phrases automatically may be categorized as statistical, probabilistic or linguistics. Longest match Affix removal Successor variety Simple removal Figure 2. Suffixe removal Table lookup V. INDEXING WEB DOCUMENTS n-gram IR systems include two types of terms: objective and nonobjective. Objective terms are extrinsic to semantic content, and there is generally no disagreement about how to assign them. Examples include author name, document URL, and date of publication. Nonobjective terms, on the other hand, are intended to reflect the information manifested in the document, and there is no agreement about the choice or degree of applicability of these terms. Thus, they are also known as content terms. Indexing in general is concerned with assigning nonobjective terms to documents. The assignment may optionally include a weight indicating the extent to which the term represents or reflects the information content. The effectiveness of an indexing system is controlled by two main parameters. Indexing exhaustivity reflects the degree to which all the subject matter manifested in a document is actually recognized by the indexing system. When the indexing system is exhaustive, it generates a large number of terms to reflect all aspects of the subject matter present in the document; when it is non exhaustive, it generates fewer terms, corresponding to the major subjects in the document. Term specificity refers to the breadth of the terms used for indexing. 2 Broad terms retrieve many useful documents along with a significant number of irrelevant ones; narrow terms retrieve fewer documents and may miss some relevant items. The effect of indexing exhaustivity and term specificity on retrieval effectiveness can be explained by two parameters used for many years in IR problem. Multi-term or phrase indexing: Single terms are less than ideal for an indexing scheme because their meanings out of context are often ambiguous. Term phrases, on the other hand, carry more specific meaning and thus have more discriminating power. Phrase generation is intended to improve precision; thesaurus-group generation is expected to improve recall. VI. EVALUATION Objective evaluation of search effectiveness has been a cornerstone of IR [4]. Progress in the field critically depends upon experimenting with new ideas and evaluating the effects of these ideas, especially given the experimental nature of the field. Since the early years, it was evident to researchers in the community that objective evaluation of search techniques would play a key role in the field. The Cranfield tests, conducted in 1960s, established the desired set of characteristics for a retrieval system [1]. Even though there has been some debate over the years, the two desired properties that have been accepted by the research community for measurement of search effectiveness are recall: the proportion of relevant documents retrieved by the system; and precision: the proportion of retrieved documents that are relevant. VII. RETRIEVAL EFFECTIVENESS ASSESSMENT The formal precision and recall measures used to quantify retrieval effectiveness of IR systems are based on evaluation experiments conducted under controlled conditions [10]. This requires a test bed comprising a fixed number of documents, a standard set of queries, and relevant and irrelevant documents in the test bed for each query. Realizing such experimental conditions in the Web context is extremely difficult. Search engines operate on different indexes, and the indexes differ in their coverage of Web documents. We must therefore compare retrieval effectiveness in terms of qualitative statements and the number of documents retrieved. We evaluated various search tools and services using two queries: latex software and multi agent system architecture. The first query was intended to find both public-domain sources and commercial vendors for obtaining LaTex software, whereas the second query was intended to locate relevant research publications on multi agent system architecture. Table 1 presents results for the first query. The second column indicates the number of documents retrieved by interpreting the query as a disjunction of the query terms. VIII. IMPROVING RETRIEVAL EFFECTIVENESS The design and development of current-generation Web search tools have focused on query-processing speed and database size. 790

5 This is largely a response to the lack of features in the original Hyper Text Markup Language for representing document content to search tools5,6 (not surprising, given HTML s original purpose: to render documents on a wide array of output devices without concern for the computer to which the device was connected). HTML Version 3 introduced the META tag, which allows authors to specify indexing information. We expect this trend to continue, establishing standardized tags for Web document content. Meanwhile, as we have seen, the query representation. The first, modification of term weights, involves adjusting the query term weights by adding document vectors in the positive feedback set to the query vector. Optionally, negative feedback can be used to subtract document vectors in the negative feedback set from the query vector. The reformulated query should retrieve additional relevant documents similar to the documents in the positive feedback set. This process can be carried out iteratively until the user is satisfied with the quality and number of relevant documents in the query output. IX. TECHNIQUES FOR IMPROVING IR EFFECTIVENESS Interaction with user (relevance feedback) - Keywords only cover part of the contents - User can help by indicating relevant/irrelevant document The use of relevance feedback To improve query expression: Q new = *Q old + *Rel_d - *Nrel_d where Rel_d = centroid of relevant documents NRel_d = centroid of non-relevant documents Data retrieval, in the context of an IR system, consists mainly of determining which documents of a collection contain the keywords in the user query which, most frequently, is not enough to satisfy the user information need. In fact, the user of an IR system is concerned more with retrieving information about a subject than with retrieving data which satisfies a given query. A data retrieval language aims at retrieving all objects which satisfy clearly defined conditions such as those in a regular expression or in a relational algebra expression. Thus, for a data retrieval system, a single erroneous object among a thousand retrieved objects means total failure. For an information retrieval system, however, the retrieved objects might be inaccurate and small errors are likely to go unnoticed. The main reason for this difference is that information retrieval usually deals with natural language text which is not always well structured and could be semantically ambiguous. On the other hand, a data retrieval system (such as a relational database) deals with data that has a well defined structure and semantics. One may want to criticise this dichotomy on the grounds that the boundary between the two is a vague one. 791 And so it is, but it is a useful one in that it illustrates the range of complexity associated with each mode of retrieval. Let us now take each item in the table in turn and look at it more closely. In data retrieval we are normally looking for an exact match, that is, we are checking to see whether an item is or is not present in the file. In information retrieval this may sometimes be of interest but more generally we want to find those items which partially match the request and then select from those a few of the best matching ones. The inference used in data retrieval is of the simple deductive kind, that is, arb and brc then arc. In information retrieval it is far more common to use inductive inference; relations are only specified with a degree of certainty or uncertainty and hence our confidence in the inference is variable. This distinction leads one to describe data retrieval as deterministic but information retrieval as probabilistic. Frequently Bayes' Theorem is invoked to carry out inferences in IR, but in DR probabilities do not enter into the processing. X. INFORMATION VERSUS DATA RETRIEVAL Data retrieval, in the context of an IR system, consists mainly of determining which documents of a collection contain the keywords in the user query which, most frequently, is not enough to satisfy the user information need. In fact, the user of an IR system is concerned more with retrieving information about a subject than with retrieving data which satisfies a given query [3]. A data retrieval language aims at retrieving all objects which satisfy clearly defined conditions such as those in a regular expression or in a relational algebra expression. Thus, for a data retrieval system, a single erroneous object among a thousand retrieved objects means total failure. For an information retrieval system, however, the retrieved objects might be inaccurate and small errors are likely to go unnoticed. The main reason for this difference is that information retrieval usually deals with natural language text which is not always well structured and could be semantically ambiguous. On the other hand, a data retrieval system (such as a relational database) deals with data that has a well defined structure and semantics. One may want to criticise this dichotomy on the grounds that the boundary between the two is a vague one. And so it is, but it is a useful one in that it illustrates the range of complexity associated with each mode of retrieval. Let us now take each item in the table in turn and look at it more closely. In data retrieval we are normally looking for an exact match, that is, we are checking to see whether an item is or is not present in the file. In information retrieval this may sometimes be of interest but more generally we want to find those items which partially match the request and then select from those a few of the best matching ones.

6 The inference used in data retrieval is of the simple deductive kind, that is, arb and brc then arc. In information retrieval it is far more common to use inductive inference; relations are only specified with a degree of certainty or uncertainty and hence our confidence in the inference is variable. This distinction leads one to describe data retrieval as deterministic but information retrieval as probabilistic [12]. Frequently Bayes' Theorem is invoked to carry out inferences in IR, but in DR probabilities do not enter into the processing. S.No Table 1.1 DIFFERENCE BETWEEN DATA RETRIEVAL AND INFORMATION RETRIEVALs Characteristics Data Retrieval (DR) Information Retrieval (IR) 1 Matching Exact match Partial match, best match 2 Inference Deduction Induction 3 Model Deterministic Probabilistic 4 Classification Monothetic Polythetic 5 Query language Artificial Natural 6 Query Complete Incomplete specification 7 Items wanted Matching Relevant 8 Error response Sensitive Insensitive XI. OTHER TECHNIQUES AND APPLICATIONS Many other techniques have been developed over the years and have met with varying success. Cluster hypothesis states that documents that cluster together (are very similar to each other) will have a similar relevance profile for a given query. Document clustering techniques were (and still are) an active area of research. Even though the usefulness of document clustering for improved search effectiveness (or efficiency) has been very limited, document clustering has allowed several developments in IR, e.g., for browsing and search interfaces. Natural Language Processing (NLP) has also been proposed as a tool to enhance retrieval effectiveness [8], but has had very limited success. Even though document ranking is a critical application for IR [2], it is definitely not the only one. The field has developed techniques to attack many different problems like information filtering, topic detection and tracking (or TDT), speech retrieval, cross-language retrieval, question answering, and many more. XII. CONCLUSION The field of information retrieval has come a long way in the last sixty-eight years, and has enabled easier and faster information discovery. In the early years there were many doubts raised regarding the simple statistical techniques used in the field. 792 However, for the task of finding information, these statistical techniques have indeed proven to be the most effective ones so far. Techniques developed in the field have been used in many other areas and have yielded many new technologies which are used by people on an everyday basis, e.g., web search engines, junk- filters, news clipping services. Going forward, the field is attacking many critical problems that users face in today information-ridden world. With exponential growth in the amount of information available, information retrieval will play an increasingly important role in future. There exist some future aspects like significant quality improvements still a tedious and difficult task, need more research requires and close cooperation. Acknowledgment The authors are grateful to the anonymous refers whose suggestions have significantly improved the clarity and content of this article. I would like to express our sincere gratitude towards our Chairman Shri Dev Murti, Principal T. D. Bhist, and Miss Manvi Mishra (Head of Deptt.). Without these members this manuscript is not possible. Under their cooperation we are able to make this manuscript successful. We are also thankful to our Assistant Prof. Preeti Pandey She helps me throughout this research paper. Last but not the least we are heartily thankful to my institution which bestowed this opportunity to me. REFERENCES [1 ] LANCASTER, F.W., Information Retrieval Systems: Characteristics and Evaluation, Wiley, New York (1968). [2 ] BAR-HILLEL, Y., Language and Information. Selected Essays on their Theory and Application, Addison-Wesley, Reading, Massachusetts (1964). [3 ] BARBER, A.S., BARRACLOUGH, E.D. and GRAY, W.A. 'Online information retrieval, Information Storage and Retrieval, 9, (1973). [4 ] CLEVERDON, C.W.,Evaluation of information retrieval systems', Journal of Documentation, 26, 55-67, (1970). [5 ] Stopwords Algorithm: webconfs.com/stopwords.php. [6 ] Dawson s Algorithm: cdp@2comp.lancs.ac.uk or [7 ] C. J. van Rijsbergen. Information Retrieval. Butterworths, London, [8 ] T. Strzalkowski, L. Guthrie, J. Karlgren, J. Leistensnider, F. Lin, J. Perez-Carballo, T. Straszheim, J.Wang, and J. Wilding. Natural language information retrieval: TREC-5 report. In Proceedings of the Fifth Text Retrieval Conference (TREC-5), [9 ] VENKAT N. GUDIVADA Dow Jones Markets, VIJAY V. RAGHAVAN University of Southwestern Louisiana, Information retrieval on the world wide web. Ieee internet computing,1991. [10 ] WILLIAM I. GROSKY Wayne State University, RAJESH KASANAGOTTU University of Missouri, Effective search and retrieval are enabling technologies for realizing the full potential of the web. [11 ] For figure, images, diagrams, tables: [12 ] (van Rijsbergen, C.J. (1979)

Information Retrieval Systems in XML Based Database A review

Information Retrieval Systems in XML Based Database A review Information Retrieval Systems in XML Based Database A review Preeti Pandey 1, L.S.Maurya 2 Research Scholar, IT Department, SRMSCET, Bareilly, India 1 Associate Professor, IT Department, SRMSCET, Bareilly,

More information

An Information Retrieval using weighted Index Terms in Natural Language document collections

An Information Retrieval using weighted Index Terms in Natural Language document collections Internet and Information Technology in Modern Organizations: Challenges & Answers 635 An Information Retrieval using weighted Index Terms in Natural Language document collections Ahmed A. A. Radwan, Minia

More information

Review: Information Retrieval Techniques and Applications

Review: Information Retrieval Techniques and Applications International Journal of Computer Networks and Communications Security VOL. 3, NO. 9, SEPTEMBER 2015, 373 377 Available online at: www.ijcncs.org E-ISSN 2308-9830 (Online) / ISSN 2410-0595 (Print) Review:

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted

More information

Incorporating Window-Based Passage-Level Evidence in Document Retrieval

Incorporating Window-Based Passage-Level Evidence in Document Retrieval Incorporating -Based Passage-Level Evidence in Document Retrieval Wensi Xi, Richard Xu-Rong, Christopher S.G. Khoo Center for Advanced Information Systems School of Applied Science Nanyang Technological

More information

Modern Information Retrieval: A Brief Overview

Modern Information Retrieval: A Brief Overview Modern Information Retrieval: A Brief Overview Amit Singhal Google, Inc. singhal@google.com Abstract For thousands of years people have realized the importance of archiving and finding information. With

More information

Stemming Methodologies Over Individual Query Words for an Arabic Information Retrieval System

Stemming Methodologies Over Individual Query Words for an Arabic Information Retrieval System Stemming Methodologies Over Individual Query Words for an Arabic Information Retrieval System Hani Abu-Salem* and Mahmoud Al-Omari Department of Computer Science, Mu tah University, P.O. Box (7), Mu tah,

More information

I. The SMART Project - Status Report and Plans. G. Salton. The SMART document retrieval system has been operating on a 709^

I. The SMART Project - Status Report and Plans. G. Salton. The SMART document retrieval system has been operating on a 709^ 1-1 I. The SMART Project - Status Report and Plans G. Salton 1. Introduction The SMART document retrieval system has been operating on a 709^ computer since the end of 1964. The system takes documents

More information

Fourth generation techniques (4GT)

Fourth generation techniques (4GT) Fourth generation techniques (4GT) The term fourth generation techniques (4GT) encompasses a broad array of software tools that have one thing in common. Each enables the software engineer to specify some

More information

Information Retrieval System Assigning Context to Documents by Relevance Feedback

Information Retrieval System Assigning Context to Documents by Relevance Feedback Information Retrieval System Assigning Context to Documents by Relevance Feedback Narina Thakur Department of CSE Bharati Vidyapeeth College Of Engineering New Delhi, India Deepti Mehrotra ASCS Amity University,

More information

A statistical interpretation of term specificity and its application in retrieval

A statistical interpretation of term specificity and its application in retrieval Reprinted from Journal of Documentation Volume 60 Number 5 2004 pp. 493-502 Copyright MCB University Press ISSN 0022-0418 and previously from Journal of Documentation Volume 28 Number 1 1972 pp. 11-21

More information

INTRUSION PREVENTION AND EXPERT SYSTEMS

INTRUSION PREVENTION AND EXPERT SYSTEMS INTRUSION PREVENTION AND EXPERT SYSTEMS By Avi Chesla avic@v-secure.com Introduction Over the past few years, the market has developed new expectations from the security industry, especially from the intrusion

More information

Clustering Technique in Data Mining for Text Documents

Clustering Technique in Data Mining for Text Documents Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor

More information

Healthcare, transportation,

Healthcare, transportation, Smart IT Argus456 Dreamstime.com From Data to Decisions: A Value Chain for Big Data H. Gilbert Miller and Peter Mork, Noblis Healthcare, transportation, finance, energy and resource conservation, environmental

More information

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,

More information

Medical Information-Retrieval Systems. Dong Peng Medical Informatics Group

Medical Information-Retrieval Systems. Dong Peng Medical Informatics Group Medical Information-Retrieval Systems Dong Peng Medical Informatics Group Outline Evolution of medical Information-Retrieval (IR). The information retrieval process. The trend of medical information retrieval

More information

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India

More information

Information Need Assessment in Information Retrieval

Information Need Assessment in Information Retrieval Information Need Assessment in Information Retrieval Beyond Lists and Queries Frank Wissbrock Department of Computer Science Paderborn University, Germany frankw@upb.de Abstract. The goal of every information

More information

HELP DESK SYSTEMS. Using CaseBased Reasoning

HELP DESK SYSTEMS. Using CaseBased Reasoning HELP DESK SYSTEMS Using CaseBased Reasoning Topics Covered Today What is Help-Desk? Components of HelpDesk Systems Types Of HelpDesk Systems Used Need for CBR in HelpDesk Systems GE Helpdesk using ReMind

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

INFORMATION RETRIEVAL

INFORMATION RETRIEVAL INFORMATION RETRIEVAL C. J. van RIJSBERGEN B.Sc., Ph.D., M.B.C.S. Department of Computing Science University of Glasgow PREFACE TO THE SECOND EDITION The major change in the second edition of this book

More information

Flattening Enterprise Knowledge

Flattening Enterprise Knowledge Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it

More information

ISSUES ON FORMING METADATA OF EDITORIAL SYSTEM S DOCUMENT MANAGEMENT

ISSUES ON FORMING METADATA OF EDITORIAL SYSTEM S DOCUMENT MANAGEMENT ISSN 1392 124X INFORMATION TECHNOLOGY AND CONTROL, 2005, Vol.34, No.4 ISSUES ON FORMING METADATA OF EDITORIAL SYSTEM S DOCUMENT MANAGEMENT Marijus Bernotas, Remigijus Laurutis, Asta Slotkienė Information

More information

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS

More information

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation.

CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. CINDOR Conceptual Interlingua Document Retrieval: TREC-8 Evaluation. Miguel Ruiz, Anne Diekema, Páraic Sheridan MNIS-TextWise Labs Dey Centennial Plaza 401 South Salina Street Syracuse, NY 13202 Abstract:

More information

Natural Language to Relational Query by Using Parsing Compiler

Natural Language to Relational Query by Using Parsing Compiler Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

Introduction. Chapter 1. 1.1 Introduction. 1.2 Background

Introduction. Chapter 1. 1.1 Introduction. 1.2 Background Chapter 1 Introduction 1.1 Introduction This report is the fifth in the series describing experiments with the British Library Research & Development Department's (BLR&DD) Okapi bibliographic information

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

I. INTRODUCTION NOESIS ONTOLOGIES SEMANTICS AND ANNOTATION

I. INTRODUCTION NOESIS ONTOLOGIES SEMANTICS AND ANNOTATION Noesis: A Semantic Search Engine and Resource Aggregator for Atmospheric Science Sunil Movva, Rahul Ramachandran, Xiang Li, Phani Cherukuri, Sara Graves Information Technology and Systems Center University

More information

Electronic Document Management Using Inverted Files System

Electronic Document Management Using Inverted Files System EPJ Web of Conferences 68, 0 00 04 (2014) DOI: 10.1051/ epjconf/ 20146800004 C Owned by the authors, published by EDP Sciences, 2014 Electronic Document Management Using Inverted Files System Derwin Suhartono,

More information

Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources

Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources Using Feedback Tags and Sentiment Analysis to Generate Sharable Learning Resources Investigating Automated Sentiment Analysis of Feedback Tags in a Programming Course Stephen Cummins, Liz Burd, Andrew

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

Case studies: Outline. Requirement Engineering. Case Study: Automated Banking System. UML and Case Studies ITNP090 - Object Oriented Software Design

Case studies: Outline. Requirement Engineering. Case Study: Automated Banking System. UML and Case Studies ITNP090 - Object Oriented Software Design I. Automated Banking System Case studies: Outline Requirements Engineering: OO and incremental software development 1. case study: withdraw money a. use cases b. identifying class/object (class diagram)

More information

The University of Lisbon at CLEF 2006 Ad-Hoc Task

The University of Lisbon at CLEF 2006 Ad-Hoc Task The University of Lisbon at CLEF 2006 Ad-Hoc Task Nuno Cardoso, Mário J. Silva and Bruno Martins Faculty of Sciences, University of Lisbon {ncardoso,mjs,bmartins}@xldb.di.fc.ul.pt Abstract This paper reports

More information

Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I

Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I Gerard Mc Nulty Systems Optimisation Ltd gmcnulty@iol.ie/0876697867 BA.,B.A.I.,C.Eng.,F.I.E.I Data is Important because it: Helps in Corporate Aims Basis of Business Decisions Engineering Decisions Energy

More information

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction Chapter-1 : Introduction 1 CHAPTER - 1 Introduction This thesis presents design of a new Model of the Meta-Search Engine for getting optimized search results. The focus is on new dimension of internet

More information

Design Patterns for Complex Event Processing

Design Patterns for Complex Event Processing Design Patterns for Complex Event Processing Adrian Paschke BioTec Center, Technical University Dresden, 01307 Dresden, Germany adrian.paschke AT biotec.tu-dresden.de ABSTRACT Currently engineering efficient

More information

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks Text Analytics World, Boston, 2013 Lars Hard, CTO Agenda Difficult text analytics tasks Feature extraction Bio-inspired

More information

SEO AND CONTENT MANAGEMENT SYSTEM

SEO AND CONTENT MANAGEMENT SYSTEM International Journal of Electronics and Computer Science Engineering 953 Available Online at www.ijecse.org ISSN- 2277-1956 SEO AND CONTENT MANAGEMENT SYSTEM Savan K. Patel 1, Jigna B.Prajapati 2, Ravi.S.Patel

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

Search Engine Optimization (SEO): Improving Website Ranking

Search Engine Optimization (SEO): Improving Website Ranking Search Engine Optimization (SEO): Improving Website Ranking Chandrani Nath #1, Dr. Laxmi Ahuja *2 # 1 *2 Amity University, Noida Abstract: - As web popularity increases day by day, millions of people use

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Strategic Online Advertising: Modeling Internet User Behavior with

Strategic Online Advertising: Modeling Internet User Behavior with 2 Strategic Online Advertising: Modeling Internet User Behavior with Patrick Johnston, Nicholas Kristoff, Heather McGinness, Phuong Vu, Nathaniel Wong, Jason Wright with William T. Scherer and Matthew

More information

Comparing Methods to Identify Defect Reports in a Change Management Database

Comparing Methods to Identify Defect Reports in a Change Management Database Comparing Methods to Identify Defect Reports in a Change Management Database Elaine J. Weyuker, Thomas J. Ostrand AT&T Labs - Research 180 Park Avenue Florham Park, NJ 07932 (weyuker,ostrand)@research.att.com

More information

A Quagmire of Terminology: Verification & Validation, Testing, and Evaluation*

A Quagmire of Terminology: Verification & Validation, Testing, and Evaluation* From: FLAIRS-01 Proceedings. Copyright 2001, AAAI (www.aaai.org). All rights reserved. A Quagmire of Terminology: Verification & Validation, Testing, and Evaluation* Valerie Barr Department of Computer

More information

Search Result Optimization using Annotators

Search Result Optimization using Annotators Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

More information

CLUSTERING FOR FORENSIC ANALYSIS

CLUSTERING FOR FORENSIC ANALYSIS IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 4, Apr 2014, 129-136 Impact Journals CLUSTERING FOR FORENSIC ANALYSIS

More information

CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS

CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS In today's scenario data warehouse plays a crucial role in order to perform important operations. Different indexing techniques has been used and analyzed using

More information

Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

More information

IAI : Expert Systems

IAI : Expert Systems IAI : Expert Systems John A. Bullinaria, 2005 1. What is an Expert System? 2. The Architecture of Expert Systems 3. Knowledge Acquisition 4. Representing the Knowledge 5. The Inference Engine 6. The Rete-Algorithm

More information

Folksonomies versus Automatic Keyword Extraction: An Empirical Study

Folksonomies versus Automatic Keyword Extraction: An Empirical Study Folksonomies versus Automatic Keyword Extraction: An Empirical Study Hend S. Al-Khalifa and Hugh C. Davis Learning Technology Research Group, ECS, University of Southampton, Southampton, SO17 1BJ, UK {hsak04r/hcd}@ecs.soton.ac.uk

More information

Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A

Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A Introduction to IR Systems: Supporting Boolean Text Search Chapter 27, Part A Database Management Systems, R. Ramakrishnan 1 Information Retrieval A research field traditionally separate from Databases

More information

Concepts of digital forensics

Concepts of digital forensics Chapter 3 Concepts of digital forensics Digital forensics is a branch of forensic science concerned with the use of digital information (produced, stored and transmitted by computers) as source of evidence

More information

Exam in course TDT4215 Web Intelligence - Solutions and guidelines -

Exam in course TDT4215 Web Intelligence - Solutions and guidelines - English Student no:... Page 1 of 12 Contact during the exam: Geir Solskinnsbakk Phone: 94218 Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Friday May 21, 2010 Time: 0900-1300 Allowed

More information

The Development of Multimedia-Multilingual Document Storage, Retrieval and Delivery System for E-Organization (STREDEO PROJECT)

The Development of Multimedia-Multilingual Document Storage, Retrieval and Delivery System for E-Organization (STREDEO PROJECT) The Development of Multimedia-Multilingual Storage, Retrieval and Delivery for E-Organization (STREDEO PROJECT) Asanee Kawtrakul, Kajornsak Julavittayanukool, Mukda Suktarachan, Patcharee Varasrai, Nathavit

More information

Interpreting areading Scaled Scores for Instruction

Interpreting areading Scaled Scores for Instruction Interpreting areading Scaled Scores for Instruction Individual scaled scores do not have natural meaning associated to them. The descriptions below provide information for how each scaled score range should

More information

Expert Finding Using Social Networking

Expert Finding Using Social Networking San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 1-1-2009 Expert Finding Using Social Networking Parin Shah San Jose State University Follow this and

More information

Brunswick, NJ: Ometeca Institute, Inc., 2005. 617 pp. $35.00. (0-9763547-0-5).

Brunswick, NJ: Ometeca Institute, Inc., 2005. 617 pp. $35.00. (0-9763547-0-5). Information Retrieval Design. James D. Anderson and Jose Perez-Carballo. East Brunswick, NJ: Ometeca Institute, Inc., 2005. 617 pp. $35.00. (0-9763547-0-5). Information Retrieval Design is a textbook that

More information

A Visual Tagging Technique for Annotating Large-Volume Multimedia Databases

A Visual Tagging Technique for Annotating Large-Volume Multimedia Databases A Visual Tagging Technique for Annotating Large-Volume Multimedia Databases A tool for adding semantic value to improve information filtering (Post Workshop revised version, November 1997) Konstantinos

More information

Creating Synthetic Temporal Document Collections for Web Archive Benchmarking

Creating Synthetic Temporal Document Collections for Web Archive Benchmarking Creating Synthetic Temporal Document Collections for Web Archive Benchmarking Kjetil Nørvåg and Albert Overskeid Nybø Norwegian University of Science and Technology 7491 Trondheim, Norway Abstract. In

More information

Spam Filtering with Naive Bayesian Classification

Spam Filtering with Naive Bayesian Classification Spam Filtering with Naive Bayesian Classification Khuong An Nguyen Queens College University of Cambridge L101: Machine Learning for Language Processing MPhil in Advanced Computer Science 09-April-2011

More information

types of information systems computer-based information systems

types of information systems computer-based information systems topics: what is information systems? what is information? knowledge representation information retrieval cis20.2 design and implementation of software applications II spring 2008 session # II.1 information

More information

Efficient Recruitment and User-System Performance

Efficient Recruitment and User-System Performance Monitoring User-System Performance in Interactive Retrieval Tasks Liudmila Boldareva Arjen P. de Vries Djoerd Hiemstra University of Twente CWI Dept. of Computer Science INS1 PO Box 217 PO Box 94079 7500

More information

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper Parsing Technology and its role in Legacy Modernization A Metaware White Paper 1 INTRODUCTION In the two last decades there has been an explosion of interest in software tools that can automate key tasks

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

Projektgruppe. Categorization of text documents via classification

Projektgruppe. Categorization of text documents via classification Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction

More information

Modeling Concept and Context to Improve Performance in ediscovery

Modeling Concept and Context to Improve Performance in ediscovery By: H. S. Hyman, ABD, University of South Florida Warren Fridy III, MS, Fridy Enterprises Abstract One condition of ediscovery making it unique from other, more routine forms of IR is that all documents

More information

Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects

Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects Enterprise Resource Planning Analysis of Business Intelligence & Emergence of Mining Objects Abstract: Build a model to investigate system and discovering relations that connect variables in a database

More information

Appendix A: Science Practices for AP Physics 1 and 2

Appendix A: Science Practices for AP Physics 1 and 2 Appendix A: Science Practices for AP Physics 1 and 2 Science Practice 1: The student can use representations and models to communicate scientific phenomena and solve scientific problems. The real world

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Web Data Extraction: 1 o Semestre 2007/2008

Web Data Extraction: 1 o Semestre 2007/2008 Web Data : Given Slides baseados nos slides oficiais do livro Web Data Mining c Bing Liu, Springer, December, 2006. Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008

More information

A QoS-Aware Web Service Selection Based on Clustering

A QoS-Aware Web Service Selection Based on Clustering International Journal of Scientific and Research Publications, Volume 4, Issue 2, February 2014 1 A QoS-Aware Web Service Selection Based on Clustering R.Karthiban PG scholar, Computer Science and Engineering,

More information

SEO MADE SIMPLE. 5th Edition. Insider Secrets For Driving More Traffic To Your Website Instantly DOWNLOAD THE FULL VERSION HERE

SEO MADE SIMPLE. 5th Edition. Insider Secrets For Driving More Traffic To Your Website Instantly DOWNLOAD THE FULL VERSION HERE SEO MADE SIMPLE 5th Edition Insider Secrets For Driving More Traffic To Your Website Instantly DOWNLOAD THE FULL VERSION HERE by Michael H. Fleischner SEO Made Simple (Fifth Edition) Search Engine Optimization

More information

CFSD 21 ST CENTURY SKILL RUBRIC CRITICAL & CREATIVE THINKING

CFSD 21 ST CENTURY SKILL RUBRIC CRITICAL & CREATIVE THINKING Critical and creative thinking (higher order thinking) refer to a set of cognitive skills or strategies that increases the probability of a desired outcome. In an information- rich society, the quality

More information

Building well-balanced CDN 1

Building well-balanced CDN 1 Proceedings of the Federated Conference on Computer Science and Information Systems pp. 679 683 ISBN 978-83-60810-51-4 Building well-balanced CDN 1 Piotr Stapp, Piotr Zgadzaj Warsaw University of Technology

More information

Journal of Global Research in Computer Science RESEARCH SUPPORT SYSTEMS AS AN EFFECTIVE WEB BASED INFORMATION SYSTEM

Journal of Global Research in Computer Science RESEARCH SUPPORT SYSTEMS AS AN EFFECTIVE WEB BASED INFORMATION SYSTEM Volume 2, No. 5, May 2011 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info RESEARCH SUPPORT SYSTEMS AS AN EFFECTIVE WEB BASED INFORMATION SYSTEM Sheilini

More information

Data Discovery on the Information Highway

Data Discovery on the Information Highway Data Discovery on the Information Highway Susan Gauch Introduction Information overload on the Web Many possible search engines Need intelligent help to select best information sources customize results

More information

Requirements Analysis through Viewpoints Oriented Requirements Model (VORD)

Requirements Analysis through Viewpoints Oriented Requirements Model (VORD) Requirements Analysis through Viewpoints Oriented Requirements Model (VORD) Ahmed M. Salem Computer Science Department California State University, Sacramento Sacramento, CA 95819 USA Email: salema@ecs.csus.edu

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

Collecting Polish German Parallel Corpora in the Internet

Collecting Polish German Parallel Corpora in the Internet Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska

More information

Project 2: Term Clouds (HOF) Implementation Report. Members: Nicole Sparks (project leader), Charlie Greenbacker

Project 2: Term Clouds (HOF) Implementation Report. Members: Nicole Sparks (project leader), Charlie Greenbacker CS-889 Spring 2011 Project 2: Term Clouds (HOF) Implementation Report Members: Nicole Sparks (project leader), Charlie Greenbacker Abstract: This report describes the methods used in our implementation

More information

Appendix B Data Quality Dimensions

Appendix B Data Quality Dimensions Appendix B Data Quality Dimensions Purpose Dimensions of data quality are fundamental to understanding how to improve data. This appendix summarizes, in chronological order of publication, three foundational

More information

CAD/ CAM Prof. P. V. Madhusudhan Rao Department of Mechanical Engineering Indian Institute of Technology, Delhi Lecture No. # 03 What is CAD/ CAM

CAD/ CAM Prof. P. V. Madhusudhan Rao Department of Mechanical Engineering Indian Institute of Technology, Delhi Lecture No. # 03 What is CAD/ CAM CAD/ CAM Prof. P. V. Madhusudhan Rao Department of Mechanical Engineering Indian Institute of Technology, Delhi Lecture No. # 03 What is CAD/ CAM Now this lecture is in a way we can say an introduction

More information

Cover Page. "Assessing the Agreement of Cognitive Space with Information Space" A Research Seed Grant Proposal to the UNC-CH Cognitive Science Program

Cover Page. Assessing the Agreement of Cognitive Space with Information Space A Research Seed Grant Proposal to the UNC-CH Cognitive Science Program Cover Page "Assessing the Agreement of Cognitive Space with Information Space" A Research Seed Grant Proposal to the UNC-CH Cognitive Science Program Submitted by: Dr. Gregory B. Newby Assistant Professor

More information

CONCEPTCLASSIFIER FOR SHAREPOINT

CONCEPTCLASSIFIER FOR SHAREPOINT CONCEPTCLASSIFIER FOR SHAREPOINT PRODUCT OVERVIEW The only SharePoint 2007 and 2010 solution that delivers automatic conceptual metadata generation, auto-classification and powerful taxonomy tools running

More information

Movie Classification Using k-means and Hierarchical Clustering

Movie Classification Using k-means and Hierarchical Clustering Movie Classification Using k-means and Hierarchical Clustering An analysis of clustering algorithms on movie scripts Dharak Shah DA-IICT, Gandhinagar Gujarat, India dharak_shah@daiict.ac.in Saheb Motiani

More information

(Refer Slide Time: 2:03)

(Refer Slide Time: 2:03) Control Engineering Prof. Madan Gopal Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 11 Models of Industrial Control Devices and Systems (Contd.) Last time we were

More information

Data Engineering for the Analysis of Semiconductor Manufacturing Data

Data Engineering for the Analysis of Semiconductor Manufacturing Data Data Engineering for the Analysis of Semiconductor Manufacturing Data Peter Turney Knowledge Systems Laboratory Institute for Information Technology National Research Council Canada Ottawa, Ontario, Canada

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Performance evaluation of Web Information Retrieval Systems and its application to e-business

Performance evaluation of Web Information Retrieval Systems and its application to e-business Performance evaluation of Web Information Retrieval Systems and its application to e-business Fidel Cacheda, Angel Viña Departament of Information and Comunications Technologies Facultad de Informática,

More information

Discovering suffixes: A Case Study for Marathi Language

Discovering suffixes: A Case Study for Marathi Language Discovering suffixes: A Case Study for Marathi Language Mudassar M. Majgaonker Comviva Technologies Limited Gurgaon, India Abstract Suffix stripping is a pre-processing step required in a number of natural

More information

Relevance Feedback In An Automatic Document Retrieval System

Relevance Feedback In An Automatic Document Retrieval System Relevance Feedback In An Automatic Document Retrieval System A Thesis Presented to the Faculty of the Graduate School ox Cornell University for the Degree of Master of Science in Computer Science by Sleanor

More information

Analyzing survey text: a brief overview

Analyzing survey text: a brief overview IBM SPSS Text Analytics for Surveys Analyzing survey text: a brief overview Learn how gives you greater insight Contents 1 Introduction 2 The role of text in survey research 2 Approaches to text mining

More information

Chapter 2 The Information Retrieval Process

Chapter 2 The Information Retrieval Process Chapter 2 The Information Retrieval Process Abstract What does an information retrieval system look like from a bird s eye perspective? How can a set of documents be processed by a system to make sense

More information