1 **** 1 Databases and computerized information retrieval Introduction **** What is a database? 2 A database is a collection of similar data records stored in a common file (or collection of files).
2 **** Types of databases: examples 3 Examples: The databases that form the basis for»catalogues of books or other types of documents»computerized bibliographies»address directories»a full text newspaper, newsletter, magazine, journal + collections of these»www and Internet search engines»intranet search engines»... ***- Information retrieval and related activities: figure 4 Information management Information retrieval Text retrieval Image retrieval Presentation of information
3 Information retrieval and related activities: explanation 5 Text retrieval can be considered as a part of the larger concept information management. There is a great overlap: text retrieval - image retrieval because image retrieval is in most cases based on text retrieval: in most cases retrieval of images is not based on computerized investigation of the images themselves, but on searches in the text that accompanies each image. ***- ***- Information retrieval: the terminology 6 Several words are used with similar or related meanings:»database / databank / corpus / collection / catalog / site / archive / file / web /...»contents of a database / records / documents / items / (web) pages /...»search / query / filter /...»thesaurus / controlled vocabulary / dictionary / lexicon / term bank / ontology /...»results / selection / retrieved documents / retrieved items /...
4 Information retrieval software: a particular type of DBMS 7 Software for information storage and retrieval (ISR software) Text(-oriented) database management systems (Text-DBMS) Text information management systems (TIMS) Document retrieval systems Document management systems ***- ***- Information retrieval: via a database to the user 8 Information content Linear file Database Inverted file Search engine Search interface User
5 **-- Information retrieval: building a database 9 Records fed into the database management system Indexing Records derived from the input and stored in the database Retrieval Inverted file, index, register of the database User **-- 10?? Question?? The records input in in a database system to to be be indexed do do not necessarily appear completely in in the output phase; that is: is: they are not shown completely to to the user of of the system in in the results of of a query. Can you illustrate this?
6 **** Information retrieval: the basic processes in search systems 11 Information problem Text documents Representation Representation Evaluation and feedback Query Indexed documents Comparison Retrieved, sorted documents ***- Information retrieval systems: many components make up a system 12 Any retrieval system is built up of many more or less independent components. To increase the quality of the results, these components can be modified more or less independent of each other.
7 Information retrieval systems: important components 13 the information content system to describe formal aspects of information items system to describe the subjects of information items concrete descriptions of information items = application of the used information description systems information storage and retrieval computer program(s) computer system used for retrieval type of medium or information carrier used for distribution ***- ***- Information retrieval systems: the information content 14 The information content is the information that is created or gathered by the producer. The information content is independent of software and of distribution media. The information content is input into the retrieval system using»a system (rules) to describe the formal aspects»a system (rules) to describe the contents (classification, thesaurus,...)
8 Information retrieval systems: media used for distribution 15 Hard copy (for information retrieval systems only in the broad sense)»print»microfiche For computers: (for information retrieval systems strictu sensu)»magnetic tape»floppy disk; optical disk (CD-ROM, Photo-CD, DVD...)»Online ***- ***- Information retrieval systems: the computer program 16 The information retrieval program consists of several modules, including: The module that allows the creation of the inverted file(s) = index file(s) = dictionary file(s). The search engine provides the search features and power that allow the inverted file(s) to be searched. The interface between the system and the user determines how they (can) interact to search the database (using menus and/or icons and/or templates and/or commands).
9 What determines the results of a search in a retrieval system? the information retrieval system ( = contents + system) Result of of a search 2. the user of the retrieval system and the search strategy applied to the system ***- ***- Layered structure of a database 18 Database (File) Records Fields Characters + in many systems: relations / links between records
10 ***- A simple database architecture: all records together form a database 19 The salami architecture = sliced bread architecture»the salami or the bread is a database»each slice of salami or bread is a database record»there are no relations between slices / records»the retrieval system tries to offer the appropriate slices / records to the user **-- 20!! Question!! The database architecture described here is is simple, but which factors make retrieval nevertheless a complex procedure in in many real databases with this architecture?
11 **-- Characteristics / definition of structured text-information 21 The text information is structured. (files, records, fields, sub-fields, links/relations among records...) The length of records and fields can be long. Some fields are multi-valued = they occur more than once = repeated or repeatable fields **-- Structure of a bibliographic file 22 Record No. 1 Title Author 1: name + first name Author 2:... Source Descriptor 1 Descriptor 2... Subfields Repeated fields Record No. 2
12 **** 23 Databases and computerized information retrieval Text retrieval and language ***- an overview 24 Problems/difficulties related to language / terminology occur in the case of multi-linguality :! cross-language information retrieval ; that is when more than 1 language is used»in the contents of the searched database(s) and/or in the subject descriptors of the searched database(s) OR»in the search terms used in a query even when only 1 language is applied throughout the system
13 ***- enhancing retrieval Retrieval can be enhanced by coping with the problems caused by the use of natural language. Contributions to this enhancement of retrieval can be made by»the database producer»the computerized retrieval system»the searcher/user (The distinction between these is not very sharp and clear in all cases.) 25 **-- 26!! Task - Assignment!! Read about Language and information retrieval by by Large, Andrew, Tedd, Lucy A., A., and Hartley, R.J. Chapter 4 in: in: Information seeking in in the the online age: principles and practice. London :: Bowker-Saur, 1999, 308 pp. pp.
14 **-- 27!! Task - Assignment!! Read about Information organization. By By Large, Andrew, Tedd, Lucy A., A., and Hartley, R.J. Chapter 5 in: in: Information seeking in in the the online age: principles and practice. London :: Bowker-Saur, 1999, 308 pp. **** a word is not a concept (a) 28 Problem: A word or phrase or term is not the same as a concept or subject or topic. Word Word Concept!
15 **** a word is not a concept (a ) 29 So, to cover a concept in a search, to increase the recall of a search, the user of a retrieval system should consider an expansion of the query; that is: the user should also include other words in the query to cover the concept.! **** a word is not a concept (a ) 30»synonyms! (such as : Latin names of species in biology besides the common names, scientific names besides common names of substances in chemistry )!
16 **** a word is not a concept (a ) 31»narrower terms, more specific terms (such as particular brand names); including terms with prefixes (for instance: viruses, retroviruses, rotaviruses...)»spelling variations (such as UK English versus US English); possible variations after transliteration! **** a word is not a concept (a ) 32»singular or plural forms of a noun (when this is used as a search term)»(relevant) related terms»various forms of a verb (when this is used in the query)»broader terms (perhaps)!
17 33 a word is not a concept (b) Method to solve the problem at the time of database production:»adding to each database record those codes from a classification system or terms from a thesaurus system that are relevant, and providing the user with knowledge about the system used; in some cases, this process is computerized (with intellectual intervention or completely automatic) ***- ***- 34 a word is not a concept (b )»However, this solution is not perfect: Addition of terms by humans from a controlled vocabulary / from a thesaurus is not easy and time consuming. Consequences: the added value lags behind the availability of the document the process can delay access to the document the process is expensive Moreover, in practice, most users of the resulting database do not exploit this method offered.
18 ***- a word is not a concept (c) Method to solve the problem, provided by the computerized retrieval system:»offering to the user a partly computerized access to the particular subject description system used by the database producer, and then linking to the database for searching»computerized, automatic, analysis of the free text search terms applied in a query by the user, for transparent mapping to the corresponding particular classification codes, categories, or thesaurus terms used by the database producer 35 **-- a word is not a concept (c )»offering the searching user access to a (general) thesaurus system, even when the database producer has not categorised the database contents; in this way, the user can refine his/her query»better, and more generally: computerized, automatic expansion of the query terms introduced by the user, based on a general thesaurus! (however, not many retrieval systems offer this feature) 36
19 **-- a word is not a concept (c )»to avoid the problems of possible variations at the end of search terms: offering the possibility to the user to truncate a search term explicitly computerized, automatic, transparent truncation without explicit user action 37 **-- a word is not a concept (c )»to avoid the problems of possible prefixes and suffixes: computerized, automatic, transparent, intelligent morphological analysis of the query terms: stemming of the free text search terms used by the user; however, this does not work perfectly and has not (yet) been implemented in most retrieval systems; for languages that have a richer morphology than English, this can offer even a larger pay-off 38
20 **** 39?? Question?? Which problems in in text retrieval are illustrated by by the following sentences?! ****Examples 40 Time flies like an arrow. Fruit flies like a banana.?
21 ****Examples 41 Time flies like an arrow. Fruit flies like a banana. ****Examples 42 Time flies like an arrow. Fruit flies like a banana. OK!
22 **** ambiguity of meaning (a) 43 Problem: A word or phrase can have more than 1 meaning, because natural languages have evolved spontaneously, not strictly controlled. Ambiguity of the meaning = polysemy. The meaning can depend on the context. The meaning may depend on the region where the term is used. This is a problem for retrieval. This decreases the precision of many searches. ****Example ambiguity of meaning (a ) 44 An example is the word pascal, which can have several meanings:»the philosopher Blaise Pascal,»the programming language Pascal,»the physical unit of pressure, and»the name of many persons Another example:»turkey, the country!»turkey, the animal
23 ****Example ambiguity of meaning (a ) 45 Example of sentences:»the banks of New Zealand flooded our mailboxes with free account proposals.»the banks of New Zealand flooded with heavy rains account for the economic loss.! **** ambiguity of meaning (a ) 46 Problem: Ambiguity of meaning may be the cause of low precision. Word! NOT Relevant concept Irrelevant concept wanted
24 ambiguity of meaning (b) Method to solve the problem at the time of database production:»adding to each database record codes from a classification system or terms from a thesaurus system, and providing the user with knowledge about the system used; in some cases, this process is computerized (completely automatic or with intellectual intervention); 47 ***- ***- ambiguity of meaning (b ) Method to solve the problem, provided by the computerized retrieval system:»offering to the user a partly computerized access to the subject description system and then linking to the database for searching 48
25 ***- ambiguity of meaning (b )»searching normally (without added value), but adding value by categorizing the retrieved items in the presentation phase to assist in the disambiguation ; this feature is offered for instance by the public access module of the book catalogue of the library automation system VUBIS at VUB, Belgium, when a searching items that were assigned a particular keyword 49 *--- 50!! Task - Assignment!! Search Clusty or or Vivisimo or orwisenut as as an an example of of a system that applies automatic, computerized subject categorization of of database records.
26 ambiguity of meaning (b )»Natural language processing of the queries: linguistic analysis to determine possible meanings of the query, which includes disambiguation of words in their context: lexical analysis = at the level of the word semantic analysis = at the level of the sentence However, most queries are short and therefore it is difficult to apply semantic analysis for disambiguation. 51 ***- ***- ambiguity of meaning (b )»Natural language processing of the documents: linguistic analysis to determine possible meanings of a sentence, which includes disambiguation of words in their context: lexical analysis = at the level of the word semantic analysis = at the level of the sentence However, most retrieval systems do not apply this complicated method. 52
27 **** A word is not a concept A concept is not a word 53 Word1 Concept1 Word2 Concept2 Word3 Concept3 The most simple relation between words and concepts is NOT valid. **** A word is not a concept A concept is not a word 54 Word1 Relevant concept 1 Word2 Irrelevant concept 2 Word3 Irrelevant concept3 A concept cannot be covered by only 1 word or term; this may be the cause of low recall of a search. The meaning of many words is ambiguous; this may be the cause of low precision of a search.
28 **-- relation with recall and precision 55 Recapitulating the two problems discussed, we can say that Expansion of the query allows to increase the recall. Disambiguation of the query allows to increase the precision.! **-- evolution of meaning (a) 56 Difficulty: The meaning of a word or phrase can change over time.!
29 **-- evolution of meaning (b) Method to solve the problem at the time of database production:»using a categorization system and also adapting this continuously to the changing reality and meanings of terms 57 ***- phrases composed of words (a) 58 Problem: Most retrieval systems can search for words, but they do not directly recognize or know phrases / terms composed of more than 1 word.!
30 phrases composed of words (b) Methods to solve the problem, provided by the computerized retrieval system:»the user can and should indicate explicitly that a few words should be considered together by the retrieval system as forming a phrase/term (for instance in many Internet search engines by putting the phrase in quotes like three word phrase ) 59 ***- ***- phrases composed of words (b )»better: the retrieval system automatically recognizes a phrase/term relying on a term bank that has been created in advance; examples: the Internet search engines AltaVista and Scirus work in this way 60
31 **-- searching more than 1 database (a) 61 Problem: Searching various databases at the same time, or merging databases for searching, suffers from the problem that these databases may use categorization systems to make the problem of terminology and language smaller, but in most cases these systems are different and incompatible.! **-- searching more than 1 database (b) 62 Method to solve the problem, provided by the computerized retrieval system:»mapping of the search term chosen by the user to the various thesaurus terms used by the various databases; only a few retrieval systems try to accomplish this
32 **-- relations among concepts (a) 63 Difficulty: In many cases, when the user combines several concepts in 1 search, the searching user cannot well communicate the intended relations among these concepts to the retrieval system.! **--Examples relations among concepts (a ) 64»Example: concept 1 = children/sons/daughters/... concept 2 = parents/fathers/mothers/... concept 3 = beating/violence/... How to find documents on children beating their parents while avoiding documents on parents beating their children?!
33 **--Examples relations among concepts (a ) 65»Example: concept 1 = computers concept 2 = architecture How to find documents on (the application/role/importance of) computers in architecture, while avoiding documents on the architecture of computers?! **-- relations among concepts (b) Method to solve the problem, provided by the database producer:»offering facilities to the user for disambiguation, like in the more simple case of singular terms without combinations with other terms 66
34 **-- relations among concepts (b ) Method to solve the problem, provided by the computerized retrieval system:»natural language analysis of both the documents and the natural language query to interpret their structure and meaning 67 **-- expressing the purpose of a search 68 Difficulty: Classical queries and retrieval systems work with terms to match the subject, the aboutness expressed in the query with the documents, but do not try to express and to understand the purpose, aim and context of the search.!
35 ***- 69?? Question?? Which are some of of the problems caused by by the use of of language in in information retrieval?! **-- Text retrieval and multi-linguality (1a) 70 Problem: When the user does not know well the language of a (monolingual) database, searching is not efficient.!
36 **-- Text retrieval and multi-linguality (1b) Methods to solve the problem, at the time of database production:»adding subject descriptors in various languages (for instance in Pascal and Francis made by INIST)»adding abstracts in various languages (for instance the abstracts in English in INSPEC)»translation of the complete contents of the database These processes can be partly computerized, but they are still time consuming and expensive. 71 **-- Text retrieval and multi-linguality (1c) Method to solve the problem, provided by the computerized retrieval system:»translating the query of the user, by using a general multilingual thesaurus; however, most free text queries are quite short, which makes it difficult to use the context to limit possible ambiguity; disambiguation by user-computer interaction offered by the query interface, can increase the effectiveness here. 72
37 **-- Text retrieval and multi-linguality (2a) 73 Problem: When documents in a database are written in more than 1 language, searching that database in a single language may not be sufficient to retrieve all interesting, relevant documents.! **-- Text retrieval and multi-linguality (2b) Method to solve the problem:»extensions of the methods when only 1 language is used in the documents 74
38 **-- Text retrieval and multi-linguality (3) 75 Problem: When more than 1 database is searched at the same time, the mechanisms to solve problems related to language in each separate database cannot be applied so well anymore.! **-- Text retrieval and multi-linguality (4a) 76 Problem: Of course, the user should ideally be able to understand the contents of all the retrieved documents, even when various languages are used in those documents.!
39 **-- Text retrieval and multi-linguality (4b) Methods to solve the problem, at the time of database production:»adding abstracts in various languages (for instance the abstracts in English in INSPEC)»translation of the complete contents of the database These processes can be partly computerized, but they are still time consuming and expensive. 77 **-- Text retrieval and multi-linguality (4c) Methods to solve the problem, provided by the computerized retrieval system:»rapid automated translation of the titles of retrieved records/documents (for instance offered by the Internet search engine AltaVista) of the abstracts of retrieved records/documents (for instance offered by the Internet search engine AltaVista) of the complete retrieved records/documents 78
40 **-- A good text retrieval system solves some problems due to language accepts words / terms / phrases in the query of the user maps the words to corresponding concepts presents these concepts to the user who can then select the appropriate, relevant concept ( disambiguation ) searches for this concept, even in documents written in another language presents the resulting, retrieved documents in the language preferred by the user 79 **-- Enhanced text retrieval using natural language processing 80 Information problem Text documents Representation Representation Query Indexed documents Evaluation and feedback Natural language processing of the documents AND of the query Comparison and matching of both Retrieved, sorted documents
41 ***- conclusions 81 The use of terms and language to retrieve information from databases/collections/corpora causes many problems. These problems are not recognized or underestimated by many users of search/retrieval systems = The power of retrieval systems is overestimated by many users. Much research and development is still needed to enhance text retrieval. **-- 82!! Task - Assignment!! Recommended reading: Veal, D.C. Progress in in documentation: Techniques of of document management: a review of of text retrieval and related technologies. J. J. Doc., Vol. 57, No. 2, 2, March 2001, pp
42 **-- 83!! Task - Assignment!! Recommended reading: Chowdhury, G. G. G., G., and Chowdhury, Sudatta Information retrieval in in digital libraries. In: In: Introduction to to digital libraries. London :: Facet Publishing, 2003, 354 pp. pp. **-- 84?? Question?? Explain the basic relations/similarities in in speech recognition (speech to to text) translation of of a text (text to to text) summarizing texts (text to to summary) text retrieval (query to to texts) cross-language text retrieval (combination)
43 **** 85 Databases and computerized information retrieval Hints on how to use information sources **** Hints on how to use information sources: overview (Part 1) 86 Know the purpose and motivation for each search. Do not be lazy: search on your own, before bothering experts with requests for advice. Plan your search in advance. Choose the best source(s) for each search. Use the available tools for subject searching well. Try to cope with the language problems; avoid spelling errors in your search query; use spelling variations in your search query
44 **** Hints on how to use information sources: overview (Part 2) 87 Match your search strategy with the type of source. Work cost-effectively. Use special care when searching for names. Be specific. Avoid broad searches. Limit your search to a specific country or region if required. Work iteratively. Keep a record of your work. **** Hints on how to use information sources: overview (Part 3) 88 Do not only focus on a single source. Consider citation indexes besides subject-oriented databases, as useful secondary information sources. Stop searching when enough is enough Give up if necessary... (Not all questions have an answer.) Be critical: not all information is correct or useful.
45 **** Hints on how to use information sources: overview (Part 4) 89 In computer-based retrieval systems, consider applying»truncation of search terms (using a symbol like * or?)»combine search terms, using Boolean operators: OR AND / + NOT / AND NOT / - proximity operators (for instance NEAR ) phrase searching ( word1 word2 )»searching limited to a field (for instance URL, title ) **** Hints on how to use information sources: subject searching 90 When you search for information on a particular topic/subject: investigate if the database producer offers»a subject classification scheme and/or»a controlled/approved/accepted subject terms, and/or»a subject thesaurus Exploit these, if they are available. In most cases you should find and use synonyms and narrower terms Use broader and /or related terms, if appropriate.
46 **-- Hints on how to use information sources: language problems The problem of search terms with more than one meaning: solutions»select the most specific, appropriate database.»limit to a specific, appropriate section of the database.»find first synonyms or narrower terms using a vocabulary or thesaurus, and use these as search terms.»limit the search to one (or several) fields.»... **** Hints on how to use information sources: Boolean combinations 92 Most text search systems understand the basic Boolean operators: OR = obtain records that contain one or both search terms AND = obtain records that contain both search terms NOT or ANDNOT or AND NOT = exclude records that contain a search term
47 **** Hints on how to use information sources: Boolean combinations 93 In the case of computer-based information sources, use Boolean combinations of search terms when appropriate and when possible. term x1 OR term x2 OR term x3 AND term y1 OR term y2 OR term y3 AND term z1 OR term z2 OR term z3 AND... ***-?? Question?? 94 Suppose that you want to to search for a topic that has several synonyms (for example, young people, adolescents, teenagers, teens). Then which one of of the following operators would you use in in your query? ADJ AND NEAR NOT OR
48 **** Hints on how to use information sources: Boolean queries 95 Most text search systems understand the basic Boolean operators typed in capital characters: OR AND So this leads us to queries like for instance (word1 OR word2 OR word3 OR word4) AND (worda OR wordb OR wordc) **** Hints on how to use information sources: default Boolean operator 96 Find out if there is a default implicit Boolean operator working in the search system that you use. This works even when no operator is used explicitly among words. This can be OR, AND, NEAR... So this leads us to queries like for instance (word1 OR word2 OR word3 OR word4) (worda OR wordb OR wordc)
49 ?? Question?? 97 Why is is it it important to to know the default Boolean operator in in the search system that you use? You can also explain this with an an example. ***- ***-!! Task - Assignment!! 98 You can read Cohen, Laura Boolean searching on on the Internet. [online] Available from: University Libraries, University at at Albany, USA. [cited 2006]
50 ?? Question?? 99 You You want to to search a database for for a low-fat recipe for for pasta with with either shrimp or or chicken. Which query demonstrates the the proper use use of of nesting to to get get many search results that that are are very very relevant? noodles or or (pasta and and shrimp) or or chicken and and low-fat (noodles or or pasta) and and (shrimp or or chicken) and and low-fat noodles or or pasta and and (shrimp or or chicken) and and low-fat (noodles or or pasta) and and shrimp or or (chicken and and low-fat) noodles or or pasta and and shrimp or or chicken and and low-fat ***- ***-?? Question?? 100 You You need need information information on on the the communication communication strategies strategies applied applied by by the the popular popular star star Madonna. Madonna. Which Which query query will will probably probably be be the the most most efficient efficient one one in in some some particular particular database, database, (of (of course course in in the the case case that that the the database database understands understands the the operators operators applied) applied) Communication Communication AND AND strategies strategies Madonna Madonna AND AND communication communication AND AND strategies strategies Madonna Madonna OR OR communication communication OR OR strategies strategies Strategies Strategies OR OR communication communication Madonna Madonna
51 ****?? Question?? 101 How many (and which) concepts/facets do do you see in in a search for general reviews about monitoring seawater pollution that is is due to to effluents in in Tanzania? ****!! Task - Assignment!! 102 Prepare off-line, on on paper, a suitable search query in in a generic format, to to find general reviews about monitoring seawater pollution that is is due to to effluents as as the basis for later, concrete searches in in databases. (Limit yourself to to 1 of of the concepts.)
52 ***-Example Hints on how to use information sources: example of a search query 103 Example: Searching for the concept sea can or should involve for instance the following words in a Boolean OR-combination: baltic OR bay OR bays OR coast OR coastal OR coastline OR coasts OR cove OR coves OR gulf OR mangrove OR mangroves OR marine OR mediterranean OR noordzee OR noordzeekust OR noordzeekusten OR ocean OR oceanic OR oceans OR pacific OR reef OR reefs OR saline-freshwater interface OR sea OR seas OR seashore OR seawater OR seawaters OR shore OR shores ****?? Question?? 104 What did you learn from the exercise on on the formulation of of a query?
53 **--!! Task - Assignment!! 105 Prepare off-line, on on paper, a suitable search query in in a generic format, to to find documents about how to to evaluate the ability to to find scientific information of of starting university students up up to to professional scientists as as the basis for later, concrete searches in in databases. (Limit yourself to to 1 of of the concepts.) ***-?? Question?? 106 How can we we exploit in in some searches the fact that many bibliographic databases (in (in particular the commercial, expensive ones) offer records with a field structure?
54 **--!! Task - Assignment!! 107 Read Luther, Luther, Judy, Judy, Kelly, Kelly, Maureen, and and Beagle, Beagle, Donald Donald Visualize this this (Visualization software may may become become a powerful new new way way to to search search or or a footnote in in technology history). Library Library Journal, Journal, March March 1, 1, 2005, 2005, pp. pp **** Hints on how to use information sources: work iteratively 108 Work iteratively = search, investigate your results, refine your search, search again, and so on; do not try to find everything in 1 step, with 1 search. Query Searching Feedback Results
55 **** Hints on how to use information sources: work iteratively: example 109 When you search a database with subject keywords from a controlled list, added to each record: 1. Search with search terms that you know 2. Investigate the results and select good, relevant items 3. Look for the keywords added to these items 4. Select the good, relevant keywords 5. Formulate a new search with these keywords added 6. Execute the new search 7. Repeat the procedure **--!! Task - Assignment!! 110 Search Search in in the the freely freely accessible ERIC ERICdatabase for for documents on on courses offered offered through the the web web in in the the field field of of architecture, or or history, history, or or computer applications. This This is is not not easy, easy, because because words words like like web, web, architecture, history, history, and and computers, can can have have other other meanings than than titles titles of of courses. Therefore, find find and and use use the the controlled subject subject terms terms that that are are added added by by the the database producer and and see see that that the the results results are are better. better.
56 **** 111 The ability to ask the right question is more than half the battle of finding the answer. Thomas J. Watson? **** Hints on how to use information sources: when to stop searching? 112 Develop a feel for the curve of diminishing returns : If you spend too much time, effort, and/or money with too few benefits, you should stop. payoff Time to stop? time / effort / money
57 **** 113 You are free to copy, distribute, display this work under the following conditions:»attribution: You must mention the author.»noncommercial: You may not use this work for commercial purposes.»no Derivative Works: You may not change, modify, alter, transform, or build upon this work. For any reuse or distribution, you must make clear to others the license terms of this work.
Some fundamental aspects of information storage and retrieval 1 Paul.Nieuwenhuysen@vub.ac.be Vrije Universiteit Brussel Pleinlaan 2, B-1050 Brussels, Belgium Prepared for a tutorial at IIVO in Brussels,
How to Work with a Reference Answer Set Easily identify and isolate references of interest Quickly retrieve relevant information from the world s largest, publicly available reference database for chemistry
OvidSP Quick Reference Guide Opening an OvidSP Session Open the OvidSP URL with a browser or Follow a link on a web page or Use Athens or Shibboleth access Select Resources to Search In the Select Resource(s)
FINDING INFORMATION Paul Nieuwenhuysen Vrije Universiteit Brussel, Brussel, Belgium Keywords: information searching, information retrieval, Internet, World-Wide Web, libraries, information centers Contents
J. Nogueras-Iso, J.A. Bañares, J. Lacasta, J. Zarazaga-Soria 105 A Software Tool for Thesauri Management, Browsing and Supporting Advanced Searches J. Nogueras-Iso, J.A. Bañares, J. Lacasta, J. Zarazaga-Soria
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
EBSCOhost Research Databases http://web.ebscohost.com EBSCOhost is a powerful online reference system accessible via the Internet. It offers a variety of proprietary full text databases and popular databases
Information retrieval systems in scientific and technological libraries: from monolith to puzzle and beyond Paul Nieuwenhuysen Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel Paul.Nieuwenhuysen@vub.ac.be
White Paper Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset. Using LSI for Implementing Document Management Systems By Mike Harrison, Director,
**** 1 Computer software Overview **** Computer software: summary/overview/abstract (Part 1) 2 The following presents an overview of various aspects and types of computer software: Computer (disk) operating
DATABASE SEARCHING TIPS: PART ONE By Marvin Hunn The Searching WorldCat and Searching ATLA exercises were designed to make you think about how to search a structured, controlled-vocabulary metadata-based
Taxonomy Strategies November 28, 2012 Copyright 2012 Taxonomy Strategies. All rights reserved. The Value of Taxonomy Management Research Results Joseph A Busch, Principal What does taxonomy do for search?
Web of Science Quick Reference Card ISI WEB OF KNOWLEDGE SM Search over 0,000 journals from over 5 different languages across the sciences, social sciences, and arts and humanities to find the high quality
Introduction to Manual Annotation This document introduces the concept of annotations, their uses and the common types of manual annotation projects. This is a supplement to project-specific guidelines
LexisNexis TotalPatent Training Manual March, 2013 Table of Contents 1 GETTING STARTED Signing On / Off Setting Preferences and Project IDs Online Help and Feedback 2 SEARCHING FUNDAMENTALS Overview of
www.engineeringvillage.com QUICK REFERENCE GUIDE Quick Search The first page you see in Engineering Village is the Quick Search page. This page is designed for quick, straightforward searches. The interface
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
boolean retrieval some slides courtesy James Allan@umass 1 what is a retrieval model? Model is an idealization or abstraction of an actual process Mathematical models are used to study the properties of
Biological Abstracts Quick Reference Card ISI WEB OF KNOWLEDGE SM Biological Abstracts offers researchers, educators, students, and information professionals comprehensive coverage of life sciences research
, pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan
Tamil Search Engine Baskaran Sankaran AU-KBC Research Centre, MIT campus of Anna University, Chromepet, Chennai - 600 044. India. E-mail: firstname.lastname@example.org Abstract The Internet marks the era of Information
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
California Lutheran University Information Literacy Curriculum Map Graduate Psychology Department Student Learning Outcomes Learning Outcome 1 LO1 literate student determines the nature and extent of the
PHOTOGRAPH CONSERVATION THESAURUS Catarina Mateus Photograph Conservator Abstract As a dissertation project for the MA in Preventive Conservation at Northumbria University, U.K., Catarina Mateus is developing
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
Your guide to finding academic information for Engineering This guide is designed to introduce you to electronic academic databases relevant to Engineering. Online databases contain large amounts of current
Medical Information-Retrieval Systems Dong Peng Medical Informatics Group Outline Evolution of medical Information-Retrieval (IR). The information retrieval process. The trend of medical information retrieval
IBM SPSS Text Analytics for Surveys Analyzing survey text: a brief overview Learn how gives you greater insight Contents 1 Introduction 2 The role of text in survey research 2 Approaches to text mining
1 P a g e Introduction The (ACRL) Information Literacy Competency Standards for Higher Education (2000) provides general performance indicators and outcomes to use in undergraduate settings. The following
INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION ISO/IEC JTC1/SC29/WG11 M3039
Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.
HOW TO USE ONLINE PUBLIC ACCESS CATALOG (OPAC) INTRODUCTION TO OPAC OPAC stands for Online Public Access Catalog. It is also known as the catalog, PAC, WebPAC, library catalog, and online catalog. OPACs
Library Services www.uwe.ac.uk/library Tips for Effective Searching Library Services for students and staff in the Faculty of Health and Social Care CONTENTS Introduction 3 1. Basic principles 3 2 Defining
PsycINFO Guide On the Ithaca College Library web site, PsycINFO is available through Ebsco. The top of the first screen will look like this: The Ebsco platform allows you to perform complicated searches
1 What is a literature review? A literature review is more than a list of bibliographic references. A good literature review surveys and critiques the body of literature in your field of interest. It enables
California Lutheran University Information Literacy Curriculum Map Graduate Psychology Department Student Outcomes Outcome 1 LO1 determines the nature and extent of the information needed. Outcome 2 LO2
Chapter : The Cochrane Library Search Tour Chapter : The Cochrane Library Search Tour This chapter will provide an overview of The Cochrane Library Search: Learn how The Cochrane Library new search feature
4. Finding books In this chapter Library catalogs Catalog as database Database searching Searching a library catalog Hawaii Voyager Locating a book in the library After preparing yourself for the research
1 Online searching: Emerald Electronic Database This is an introduction to the concept of Online Searching as specifically applied to the Emerald database. This tutorial will guide you through the initial
School Library Standards for California Public Schools, Grades Nine through Twelve STANDARD 1 Students Access Information The student will access information by applying knowledge of the organization of
HOW TO USE THE DATABASES EMERALD Read How to devise a search strategy before using this guide. Emerald contains articles from 113 peer-reviewed journals on management topics, including a number specifically
Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it
MODULE 1 Internet Concepts Instructions This part of the course is a PowerPoint demonstration intended to introduce you to Internet concepts. This part of the module is off-line and intended as an information
Includes Enhanced Search Tools! Reference Guide Go to www.thecochranelibrary.com to discover this essential resource today 1 What is in The Cochrane Library? The Cochrane Library consists of seven databases
What is Web of Science Core Collection? Search over 55 million records from the top journals, conference proceedings, and books in the sciences, social sciences, and arts and humanities to find the high
Chapter 5 Use the technological Tools for accessing Information 5.1 Overview of Databases are organized collections of information. Ex: Online Catalogs. 1. Structure of a Database - Contains record and
Module Three - Electronic Information Searching Techniques Introduction Within the electronic environment, information is accessed through the Internet, online databases for journal articles and books.
Successful Web Search Strategies Kathleen Schrock Itinerary Definition of search engine How a search engine works Definition of subject directory Improving your use of search engines Effective search strategies
Wellington Medical and Health Sciences Library http://www.otago.ac.nz/wellington/library Advanced Searching in OvidSP Table of contents Preparing to search p.1 Advanced search mode in OvidSP p.2 Preparing
Taxonomies for Auto-Tagging Unstructured Content Heather Hedden Hedden Information Management Text Analytics World, Boston, MA October 1, 2013 About Heather Hedden Independent taxonomy consultant, Hedden
Doing a literature search: a step by step guide Faculty Librarians Table of Contents What is a literature search?... 3 Why carry out a literature search?... 3 The Stages of the Literature Search... 4 1.
Strand: Reading Nonfiction Topic (INCCR): Main Idea 5.RN.2.2 In addition to, in-depth inferences and applications that go beyond 3.5 In addition to score performance, in-depth inferences and applications
Dagobert Soergel College of Library and Information Services University of Maryland College Park, Maryland 20742 301-454-5453 How does one design a database? Paper presented at the New York Library Association
Introduction to IR Systems: Supporting Boolean Text Search Chapter 27, Part A Database Management Systems, R. Ramakrishnan 1 Information Retrieval A research field traditionally separate from Databases
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
Quick Reference: Searching, Availability and Requesting an Item Introduction Based on WorldCat holdings, WorldCat Local enables your users to search your local or shared catalog, scope results, and use
Statistical Natural Language Processing Prasad Tadepalli CS430 lecture Natural Language Processing Some subproblems are partially solved Spelling correction, grammar checking Information retrieval with
Weblogs Content Classification Tools: performance evaluation Jesús Tramullas a,, and Piedad Garrido a a Universidad de Zaragoza, Dept. of Library and Information Science. Pedro Cerbuna 12, 50009 Zaragoza.email@example.com
Sabinet Reference Guide http://reference.sabinet.co.za All Sabinet s Reference databases are searchable on the Sabinet Reference platform. The databases have been grouped into the following broad categories:
Topics for Today! Text transformation Word occurrence statistics Tokenizing Stopping and stemming Phrases Document structure Link analysis Information extraction Internationalization Phrases! Many queries
The Library Instruction Program: A Plan for Information Literacy at Oglethorpe University Introduction Information literacy, the ability to access, evaluate, and incorporate information effectively, is
Finding and citing articles for students at College of Architecture & Design Goals of this tutorial: Searching and locating articles for architecture and design-related research using Avery Index. Citing
Education Full Text Tutorial Database Description Education Full Text provides coverage of international English-language periodicals, monographs, and yearbooks in virtually all aspects of education. Indexing
ISSN 1346-5597 NII Technical Report Web Content Summarization Using Social Bookmarking Service Jaehui Park, Tomohiro Fukuhara, Ikki Ohmukai, and Hideaki Takeda NII-2008-006E Apr. 2008 Jaehui Park School
Minnesota K-12 Academic Standards in Language Arts Curriculum and Assessment Alignment Form Rewards Intermediate Grades 4-6 4 I. READING AND LITERATURE A. Word Recognition, Analysis, and Fluency The student
DEVELOPMENT OF NATURAL LANGUAGE INTERFACE TO RELATIONAL DATABASES C. Nancy * and Sha Sha Ali # Student of M.Tech, Bharath College Of Engineering And Technology For Women, Andhra Pradesh, India # Department
[ SADLIER Common Core Progress English Language Arts Aligned to the [ Florida Next Generation GRADE 6 Sunshine State (Common Core) Standards for English Language Arts Contents 2 Strand: Reading Standards
Information access through information technology 1 Created to support an invited lecture at the International Conference MDGICT 2009 in Tamil Nadu, India, December 2009 by Paul.Nieuwenhuysen@vub.ac.be
Software Engineering Application Architectures Based on Software Engineering, 7 th Edition by Ian Sommerville Objectives To explain the organization of two fundamental models of business systems - batch
NewsEdge.com User Guide November 2013 Table of Contents Accessing NewsEdge.com... 5 NewsEdge.com: Front Page... 6 Saved Search View... 7 Free Text Search Box... 7 Company Watchlist... 9 Weather...12 NewsEdge.com:
Review Easy Guide for Administrators Version 1.0 Notice to Users Verve software as a service is a software application that has been developed, copyrighted, and licensed by Kroll Ontrack Inc. Use of the
From: AAAI Technical Report WS-99-01. Compilation copyright 1999, AAAI (www.aaai.org). All rights reserved. IntelliServeTM: Automating Customer Service Yannick Lallement I & Mark S. Fox 1 2 1Novator Systems
In: O. Hryniewicz, J. Studziński and M. Romaniuk (eds.): Environmental informatics and systems research / EnviroInfo 2007. The 21th International Conference on "Informatics for Environmental Protection",
Towards a Transparent Proactive User Interface for a Shopping Assistant Michael Schneider Department of Computer Science, Saarland University, Stuhlsatzenhausweg, Bau 36.1, 66123 Saarbrücken, Germany firstname.lastname@example.org
Computer-Based Text- and Data Analysis Technologies and Applications Mark Cieliebak 9.6.2015 Data Scientist analyze Data Library use 2 About Me Mark Cieliebak + Software Engineer & Data Scientist + PhD
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
Electronic Performance Support Systems (EPSS): An Effective System for Improving the Performance of Libraries Madhuresh Singhal & T S Prasanna National Centre for Science Information Indian Institute of
Audio Engineering Resources Audio Engineering resources Books, DVDs, CDs & print journals Search for these in the JMC Academy Library Catalogue click on Library Catalogue link on Student Portal Look in
ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY Yu. A. Zagorulko, O. I. Borovikova, S. V. Bulgakov, E. A. Sidorova 1 A.P.Ershov s Institute
Quick Reference: Melvyl / WorldCat Discovery This guide focuses on the campus-specific Melvyl versions. Searching & Displaying The Melvyl search box (or a link to the search box) is located on your campus
DATA SHEET ifinder ENTERPRISE SEARCH ifinder - the Enterprise Search solution for company-wide information search, information logistics and text mining. CUSTOMER QUOTE IntraFind stands for high quality
A GENERALIZED APPROACH TO CONTENT CREATION USING KNOWLEDGE BASE SYSTEMS By K S Chudamani and H C Nagarathna JRD Tata Memorial Library IISc, Bangalore-12 ABSTRACT: Library and information Institutions and
WIS & Engineering Geert-Jan Houben Contents Web Information System (WIS) Evolution in Web data WIS Engineering Languages for Web data XML (context only!) RDF XML Querying: XQuery (context only!) RDFS SPARQL