boolean retrieval some slides courtesy James
|
|
|
- Olivia Blair
- 9 years ago
- Views:
Transcription
1 boolean retrieval some slides courtesy James 1
2 what is a retrieval model? Model is an idealization or abstraction of an actual process Mathematical models are used to study the properties of the process, draw conclusions, make predictions Conclusions derived from a model depend on whether the model is a good approximation of the actual situation Statistical models represent repetitive processes, make predictions about frequencies of interesting events Retrieval models can describe the computational process e.g. how documents are ranked Note that how documents or indexes are stored is implementation Retrieval models can attempt to describe the human process e.g. the information need, interaction Few do so meaningfully Retrieval models have an explicit or implicit definition of relevance 2
3 retrieval models today -boolean -vector space -latent semantic indexing -statistical language -inference network 3
4 exact vs. best match Exact-match query specifies precise retrieval criteria every document either matches or fails to match query result is a set of documents Unordered in pure exact match Best-match Query describes good or best matching document Every document matches query to some degree Result is ranked list of documents Popular approaches often provide some of each E.g., some type of ranking of result set (best of both worlds) E.g., best-match query language that incorporates exactmatch operators 4
5 exact match retrieval Advantages of exact match Can be very efficiently implemented Predictable, easy to explain Structured queries for pinpointing precise documents Work well when you know exactly (or roughly) what the collection contains and what you re looking for Disadvantages of exact match Query formulation difficult for most users Difficulty increases with collection size Indexing vocabulary same as query vocabulary Acceptable precision generally means unacceptable recall Ranking models consistently shown to be better Hard to compare best- and exact-match in principled way (why?) 5
6 best match retrieval Retrieving documents that satisfy a Boolean expression constitutes the Boolean exact match retrieval model Best-match or ranking models are now more common Advantages: Significantly more effective than exact match Uncertainty is a better model than certainty Easier to use (supports full text queries) Similar efficiency (based on inverted file implementations) Disadvantages: More difficult to convey an appropriate cognitive model ( control ) Full text does not mean natural language understanding (no magic ) Efficiency is always less than exact match (cannot reject documents early) Boolean or structured queries can be part of a best-match retrieval model 6
7 boolean retrieval Boolean model is most common exact-match model queries are logic expressions with document features as operands In pure Boolean model, retrieved documents are not ranked Most implementations provide some sort of ranking query formulation difficult for novice users Boolean queries Used by Boolean model and in other models (Boolean query Boolean model) Pure Boolean operators: AND, OR, AND-NOT Most systems have proximity operators Most systems support simple regular expressions as search terms to match spelling variants 7
8 boolean query languages Many users prefer Boolean Especially professional searchers Many WESTLAW, DIALOG searches still use Boolean Control Understandability For some queries or collections, Boolean often works better (e.g., using AND on the Web) Boolean and free text find different documents Need retrieval models that support both Extended Boolean vector space Probabilistic inference network Need interfaces that provide good cognitive models for ranking 8
9 example nuclear nonprolife ration treaty Iran D D D D D D D D query : (nuclear AND treaty) OR ((NOT treaty) AND (nonproliferation OR Iran)) retrieved docs : D7 D5 D2 9
TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt
TF-IDF David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt Administrative Homework 3 available soon Assignment 2 available soon Popular media article
Information Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 10 th, 2013 Wolf-Tilo Balke and Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig
Chapter 1: The Cochrane Library Search Tour
Chapter : The Cochrane Library Search Tour Chapter : The Cochrane Library Search Tour This chapter will provide an overview of The Cochrane Library Search: Learn how The Cochrane Library new search feature
Lecture 1: Introduction and the Boolean Model
Lecture 1: Introduction and the Boolean Model Information Retrieval Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group [email protected] 1 Overview
ATLAS.ti 5.5: A Qualitative Data Analysis Tool
Part III: Searching Your Data ATLAS.ti has a variety of tools to help you search your own data. Word Cruncher At the most basic level, the word cruncher can show you all of the frequencies of the words
Search and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
Mining Text Data: An Introduction
Bölüm 10. Metin ve WEB Madenciliği http://ceng.gazi.edu.tr/~ozdemir Mining Text Data: An Introduction Data Mining / Knowledge Discovery Structured Data Multimedia Free Text Hypertext HomeLoan ( Frank Rizzo
Data and Analysis. Informatics 1 School of Informatics, University of Edinburgh. Part III Unstructured Data. Ian Stark. Staff-Student Liaison Meeting
Inf1-DA 2010 2011 III: 1 / 89 Informatics 1 School of Informatics, University of Edinburgh Data and Analysis Part III Unstructured Data Ian Stark February 2011 Inf1-DA 2010 2011 III: 2 / 89 Part III Unstructured
What is Artificial Intelligence?
CSE 3401: Intro to Artificial Intelligence & Logic Programming Introduction Required Readings: Russell & Norvig Chapters 1 & 2. Lecture slides adapted from those of Fahiem Bacchus. 1 What is AI? What is
Incorporating Window-Based Passage-Level Evidence in Document Retrieval
Incorporating -Based Passage-Level Evidence in Document Retrieval Wensi Xi, Richard Xu-Rong, Christopher S.G. Khoo Center for Advanced Information Systems School of Applied Science Nanyang Technological
Descriptive Statistics and Measurement Scales
Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample
Introduction to formal semantics -
Introduction to formal semantics - Introduction to formal semantics 1 / 25 structure Motivation - Philosophy paradox antinomy division in object und Meta language Semiotics syntax semantics Pragmatics
Electronic Document Management Using Inverted Files System
EPJ Web of Conferences 68, 0 00 04 (2014) DOI: 10.1051/ epjconf/ 20146800004 C Owned by the authors, published by EDP Sciences, 2014 Electronic Document Management Using Inverted Files System Derwin Suhartono,
KS3 Computing Group 1 Programme of Study 2015 2016 2 hours per week
1 07/09/15 2 14/09/15 3 21/09/15 4 28/09/15 Communication and Networks esafety Obtains content from the World Wide Web using a web browser. Understands the importance of communicating safely and respectfully
1 Review of Least Squares Solutions to Overdetermined Systems
cs4: introduction to numerical analysis /9/0 Lecture 7: Rectangular Systems and Numerical Integration Instructor: Professor Amos Ron Scribes: Mark Cowlishaw, Nathanael Fillmore Review of Least Squares
Review. Bayesianism and Reliability. Today s Class
Review Bayesianism and Reliability Models and Simulations in Philosophy April 14th, 2014 Last Class: Difference between individual and social epistemology Why simulations are particularly useful for social
Search engine ranking
Proceedings of the 7 th International Conference on Applied Informatics Eger, Hungary, January 28 31, 2007. Vol. 2. pp. 417 422. Search engine ranking Mária Princz Faculty of Technical Engineering, University
An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System
An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System Asanee Kawtrakul ABSTRACT In information-age society, advanced retrieval technique and the automatic
PATHWAYS TO READING INSTRUCTION INCREASING MAP SCORES HEIL, STEPHANIE. Submitted to. The Educational Leadership Faculty
Pathways Instruction 1 Running Head: Pathways Instruction PATHWAYS TO READING INSTRUCTION INCREASING MAP SCORES By HEIL, STEPHANIE Submitted to The Educational Leadership Faculty Northwest Missouri State
Exam in course TDT4215 Web Intelligence - Solutions and guidelines -
English Student no:... Page 1 of 12 Contact during the exam: Geir Solskinnsbakk Phone: 94218 Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Friday May 21, 2010 Time: 0900-1300 Allowed
6-1. Process Modeling
6-1 Process Modeling Key Definitions Process model A formal way of representing how a business system operates Illustrates the activities that are performed and how data moves among them Data flow diagramming
American Journal of Engineering Research (AJER) 2013 American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-2, Issue-4, pp-39-43 www.ajer.us Research Paper Open Access
Netmail Search for Outlook 2010
Netmail Search for Outlook 2010 Quick Reference Guide Netmail Search is an easy-to-use web-based electronic discovery tool that allows you to easily search, sort, retrieve, view, and manage your archived
Three Methods for ediscovery Document Prioritization:
Three Methods for ediscovery Document Prioritization: Comparing and Contrasting Keyword Search with Concept Based and Support Vector Based "Technology Assisted Review-Predictive Coding" Platforms Tom Groom,
SWIFT: A Text-mining Workbench for Systematic Review
SWIFT: A Text-mining Workbench for Systematic Review Ruchir Shah, PhD Sciome LLC NTP Board of Scientific Counselors Meeting June 16, 2015 Large Literature Corpus: An Ever Increasing Challenge Systematic
CALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
Advanced Cryptography
Family Name:... First Name:... Section:... Advanced Cryptography Final Exam July 18 th, 2006 Start at 9:15, End at 12:00 This document consists of 12 pages. Instructions Electronic devices are not allowed.
SQL Tables, Keys, Views, Indexes
CS145 Lecture Notes #8 SQL Tables, Keys, Views, Indexes Creating & Dropping Tables Basic syntax: CREATE TABLE ( DROP TABLE ;,,..., ); Types available: INT or INTEGER REAL or FLOAT CHAR( ), VARCHAR( ) DATE,
Introduction to Information Retrieval http://informationretrieval.org
Introduction to Information Retrieval http://informationretrieval.org IIR 6&7: Vector Space Model Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 2011-08-29 Schütze:
Basic Concept of Time Value of Money
Basic Concept of Time Value of Money CHAPTER 1 1.1 INTRODUCTION Money has time value. A rupee today is more valuable than a year hence. It is on this concept the time value of money is based. The recognition
Service Oriented Architecture
Service Oriented Architecture Charlie Abela Department of Artificial Intelligence [email protected] Last Lecture Web Ontology Language Problems? CSA 3210 Service Oriented Architecture 2 Lecture Outline
Log Analysis of Academic Digital Library: User Query Patterns
Log Analysis of Academic Digital Library: User Query Patterns Hyejung Han 1, Wooseob Jeong 1 and Dietmar Wolfram 1 1 University of Wisconsin - Milwaukee Abstract This study analyzed user queries submitted
A Short Guide to Significant Figures
A Short Guide to Significant Figures Quick Reference Section Here are the basic rules for significant figures - read the full text of this guide to gain a complete understanding of what these rules really
IAI : Expert Systems
IAI : Expert Systems John A. Bullinaria, 2005 1. What is an Expert System? 2. The Architecture of Expert Systems 3. Knowledge Acquisition 4. Representing the Knowledge 5. The Inference Engine 6. The Rete-Algorithm
Depth-of-Knowledge Levels for Four Content Areas Norman L. Webb March 28, 2002. Reading (based on Wixson, 1999)
Depth-of-Knowledge Levels for Four Content Areas Norman L. Webb March 28, 2002 Language Arts Levels of Depth of Knowledge Interpreting and assigning depth-of-knowledge levels to both objectives within
CS2Bh: Current Technologies. Introduction to XML and Relational Databases. The Relational Model. The relational model
CS2Bh: Current Technologies Introduction to XML and Relational Databases Spring 2005 The Relational Model CS2 Spring 2005 (LN6) 1 The relational model Proposed by Codd in 1970. It is the dominant data
Database Applications Microsoft Access
Database Applications Microsoft Access Lesson 4 Working with Queries Difference Between Queries and Filters Filters are temporary Filters are placed on data in a single table Queries are saved as individual
NPV Versus IRR. W.L. Silber -1000 0 0 +300 +600 +900. We know that if the cost of capital is 18 percent we reject the project because the NPV
NPV Versus IRR W.L. Silber I. Our favorite project A has the following cash flows: -1 + +6 +9 1 2 We know that if the cost of capital is 18 percent we reject the project because the net present value is
Recommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek
Recommender Systems: Content-based, Knowledge-based, Hybrid Radek Pelánek 2015 Today lecture, basic principles: content-based knowledge-based hybrid, choice of approach,... critiquing, explanations,...
A s h o r t g u i d e t o s ta n d A r d i s e d t e s t s
A short guide to standardised tests Copyright 2013 GL Assessment Limited Published by GL Assessment Limited 389 Chiswick High Road, 9th Floor East, London W4 4AL www.gl-assessment.co.uk GL Assessment is
Collaborative Filtering. Radek Pelánek
Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains
SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES
SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR
Chapter 2 The Information Retrieval Process
Chapter 2 The Information Retrieval Process Abstract What does an information retrieval system look like from a bird s eye perspective? How can a set of documents be processed by a system to make sense
Cosmological Arguments for the Existence of God S. Clarke
Cosmological Arguments for the Existence of God S. Clarke [Modified Fall 2009] 1. Large class of arguments. Sometimes they get very complex, as in Clarke s argument, but the basic idea is simple. Lets
Constructing a TpB Questionnaire: Conceptual and Methodological Considerations
Constructing a TpB Questionnaire: Conceptual and Methodological Considerations September, 2002 (Revised January, 2006) Icek Ajzen Brief Description of the Theory of Planned Behavior According to the theory
Digital System Design Prof. D Roychoudhry Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Digital System Design Prof. D Roychoudhry Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 04 Digital Logic II May, I before starting the today s lecture
Choice of Discount Rate
Choice of Discount Rate Discussion Plan Basic Theory and Practice A common practical approach: WACC = Weighted Average Cost of Capital Look ahead: CAPM = Capital Asset Pricing Model Massachusetts Institute
Working with SmartArt
CHAPTER Working with SmartArt In this chapter by Geetesh Bajaj Understanding SmartArt 206 Adding SmartArt to a Presentation 206 Formatting SmartArt 208 Sample SmartArt Variants 211 Common SmartArt Procedures
User research for information architecture projects
Donna Maurer Maadmob Interaction Design http://maadmob.com.au/ Unpublished article User research provides a vital input to information architecture projects. It helps us to understand what information
Fundamentalism and the UK NEW INTERNATIONALIST EASIER ENGLISH PRE-INTERMEDIATE READY LESSON
Fundamentalism and the UK NEW INTERNATIONALIST EASIER ENGLISH PRE-INTERMEDIATE READY LESSON Countries find yours and compare Today: Grammar: practise comparatives Vocabulary Speaking: to discuss Reading:
Objective. Materials. TI-73 Calculator
0. Objective To explore subtraction of integers using a number line. Activity 2 To develop strategies for subtracting integers. Materials TI-73 Calculator Integer Subtraction What s the Difference? Teacher
Review: Information Retrieval Techniques and Applications
International Journal of Computer Networks and Communications Security VOL. 3, NO. 9, SEPTEMBER 2015, 373 377 Available online at: www.ijcncs.org E-ISSN 2308-9830 (Online) / ISSN 2410-0595 (Print) Review:
Binary Adders: Half Adders and Full Adders
Binary Adders: Half Adders and Full Adders In this set of slides, we present the two basic types of adders: 1. Half adders, and 2. Full adders. Each type of adder functions to add two binary bits. In order
CSC384 Intro to Artificial Intelligence
CSC384 Intro to Artificial Intelligence What is Artificial Intelligence? What is Intelligence? Are these Intelligent? CSC384, University of Toronto 3 What is Intelligence? Webster says: The capacity to
1 o Semestre 2007/2008
Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction
Lesson Plan. Parallel Resistive Circuits Part 1 Electronics
Parallel Resistive Circuits Part 1 Electronics Lesson Plan Performance Objective At the end of the lesson, students will demonstrate the ability to apply problem solving and analytical techniques to calculate
Playing with Numbers
PLAYING WITH NUMBERS 249 Playing with Numbers CHAPTER 16 16.1 Introduction You have studied various types of numbers such as natural numbers, whole numbers, integers and rational numbers. You have also
The Impact of Query Suggestion in E-Commerce Websites
The Impact of Query Suggestion in E-Commerce Websites Alice Lee 1 and Michael Chau 1 1 School of Business, The University of Hong Kong, Pokfulam Road, Hong Kong [email protected], [email protected] Abstract.
Technical challenges in web advertising
Technical challenges in web advertising Andrei Broder Yahoo! Research 1 Disclaimer This talk presents the opinions of the author. It does not necessarily reflect the views of Yahoo! Inc. 2 Advertising
1 Boolean retrieval. Online edition (c)2009 Cambridge UP
DRAFT! April 1, 2009 Cambridge University Press. Feedback welcome. 1 1 Boolean retrieval INFORMATION RETRIEVAL The meaning of the term information retrieval can be very broad. Just getting a credit card
Lecture - 4 Diode Rectifier Circuits
Basic Electronics (Module 1 Semiconductor Diodes) Dr. Chitralekha Mahanta Department of Electronics and Communication Engineering Indian Institute of Technology, Guwahati Lecture - 4 Diode Rectifier Circuits
Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A
Introduction to IR Systems: Supporting Boolean Text Search Chapter 27, Part A Database Management Systems, R. Ramakrishnan 1 Information Retrieval A research field traditionally separate from Databases
Attacking information overload in software development
Attacking information overload in software development Gail Murphy University of British Columbia Tasktop Technologies This talk contains copyright pictures obtained under license. The license associated
Quality Risk Management The Pharmaceutical Experience Ann O Mahony Quality Assurance Specialist Pfizer Biotech Grange Castle
Quality Risk Management 11 November 2011 Galway, Ireland Quality Risk Management The Pharmaceutical Experience Ann O Mahony Quality Assurance Specialist Pfizer Biotech Grange Castle Overview Regulatory
How does the problem of relativity relate to Thomas Kuhn s concept of paradigm?
How does the problem of relativity relate to Thomas Kuhn s concept of paradigm? Eli Bjørhusdal After having published The Structure of Scientific Revolutions in 1962, Kuhn was much criticised for the use
Aim To help students prepare for the Academic Reading component of the IELTS exam.
IELTS Reading Test 1 Teacher s notes Written by Sam McCarter Aim To help students prepare for the Academic Reading component of the IELTS exam. Objectives To help students to: Practise doing an academic
Finite Automata. Reading: Chapter 2
Finite Automata Reading: Chapter 2 1 Finite Automaton (FA) Informally, a state diagram that comprehensively captures all possible states and transitions that a machine can take while responding to a stream
The Purchase Price in M&A Deals
The Purchase Price in M&A Deals Question that came in the other day In an M&A deal, does the buyer pay the Equity Value or the Enterprise Value to acquire the seller? What does it mean in press releases
Investment Decision Analysis
Lecture: IV 1 Investment Decision Analysis The investment decision process: Generate cash flow forecasts for the projects, Determine the appropriate opportunity cost of capital, Use the cash flows and
Medical Information-Retrieval Systems. Dong Peng Medical Informatics Group
Medical Information-Retrieval Systems Dong Peng Medical Informatics Group Outline Evolution of medical Information-Retrieval (IR). The information retrieval process. The trend of medical information retrieval
Five High Order Thinking Skills
Five High Order Introduction The high technology like computers and calculators has profoundly changed the world of mathematics education. It is not only what aspects of mathematics are essential for learning,
A COMPREHENSIVE REVIEW ON SEARCH ENGINE OPTIMIZATION
Volume 4, No. 1, January 2013 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info A COMPREHENSIVE REVIEW ON SEARCH ENGINE OPTIMIZATION 1 Er.Tanveer Singh, 2
Improving EHR Semantic Interoperability Future Vision and Challenges
Improving EHR Semantic Interoperability Future Vision and Challenges Catalina MARTÍNEZ-COSTA a,1 Dipak KALRA b, Stefan SCHULZ a a IMI,Medical University of Graz, Austria b CHIME, University College London,
Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28
Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) History of recognition techniques Object classification Bag-of-words Spatial pyramids Neural Networks Object
In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.
MATHEMATICS: THE LEVEL DESCRIPTIONS In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data. Attainment target
Reflections on Probability vs Nonprobability Sampling
Official Statistics in Honour of Daniel Thorburn, pp. 29 35 Reflections on Probability vs Nonprobability Sampling Jan Wretman 1 A few fundamental things are briefly discussed. First: What is called probability
National curriculum tests. Key stage 1. English reading test framework. National curriculum tests from 2016. For test developers
National curriculum tests Key stage 1 English reading test framework National curriculum tests from 2016 For test developers Crown copyright 2015 2016 key stage 1 English reading test framework: national
Managing Variability in Software Architectures 1 Felix Bachmann*
Managing Variability in Software Architectures Felix Bachmann* Carnegie Bosch Institute Carnegie Mellon University Pittsburgh, Pa 523, USA [email protected] Len Bass Software Engineering Institute Carnegie
Non-Parametric Tests (I)
Lecture 5: Non-Parametric Tests (I) KimHuat LIM [email protected] http://www.stats.ox.ac.uk/~lim/teaching.html Slide 1 5.1 Outline (i) Overview of Distribution-Free Tests (ii) Median Test for Two Independent
1 Solving LPs: The Simplex Algorithm of George Dantzig
Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.
How Does Memory Change With Age? Class Objectives. Think about the importance of your memory 3/22/2011. The retention of information over time
How Does Memory Change With Age? The retention of information over time Class Objectives What is memory? What factors influence our memory? Think about the importance of your memory It s hard to even attempt
Linear Algebra Methods for Data Mining
Linear Algebra Methods for Data Mining Saara Hyvönen, [email protected] Spring 2007 Text mining & Information Retrieval Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki
Information Retrieval. Lecture 8 - Relevance feedback and query expansion. Introduction. Overview. About Relevance Feedback. Wintersemester 2007
Information Retrieval Lecture 8 - Relevance feedback and query expansion Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 32 Introduction An information
