boolean retrieval some slides courtesy James



Transcription:

boolean retrieval
some slides courtesy James Allan@umass

what is a retrieval model?
- A model is an idealization or abstraction of an actual process
- Mathematical models are used to study the properties of the process, draw conclusions, make predictions
  - Conclusions derived from a model depend on whether the model is a good approximation of the actual situation
- Statistical models represent repetitive processes, make predictions about frequencies of interesting events
- Retrieval models can describe the computational process
  - e.g. how documents are ranked
  - Note that how documents or indexes are stored is implementation
- Retrieval models can attempt to describe the human process
  - e.g. the information need, interaction
  - Few do so meaningfully
- Retrieval models have an explicit or implicit definition of relevance

retrieval models today
- boolean
- vector space
- latent semantic indexing
- statistical language
- inference network

exact vs. best match
- Exact-match
  - Query specifies precise retrieval criteria
  - Every document either matches or fails to match query
  - Result is a set of documents
    - Unordered in pure exact match
- Best-match
  - Query describes good or best matching document
  - Every document matches query to some degree
  - Result is ranked list of documents
- Popular approaches often provide some of each
  - E.g., some type of ranking of result set (best of both worlds)
  - E.g., best-match query language that incorporates exact-match operators
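The contrast between the two result types can be sketched in a few lines of Python. This is only an illustration: the toy collection is hypothetical, and the best-match score (number of shared terms) is an assumption chosen for brevity, not a model named in the slides.

```python
# Hypothetical toy collection: document id -> set of terms (not from the slides).
docs = {
    "A": {"nuclear", "treaty"},
    "B": {"nuclear", "iran"},
    "C": {"treaty"},
    "D": {"weather"},
}


def exact_match(query_terms, docs):
    """Exact match: a document either contains every query term or it does not."""
    return {doc_id for doc_id, terms in docs.items() if query_terms <= terms}


def best_match(query_terms, docs):
    """Best match: score every document, then return a ranked list.

    The score is the number of shared terms (an illustrative assumption).
    """
    scored = [(len(query_terms & terms), doc_id) for doc_id, terms in docs.items()]
    # Sort by descending score, breaking ties by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _score, doc_id in scored]


query = {"nuclear", "treaty"}
exact_result = exact_match(query, docs)   # an unordered set of documents
ranked_result = best_match(query, docs)   # an ordered list covering every document
```

Note that the exact-match result is a set with no order, while the best-match result keeps every document, including the one that shares no terms with the query, and only orders them.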

exact match retrieval
- Advantages of exact match
  - Can be very efficiently implemented
  - Predictable, easy to explain
  - Structured queries for pinpointing precise documents
  - Work well when you know exactly (or roughly) what the collection contains and what you're looking for
- Disadvantages of exact match
  - Query formulation difficult for most users
  - Difficulty increases with collection size
  - Indexing vocabulary same as query vocabulary
  - Acceptable precision generally means unacceptable recall
  - Ranking models consistently shown to be better
  - Hard to compare best- and exact-match in principled way (why?)
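The precision/recall tension mentioned in the list above can be made concrete with the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The sketch below uses hypothetical document ids and hypothetical result sets for a strict and a loose query; none of the numbers come from the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of document ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgments and result sets (illustration only).
relevant = {1, 2, 3, 4}
narrow_result = {1, 2}                     # e.g. a strict AND query
broad_result = {1, 2, 3, 4, 5, 6, 7, 8}    # e.g. a loose OR query

narrow_pr = precision_recall(narrow_result, relevant)
broad_pr = precision_recall(broad_result, relevant)
```

In this toy case the strict query returns only relevant documents but misses half of them, while the loose query finds all of them at the cost of returning as many non-relevant ones.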

best match retrieval
- Retrieving documents that satisfy a Boolean expression constitutes the Boolean exact match retrieval model
- Best-match or ranking models are now more common
- Advantages:
  - Significantly more effective than exact match
  - Uncertainty is a better model than certainty
  - Easier to use (supports full text queries)
  - Similar efficiency (based on inverted file implementations)
- Disadvantages:
  - More difficult to convey an appropriate cognitive model ("control")
  - Full text does not mean natural language understanding (no "magic")
  - Efficiency is always less than exact match (cannot reject documents early)
- Boolean or structured queries can be part of a best-match retrieval model

boolean retrieval
- Boolean model is most common exact-match model
  - Queries are logic expressions with document features as operands
  - In pure Boolean model, retrieved documents are not ranked
  - Most implementations provide some sort of ranking
  - Query formulation difficult for novice users
- Boolean queries
  - Used by Boolean model and in other models (Boolean query ≠ Boolean model)
  - Pure Boolean operators: AND, OR, AND-NOT
  - Most systems have proximity operators
  - Most systems support simple regular expressions as search terms to match spelling variants
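One common way to realize the pure Boolean operators is as set operations over posting sets, with a regular expression expanded against the vocabulary to cover spelling variants. The following is a minimal sketch under that assumption; the index contents and the helper names (`postings`, `op_and`, `op_or`, `op_and_not`, `postings_matching`) are hypothetical and do not describe any particular system.

```python
import re

# Hypothetical inverted index: term -> set of ids of documents containing it.
index = {
    "color": {1, 3},
    "colour": {2},
    "printer": {1, 2, 4},
    "ink": {3, 4},
}


def postings(term):
    """Posting set for a term; empty if the term is not in the vocabulary."""
    return index.get(term, set())


def op_and(a, b):
    return a & b          # documents in both sets


def op_or(a, b):
    return a | b          # documents in either set


def op_and_not(a, b):
    return a - b          # documents in a but not in b


def postings_matching(pattern):
    """Union of postings for every vocabulary term that fully matches a regex."""
    result = set()
    for term, doc_ids in index.items():
        if re.fullmatch(pattern, term):
            result |= doc_ids
    return result


# printer AND-NOT ink
printer_no_ink = op_and_not(postings("printer"), postings("ink"))

# A regex term covering both spellings, combined with AND.
colour_printer = op_and(postings_matching(r"colou?r"), postings("printer"))
```

AND-NOT is a binary operator here (set difference), which avoids ever materializing "all documents not containing a term". Proximity operators would need term positions in the postings and are left out of this sketch.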

boolean query languages
- Many users prefer Boolean
  - Especially professional searchers
  - Many WESTLAW, DIALOG searches still use Boolean
  - Control
  - Understandability
- For some queries or collections, Boolean often works better (e.g., using AND on the Web)
- Boolean and free text find different documents
- Need retrieval models that support both
  - Extended Boolean vector space
  - Probabilistic inference network
- Need interfaces that provide good cognitive models for ranking

example

      nuclear  nonproliferation  treaty  Iran
D1       0            0             0      0
D2       1            0             0      1
D3       0            0             1      0
D4       0            0             1      1
D5       1            1             0      0
D6       0            0             1      1
D7       1            0             1      0
D8       0            1             1      1

query: (nuclear AND treaty) OR ((NOT treaty) AND (nonproliferation OR Iran))
retrieved docs: D7 D5 D2
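The example can be checked mechanically by evaluating the query against each row of the term-document matrix. The sketch below copies the matrix and the query from the slide; the names `TERMS`, `INCIDENCE`, and `matches` are introduced here for illustration only.

```python
# Column order of the matrix, as on the slide.
TERMS = ("nuclear", "nonproliferation", "treaty", "Iran")

# Term-document incidence matrix copied from the slide.
INCIDENCE = {
    "D1": (0, 0, 0, 0),
    "D2": (1, 0, 0, 1),
    "D3": (0, 0, 1, 0),
    "D4": (0, 0, 1, 1),
    "D5": (1, 1, 0, 0),
    "D6": (0, 0, 1, 1),
    "D7": (1, 0, 1, 0),
    "D8": (0, 1, 1, 1),
}


def matches(row):
    """Evaluate the slide's query for one document row."""
    doc = dict(zip(TERMS, (bool(bit) for bit in row)))
    return (doc["nuclear"] and doc["treaty"]) or (
        (not doc["treaty"]) and (doc["nonproliferation"] or doc["Iran"])
    )


# Exact match yields a set; sorting is only for stable display.
retrieved = sorted(doc_id for doc_id, row in INCIDENCE.items() if matches(row))
```

D7 satisfies the first clause (nuclear AND treaty), while D2 and D5 lack "treaty" and satisfy the second clause through "Iran" and "nonproliferation" respectively, which agrees with the retrieved set listed on the slide.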