Binary and Ranked Retrieval


Binary Retrieval: RSV(d_i, q_j) ∈ {0, 1}. This does not allow the user to control the magnitude of the output: for a given query, the system may return under-dimensioned or over-dimensioned output.

Ranked Retrieval: an ordering is induced by RSV(d_i, q_j) ∈ R. This allows the user to start from the top of the ranked list and explore down until she sees fit, catering to the needs of different types of users: those who want just a few highly relevant documents, and those who want many more.

How a modern IR system operates

- (off-line) building document representations and loading them into an internal index structure
- (on-line) reading a query and returning to the user the documents that the system considers relevant to the information need expressed by that query (binary retrieval or ranked retrieval); post-processing the retrieved documents; perhaps reading the user's feedback on the retrieved documents and using it to perform an improved retrieval pass
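
As an illustration (not from the slides), here is a minimal Python sketch contrasting the two output modes; the overlap-based scoring function standing in for RSV is an invented placeholder:

```python
# Minimal sketch: binary vs. ranked retrieval.
# The overlap score used as RSV is an invented placeholder.

def rsv(doc_terms: set, query_terms: set) -> float:
    """Toy retrieval status value: fraction of query terms in the document."""
    return len(doc_terms & query_terms) / len(query_terms)

docs = {
    "d1": {"ranked", "retrieval", "evaluation"},
    "d2": {"binary", "retrieval"},
    "d3": {"cooking", "recipes"},
}
query = {"ranked", "retrieval"}

# Binary retrieval: RSV thresholded to {0, 1}; the output size is not controllable.
binary_output = [d for d, terms in docs.items() if rsv(terms, query) > 0]

# Ranked retrieval: RSV in R induces an ordering; the user explores it top-down.
ranked_output = sorted(docs, key=lambda d: rsv(docs[d], query), reverse=True)

print(binary_output)  # ['d1', 'd2']
print(ranked_output)  # ['d1', 'd2', 'd3']
```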

Representing textual documents and information needs

String searching is inappropriate for IR (it is best suited for data retrieval):
- it is computationally hard;
- the occurrence of a string x in a document d is neither a necessary nor a sufficient condition for the relevance of d to an information need about x.

It is thus usual to produce an internal representation (IREP) of the meaning of documents (off-line) and of queries (on-line), and to determine their match at retrieval time.

Indexing

Indexing is the process by which the IREPs of documents and queries are produced. It typically generates a set of (possibly weighted) index terms (or features) as the IREP of a document (or query). The underlying assumption is that the meaning of this set well approximates the meaning of the document (or query).

Indexing in textual IR

Index terms in textual IR can be:
- words (e.g. "classification"), automatically extracted;
- stems of words (e.g. "class-"), automatically extracted;
- n-grams, automatically extracted;
- noun phrases (e.g. "classification of industrial processes"), automatically extracted;
- words (or noun phrases) from a controlled vocabulary (e.g. "categorization").

Incidence Matrix

The result of the indexing process is the incidence matrix, with one row per document and one column per term:

        t_1   ...  t_k   ...  t_n
  d_1   w_11  ...  w_k1  ...  w_n1
  ...
  d_i   w_1i  ...  w_ki  ...  w_ni
  ...
  d_m   w_1m  ...  w_km  ...  w_nm

N.B. w_ki can be either binary or real. T = {t_1, ..., t_n} is the dictionary of the document base.
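
A small sketch of how such a matrix could be built; the whitespace tokenization and binary weights are simplifying assumptions:

```python
# Sketch: building a binary incidence matrix with naive whitespace
# tokenization (real indexers would use stemming, stop lists, etc.).

docs = [
    "classification of industrial processes",
    "ranked retrieval of documents",
    "document classification",
]

# T = {t_1, ..., t_n}: the dictionary of the document base
dictionary = sorted({term for doc in docs for term in doc.split()})

# w_ki = 1 iff term t_k occurs in document d_i (binary weights)
incidence = [[1 if term in doc.split() else 0 for term in dictionary]
             for doc in docs]

for row, doc in zip(incidence, docs):
    print(row, "<-", doc)
```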

Indexing - Weights

Weights can be assigned:
1. Manually (typically binary weights are used), by trained human indexers or intermediaries who are familiar with:
   - the discipline the documents deal with;
   - the indexing technique (e.g. the optimum number of terms for an IREP, the controlled vocabulary, ...);
   - the contents of the collection (e.g. topic distribution).
2. Automatically (either binary or real weights are used), by indexing processes based on a statistical analysis of word occurrence in the documents, in the query, and in the collection (see the sketch below).

Approach 2 is nowadays the only one left in text retrieval (cheaper and more effective). Approaches 1 and 2 were used in conjunction until recently because of the difficulty of producing effective automatic indexing techniques for non-textual media.

Indexing - considerations

- Using the same indexing technique for documents and queries alike tends to guarantee a correct matching process.
- There is indeterminacy in the indexing process: in general, different indexers (human or automatic) do not produce the same IREP for the same document!
- Unlike what happened in reference retrieval systems, the on-line availability of the entire document allows the entire document to be used for indexing.
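
The slides do not name a concrete weighting scheme; tf-idf is the canonical instance of weighting by statistical analysis of word occurrence, so this sketch uses it as an assumed example:

```python
# Sketch: tf-idf weighting (our assumed instance of statistical weighting).
import math

docs = [
    ["ranked", "retrieval", "ranked"],
    ["binary", "retrieval"],
    ["text", "classification"],
]

def tf_idf(term: str, doc: list, corpus: list) -> float:
    tf = doc.count(term) / len(doc)           # frequency within the document
    df = sum(1 for d in corpus if term in d)  # number of docs containing the term
    idf = math.log(len(corpus) / df)          # rarity across the collection
    return tf * idf

print(tf_idf("ranked", docs[0], docs))     # high: frequent here, rare elsewhere
print(tf_idf("retrieval", docs[0], docs))  # lower: appears in two documents
```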

Evaluation and Experimentation

The evaluation of an IR technique or system is in general accomplished experimentally (rather than analytically), along four criteria:
1. Effectiveness: how well it separates/ranks documents w.r.t. their degree of relevance for the user.
2. Efficiency: how long it takes for indexing, for searching, ...; how expressive the query language is; how large the document base is.
3. Utility: quality w.r.t. the costs (of design, development, update, use, etc.) paid by the involved parties (developers, users, etc.).
4. Coverage (e.g. for Web search engines): the subset of the available documents that the search engine covers (i.e. indexes, provides access to).

Criteria 1 and 4 are widely used; 3 is hardly quantifiable.

Effectiveness for Binary Retrieval: Precision and Recall

If relevance is assumed to be binary-valued, effectiveness is typically measured as a combination of:
- Precision, the degree of soundness of the system: $\pi = \Pr(\mathit{Rel} \mid \mathit{Ret}) = \frac{|\widehat{\mathit{Rel}} \cap \widehat{\mathit{Ret}}|}{|\widehat{\mathit{Ret}}|}$
- Recall, the degree of completeness of the system: $\rho = \Pr(\mathit{Ret} \mid \mathit{Rel}) = \frac{|\widehat{\mathit{Rel}} \cap \widehat{\mathit{Ret}}|}{|\widehat{\mathit{Rel}}|}$
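
The two set-based definitions translate directly into code; the example sets below are invented:

```python
# Sketch: precision and recall from the relevant and retrieved sets.
relevant = {"d1", "d2", "d5", "d9"}   # invented relevance judgments
retrieved = {"d1", "d2", "d3"}        # invented system output

hits = relevant & retrieved              # Rel ∩ Ret
precision = len(hits) / len(retrieved)  # soundness: 2/3
recall = len(hits) / len(relevant)      # completeness: 2/4

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```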

[Figure: the relevant documents (R̂el) and the retrieved documents (R̂et) as overlapping sets within the collection.]

Contingency Table

              Retrieved             Not Retrieved
Relevant      true positives (tp)   false negatives (fn)
Not Relevant  false positives (fp)  true negatives (tn)

$\pi = \frac{tp}{tp + fp}$    $\rho = \frac{tp}{tp + fn}$

Why NOT use the accuracy $\alpha = \frac{tp + tn}{tp + fp + tn + fn}$? Because in IR the relevant documents are a tiny fraction of the collection, so a system that retrieves nothing at all already achieves near-perfect accuracy.
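
A small numeric sketch (with invented counts) of why accuracy is misleading under the class imbalance typical of IR:

```python
# Sketch: 20 relevant documents in a 10,000-document collection.
# A system that retrieves nothing gets 99.8% accuracy but zero recall.
tp, fn, fp, tn = 0, 20, 0, 9980  # invented counts: nothing retrieved

accuracy = (tp + tn) / (tp + fp + tn + fn)
recall = tp / (tp + fn)
print(f"accuracy = {accuracy:.3f}, recall = {recall:.2f}")  # 0.998 vs 0.00
```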

F measure

Some users are precision-oriented (e.g. Web surfers), others are recall-oriented (e.g. professional searchers, paralegals, intelligence analysts). Is there a measure that trades off precision versus recall? The F-measure, a weighted harmonic mean of precision and recall:

$F_\beta = \frac{(\beta^2 + 1)\pi\rho}{\beta^2\pi + \rho}$, and in particular $F_{\beta=1} = \frac{2\pi\rho}{\pi + \rho}$

β < 1 emphasizes precision!

Evaluation for Ranked Retrieval: Precision at Recall

In a ranked retrieval system, precision and recall are values relative to a rank position r. Such systems can be evaluated by computing precision as a function of recall, i.e. π(ρ): what is the precision π(r(ρ)) at the first rank position r(ρ) for which recall reaches the value ρ? We compute this function at each rank position at which a relevant document has been retrieved, and the resulting values are interpolated, yielding a precision/recall plot. A unique numerical value of effectiveness can be obtained by computing e.g. the integral of precision as a function of recall.
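
The formula in code, as a minimal sketch:

```python
# Sketch: the F-measure, a weighted harmonic mean of precision and recall.
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    return ((beta**2 + 1) * precision * recall
            / (beta**2 * precision + recall))

print(f_measure(0.67, 0.50))            # F1, the balanced case: ~0.57
print(f_measure(0.67, 0.50, beta=0.5))  # beta < 1 emphasizes precision: ~0.63
```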

Collection D with |D| = 100, and a query q with |Rel| = 20:

Rank pos.  Rel?  ρ            π(ρ)
1          Y     1/20 = 0.05  1/1 = 1.00
2          Y     2/20 = 0.10  2/2 = 1.00
3          N
4          Y     3/20 = 0.15  3/4 = 0.75
5          N
6          N
7          Y     4/20 = 0.20  4/7 = 0.57

Note that:
- The effectiveness of a system is typically evaluated by averaging over different queries (macroaveraging): different searchers are equally important. This is still a partial view of the problem, since different methods may work best for different types of queries.
- A typical precision/recall plot is monotonically decreasing; for ρ = 1 it takes the value π = g, where g = |Rel|/|D| is the generality (frequency) of the query.
- When the document base is big, it is very important to have high precision at small recall values; measures such as precision at 10 (P@10) are often used in place of π(ρ).
- Typical values do not go beyond 0.4 precision at 0.4 recall.
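
A sketch that reproduces the numbers in the table above from the ranked list of relevance judgments (Y Y N Y N N Y):

```python
# Sketch: precision and recall at each rank position where a relevant
# document appears, for the worked example above (|Rel| = 20).
judgments = [True, True, False, True, False, False, True]  # Y Y N Y N N Y
total_relevant = 20

hits = 0
for rank, is_relevant in enumerate(judgments, start=1):
    if is_relevant:
        hits += 1
        recall = hits / total_relevant   # rho at this rank
        precision = hits / rank          # pi at this rank
        print(f"rank {rank}: rho = {recall:.2f}, pi = {precision:.2f}")
```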

Precision and Recall - Drawbacks

1. The proper estimation of maximum recall requires detailed knowledge of the entire (possibly very large) collection.
2. Precision and recall capture different aspects of the set of retrieved documents; a single measure would be more appropriate.
3. They do not fit interactive retrieval settings.
4. They are inadequate for systems which require weak orderings.

Benchmark Collections

A benchmark (test) collection consists of:
- a set of (up to O(10^7)) documents D = {d_1, ..., d_m};
- a set of queries ("topics" in TREC terminology) Q = {q_1, ..., q_l};
- a (typically binary) relevance matrix of size m × l, compiled by judges who can be either the original query issuers (TREC) or, more often, domain experts.

A test collection is an abstraction of an operational retrieval environment: it allows one to test the relative benefits of different strategies in a controlled way (a sketch follows below).

Limits of the scientific approach

Two limits:
- relevance judgments tend to be topicality judgments;
- the approach does not consider other aspects, such as the user interface and its fitness to the user's seeking behavior.

After all, the construction of the relevance judgments is the real problem.
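
A toy sketch of a test collection as a data structure, with effectiveness macroaveraged over queries as described above; all documents, queries, and judgments are invented:

```python
# Sketch: a toy test collection and macroaveraged precision over queries.
queries = ["q1", "q2"]
relevance = {            # the (binary) relevance matrix, as sets per query
    "q1": {"d1", "d3"},
    "q2": {"d2"},
}
system_output = {        # the evaluated system's retrieved sets per query
    "q1": {"d1", "d2"},
    "q2": {"d2", "d4"},
}

precisions = [
    len(relevance[q] & system_output[q]) / len(system_output[q])
    for q in queries
]
macro_precision = sum(precisions) / len(queries)  # each query counts equally
print(f"macroaveraged precision = {macro_precision:.2f}")  # (0.5 + 0.5) / 2
```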

The TREC collection

TREC web site: http://trec.nist.gov

The corpus: the ad hoc track of the first 8 TREC competitions, between 1992 and 1999; several million documents and 450 information needs. In TREC, as in many other big collections, relevant documents are identified by a data-pooling method, thus approximating the set of relevant documents from below.
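
A minimal sketch of the pooling idea (the runs are invented): only the union of the top-k results of the participating systems is judged, so any relevant document outside the pool is missed, which is why the relevant set is approximated from below:

```python
# Sketch: data pooling. Only documents in the union of the systems'
# top-k rankings are judged; unpooled documents are assumed non-relevant.
runs = {                          # invented ranked runs from two systems
    "system_A": ["d1", "d4", "d2", "d9"],
    "system_B": ["d4", "d7", "d1", "d3"],
}
k = 3

pool = set()
for ranking in runs.values():
    pool.update(ranking[:k])      # take the top-k from each system

print(sorted(pool))               # only these documents go to the assessors
```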

Other Benchmark Collections (used in Text Categorization)

- Reuters-21578: the most widely used collection in text categorization. It consists of newswire articles, each labeled with some number of topical classifications (zero or more); 9,603 training + 3,299 test documents.
- Reuters RCV1: news stories; larger than the previous collection (about 810K documents).
- 20 Newsgroups: 18,491 articles from the 20 Usenet newsgroups.