Medical Information-Retrieval Systems. Dong Peng Medical Informatics Group



Similar documents
An Information Retrieval using weighted Index Terms in Natural Language document collections

Mining Text Data: An Introduction

Search Engines. Stephen Shaw 18th of February, Netsoc

How to stop looking in the wrong place? Use PubMed!

Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A

1 o Semestre 2007/2008

Search and Information Retrieval

MEDICAL INFORMATICS. Sorana D. Bolboacă

MEDLINE (via Ovid): Introduction to Searching

Review: Information Retrieval Techniques and Applications

Assessing the effectiveness of medical therapies finding the right research for each patient: Medical Evidence Matters

Linear Algebra Methods for Data Mining

Latent Semantic Indexing with Selective Query Expansion Abstract Introduction

Medical School Objectives Project: Medical Informatics Objectives. I. Introduction

Chapter 1: The Cochrane Library Search Tour

An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System

Text Analytics. Introduction to Information Retrieval. Ulf Leser

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Advas A Python Search Engine Module

What is Medical Informatics? Sanda Harabagiu Human Language Technology Research Institute DEPARTMENT OF Computer Science

Elsevier ClinicalKey TM FAQs

AHE 233 Introduction to Health Informatics Lesson Plan - Week One

Building a Spanish MMTx by using Automatic Translation and Biomedical Ontologies

Term extraction for user profiling: evaluation by the user

CINAHL (via Ovid): Introduction to Searching

Data Pre-Processing in Spam Detection

Goals, Objectives, and Competencies for Reference Service:

Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs

ProteinQuest user guide

Medical Informatics: An Introduction

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL

The Value of Indexing

Figure A Partial list of EBSCOhost databases

Information Retrieval. Lecture 8 - Relevance feedback and query expansion. Introduction. Overview. About Relevance Feedback. Wintersemester 2007

Incorporating Window-Based Passage-Level Evidence in Document Retrieval

Towards a Visually Enhanced Medical Search Engine

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR

LIBRARY GUIDE & SELECTED RESOURCES FOR NURSING. To find books about and by nursing theorists:

Information Need Assessment in Information Retrieval

TF-IDF. David Kauchak cs160 Fall 2009 adapted from:

Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset.

Information Retrieval System Assigning Context to Documents by Relevance Feedback

Inmagic Content Server Workgroup Configuration Technical Guidelines

Homework 2. Page 154: Exercise Page 145: Exercise 8.3 Page 150: Exercise 8.9

Chapter 2 The Information Retrieval Process

Enhancing Document Review Efficiency with OmniX

Searching and selecting primary

Databases and computerized information retrieval

Data and Analysis. Informatics 1 School of Informatics, University of Edinburgh. Part III Unstructured Data. Ian Stark. Staff-Student Liaison Meeting

Where to search for books and articles. Accessing and Searching Avery Index

California Lutheran University Information Literacy Curriculum Map Graduate Psychology Department

LABERINTO at ImageCLEF 2011 Medical Image Retrieval Task

Introduction to Information Retrieval

1.00 ATHENS LOG IN BROWSE JOURNALS 4

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

California Lutheran University Information Literacy Curriculum Map Graduate Psychology Department

EMBASE (via Ovid): Introduction to Searching

Eng. Mohammed Abdualal

HELP DESK SYSTEMS. Using CaseBased Reasoning

Concepts of digital forensics

Physical Data Organization

SIMILAR THESAURUS BASED ON ARABIC DOCUMENT: AN OVERVIEW AND COMPARISON

Information Retrieval Elasticsearch

Document Management System (DMS) Release 4.5 User Guide

Review Easy Guide for Administrators. Version 1.0

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials

Cerner i2b2 User s s Guide and Frequently Asked Questions. v1.3

Searching for Clinical Prediction Rules in MEDLINE

EBSCO. * While the search pages might look a little different depending on which database you are using, it will largely operate the same.

A Review on Document Retrieval from Unstructured Text

Machine Data Analytics with Sumo Logic

Semantic Search in Portals using Ontologies

Inmagic Content Server v9 Standard Configuration Technical Guidelines

Transcription:

Medical Information-Retrieval Systems Dong Peng Medical Informatics Group

Outline Evolution of medical Information-Retrieval (IR). The information retrieval process. The trend of medical information retrieval system. Future challenge in medical information retrieval.

Evolution of Medical Information Retrieval 1879: Index Medicus by Dr John Shaw Billings Journal articles were indexed by author name and subject headings and aggregated into bound volumes. 1966: the Medical Literature Analysis and Retrieval System (MEDLARS) by the National Library of Medicine (NLM) Predecessor of Medline. Stores only abstracted information from each article, e.g. author names, article title, journal source, publication date due to limited computing power and disk storage. Assign to each article a number of terms from its MeSH thesaurus. 1980s: full-text databases emerges due to growing computing power and less cheaper disk storage. Early 1990s: With the advent of the World Wide Web and the exponentially increasing power of computers and networks, vast quantities of medical information from multiple sources were available over the global internet.

The Information-Retrieval Process Framework comes from a modification of ideas advanced by Gerard Salton (1983). Indexing Content Indexing Database (content+index) Information Need Query Query formulation Query formulation Retrieval Retrieval Evaluation and Refinement Results Evaluation and refinement

Indexing Objective of indexing: to produce the smallest, most efficient representation of the original content. Content can be classified by the degree it represents the original resources, i.e. bibliographic content and full-text content. Indexing of bibliographic content Medline is the world s premiere bibliographic information source developed by NLM. It contains references to 10 million biomedical journal articles published in 3,500 journals since 1966. Multiple-indexed source. Example: search keyword heart attack Information abstracted from the publication, e.g. authors names, article title, publication date. Information added by a human indexer, e.g. subject headings (MeSH), publication type. MeSH Developed by NLM to represent important concepts in biomedicine. A collection of 18,000 subheadings grouped into 16 trees.

Indexing Indexing of full-text content No terms are assigned by human indexer. Term frequency and term locations are used in the indexing method. The most common method Automated Indexing (vector-space model) pioneered by Salton in the 1960s but only achieve widespread use till the 1990s. Document preprocessing: 1. extract unique words. 2. eliminate non-content-bearing stop words, e.g. a, an, the 3. stem word to their roots. 4. count the number of occurrence of each word in each document. Assigning weight to each word to enhance discrimination between various document vectors. Inverse Document Frequency Term Frequency weight TF ij ij = TF ij IDF i # of documents IDF i = log( ) + 1 # of documents with term i = log( frequency of term i in document j) + 1

Query Formulation Objective: to translate an information need into a high quality query. Boolean query and Natural language query. Boolean Queries Boolean operators: AND, OR, NOT. Wildcard characters: # (end with), : (start with). Field qualification for multiple indexed databases (e.g. author, subject heading). Term explosion is allowed by most Medline systems (e.g. Hypertension, Malignant). Text-word searching is allowed by modern Medline systems, i.e. words searched through the title and the abstract. Natural Language Queries Syntactic: removal of stop words, identification of common phrases, stemming of words to their roots. Semantic: identifying a word s semantic type (e.g. diagnosis, treatment), expanding the query to include synonyms of the entered terms, identifying indexes in which individual words should be searched.

Retrieval Matching Queries are compared against index and a result set is created. A simple example: an inverted-indexing database for a Boolean IR system: Item Aspirin Attribute (Doc. No.) 1,5,6,9 Attack 3,6,7,8 Heart 4,6,7,10 Prevention 1,2,6,9 Techniques for implementing Set in data storage Bit vector: set bit i to 1 if a term appears in document i, else 0. E.g. 20 documents, Heart_Set = 00010110001000000000 Very space consuming. Hashing: store document number into a hash table for each term. A hash function h(k) maps a document number k into a location in the hash table. E.g. let hash table size = 5, h(k) = k mod 5, 10 6 7 4 Heart_Set is mapped into the table as 0 1 2 3 4

Retrieval Set Data Hashing 0 1 2 3 4 1 2 8 32 0 1 2 3 4 5 6 7 8 9 1 2 32 4 64 16 8 4 64 (a) Chained hashing (b) Open addressing hashing Minimal Perfect Hashing Function (MPHF): uniquely transform each key into a location in the hash table without collision. gperf: a software tool which automates the generation of perfect hashing functinos.

Retrieval Ranking Chronology Alphabetic Relevance ranking: common in natural language IR systems. similarity where ( Q, D ) = t i = 1 ( w iq w t t 2 ( w iq ) i = 1 i = 1 0.5 freq iq w iq = (0.5 + ) max freq q ij ) ( w ij IDF i ) 2 Display

Evaluation Focusing on measuring the retrieval of relevant documents. Two parameters: recall and precision. # documentsretrieved and relevant recall = # relevantdocumentsin database analogous # documentsretrieved and relevant precision= # numberof documentsretrieved Why low recall and precision? From indexing Sensitivity Positive predictive value Inconsistency in human indexers. Problems with word-based indexing: context, polysemy, synonymy, content. From retrieval E.g. excess AND in queries

Trend in Medical Information Retrieval Systems Aggregation of content from diverse information sources. Example: MedWeaver @ http://www.unboundmedicine.com/medweaver.htm A web-based application that integrates functions from 3 systems: Advantages: DXplain: a computer-based decision support system, to provide a summary description of a disease and possible diagnoses for the disease when a patient s clinical findings are entered into the system. Assisted literature searches of MEDLINE. Links to clinically relevant web sites. Allow needs-based rather than source-based system design. Content creation and maintenance are distributed across organizations.

Future Challenge in Medical Information Retrieval Clinicians need high-quality, trusted information in the delivery of health care. Outdated information need to be archived dynamically. Information must be organized and indexed effectively for easy retrieval, to increase recall and precision of information retrieval. Information must be delivered on a platform that is convenient and reliable. More and more IR system will become web-based.

References Edward H. Shortliffe, Leslie E. Perreault et al, Medical informatics : computer applications in health care and biomedicine, New York, 2000. Edward H. Shortliffe, Leslie E. Perreault, et al, Medical informatics : computer applications in health care, Addison-Wesley Pub. Co, c1990. Van Rijsbergen, C. J., Information retrieval, London ; Boston : Butterworths, 1979. Fielden, Ned L., Internet research : theory and practice, Jefferson, N.C. : McFarland & Co., c2001.