Improving Health Records Search Using Multiple Query Expansion Collections


Improving Health Records Search Using Multiple Query Expansion Collections
Dongqing Zhu and Ben Carterette
Information Retrieval Lab, University of Delaware

Outline
  Introduction
  Retrieval Task & Data
  Retrieval Models
  Results & Analysis
  Conclusion

Introduction
  Medical record: a longitudinal record of patient health information generated by one or more encounters in any care delivery setting.
  Adoption of EMRs: fewer than 10% of hospitals had a fully integrated system as of 2006. However, the CDC reported that the EMR adoption rate had risen steadily to 48.3 percent by the end of 2009.

Electronic Medical Records: Applications
  Support clinical studies and improve quality of care. Examples:
    To assess the efficacy of antidepressants in treating back pain in adults
    To examine the benefits and harms of aspirin chemoprevention
    To study the relationship between HCV and kidney disease
  Such studies need to gather evidence from a sample group of patients who all have a particular condition.
  Techniques involved: information retrieval, natural language processing, etc.
  There are a few challenges, though.

Challenges
  Clinical narratives are full of synonymy:
    "epigastric pain after eating" and "postprandial stomach discomfort"
    "tobacco user" and "smoker"
  Abbreviations:
    PCP can stand for the drug phencyclidine, the disease Pneumocystis carinii pneumonia, or an individual, the primary care physician.
  Usage of words:
    Yang et al. [1] analyzed a query log of EMERSE covering 4 years and found that the coverage of EHR query terms by a meta-dictionary is much lower than the usual 85-90% coverage of Web queries by English dictionaries. They therefore suggested looking beyond medical ontologies to enhance medical information retrieval.
  [1] L. Yang, Q. Mei, K. Zheng, and D. Hanauer, "Query log analysis of an electronic health record search engine," in AMIA Annual Symposium Proceedings, 2011, pp. 915-924.

Information Retrieval for EMR Search
  Medical Records track
    Organized at the 2011 and 2012 Text REtrieval Conference (TREC), which is co-sponsored by the National Institute of Standards and Technology (NIST) and the U.S. Department of Defense (DoD)
    Goal: to foster research on providing content-based access to the free-text fields of electronic medical records
    Task: an ad hoc search task for retrieving relevant patient visits to the hospital, which models the real-world task of finding a population over which comparative effectiveness studies can be done

Data
  TREC official test collection
  Corpus
    > 100,000 reports from ~18,000 patient hospital visits, collected over a month
    From the University of Pittsburgh BLULab NLP Repository
  Topics
    35 topics (i.e., queries) developed by physicians
    Sample topics:
      Patients with ductal carcinoma in situ (DCIS)
      Adults who received a coronary stent during an admission
      Women with osteopenia
  Relevance judgments
    Collected from physicians

A Sample EMR
  Report types: Radiology, History and Physical, Consults, Emergency Room, Progress Notes, Discharge Summaries, Operative Reports, Surgical Pathology Reports, Cardiology Reports
  Sample content:
    CONTUSION OF HIP
    OTHER PERSISTENT MENTAL DISORDERS DUE TO CONDITIONS CLASSIFIED ELSEWHERE
    This is a **AGE[50]-year-old male patient.
    EXAMINATION PERFORMED: PELVIS AP VIEW ONLY **DATE[Feb 04 07] 1343 HOURS
    CLINICAL HISTORY: Fall.
    FINDINGS: Frontal views of the pelvis and specific oblique views of the right hip show no fractures or dislocations. There is a protrusio acetabulum on the right with chronic deformity of the acetabular margin and adjacent femoral head with joint space narrowing. Vessel calcification is evident.
    He has had no fever or chills
    His
    My signature below is attestation that I have interpreted this/these examination(s) and agree with the findings as noted above....
  Three Medical Features
    1. Code Expansion
    2. Negation Removal
    3. Age/Gender Filtering

Retrieval Models
  Our focus: statistical IR models
    Query likelihood model (baseline)
    Relevance model
    Mixture of relevance models
  Two novel weighting methods
    Medical thesaurus-based expansion
    Adaptive collection weighting method

Retrieval Models: Baseline model
  Query likelihood model with Dirichlet smoothing
  Full independence between query terms
    [Diagram: document D generating query terms q1, q2, q3]
  Example query Q: "hearing loss" (q1: hearing, q2: loss)
  Example document model D: (hearing, p = 0.0018), (loss, p = 0.0031), (health, p = 0.0004), (temperature, p = 0.002)
  Document: states that they did have a Miracle Ear representative check into the hearing aids who was
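The baseline can be sketched as follows. This is a minimal illustration, assuming the standard Dirichlet-smoothed estimate P(q|D) = (tf(q, D) + mu * P(q|C)) / (|D| + mu) and a product over independent query terms; the function name, the toy documents, and the value of mu are assumptions made for this example and are not taken from the slides. Under the independence assumption, the slide's example probabilities would give P(Q|D) = 0.0018 * 0.0031, about 5.6e-6.

```python
import math
from collections import Counter


def dirichlet_query_log_likelihood(query_terms, doc_terms, collection_tf,
                                   collection_len, mu=2500.0):
    """Return log P(Q|D), treating query terms as independent given D."""
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for q in query_terms:
        # Background (collection) probability used for smoothing.
        p_coll = collection_tf.get(q, 0) / collection_len
        p = (doc_tf[q] + mu * p_coll) / (doc_len + mu)
        if p == 0.0:
            # Term unseen in the whole collection: the product is zero.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection (hypothetical text, for illustration only).
docs = {
    "d1": "patient reports hearing loss and uses hearing aids".split(),
    "d2": "patient reports fever and weight loss".split(),
}
collection_tf = Counter(t for terms in docs.values() for t in terms)
collection_len = sum(collection_tf.values())

query = ["hearing", "loss"]
scores = {
    name: dirichlet_query_log_likelihood(query, terms, collection_tf,
                                         collection_len, mu=10.0)
    for name, terms in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Scores are kept in log space to avoid underflow on longer queries; the ranking is the same as with the raw product.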

Retrieval Models: Lavrenko's relevance model
  [Diagram: relevance model R generating terms w1, w2, w3]
  Original query terms: hearing, loss
  Expansion query terms: cochlear, ear, deaf, noise, sensorineural, fechter, binaural, efferent, monaural, loud, coch, ...
  Expansion collection C
  Maximum likelihood estimation
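One common way to estimate a relevance model is the RM1 form, P(w|R) proportional to the sum over feedback documents of P(w|D) * P(D|Q), with P(w|D) taken as the maximum likelihood estimate. The sketch below assumes that form; the slides do not spell out the estimator, and the function name, feedback documents, and likelihood values are hypothetical.

```python
from collections import Counter


def relevance_model(feedback_docs, doc_query_likelihoods, num_terms=10):
    """Estimate P(w|R) from feedback documents and their P(Q|D) scores.

    feedback_docs: list of token lists retrieved from an expansion collection.
    doc_query_likelihoods: P(Q|D) for each document, in the same order.
    """
    total = sum(doc_query_likelihoods)
    weights = Counter()
    for terms, q_lik in zip(feedback_docs, doc_query_likelihoods):
        posterior = q_lik / total  # P(D|Q), assuming a uniform document prior
        for w, count in Counter(terms).items():
            # Maximum likelihood estimate of P(w|D) is count / document length.
            weights[w] += posterior * count / len(terms)
    top = weights.most_common(num_terms)
    norm = sum(p for _, p in top)
    return {w: p / norm for w, p in top}


# Hypothetical feedback documents for the query "hearing loss".
feedback_docs = [
    ["hearing", "loss", "cochlear", "ear"],
    ["hearing", "loss", "deaf", "noise"],
]
rm = relevance_model(feedback_docs, doc_query_likelihoods=[0.6, 0.4])
```

Terms that occur in highly ranked feedback documents ("cochlear" here) receive more weight than terms from lower-ranked ones ("deaf").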

Retrieval Models: Mixture of relevance models
  Uses multiple collections to construct the relevance models
  Linearly interpolates the weighted relevance models with the maximum-likelihood query estimate
    [Formula labels: maximum likelihood estimation; weighted relevance models]
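The interpolation can be sketched as below. The exact formula is not preserved in this transcript, so the code assumes a plain linear mixture: a share query_weight goes to the maximum-likelihood query model and the rest is split across the per-collection relevance models in proportion to their collection weights. All names and numbers are illustrative assumptions.

```python
from collections import Counter


def mixture_query_model(query_terms, relevance_models, collection_weights,
                        query_weight=0.5):
    """Interpolate the ML query model with weighted relevance models."""
    mixed = Counter()
    # Maximum likelihood query estimate: relative frequency in the query.
    for w, count in Counter(query_terms).items():
        mixed[w] += query_weight * count / len(query_terms)
    # Remaining probability mass is shared among the expansion collections.
    total_weight = sum(collection_weights.values())
    for name, rm in relevance_models.items():
        lam = (1.0 - query_weight) * collection_weights[name] / total_weight
        for w, p in rm.items():
            mixed[w] += lam * p
    return dict(mixed)


# Hypothetical relevance models built from two expansion collections.
relevance_models = {
    "medical": {"cochlear": 0.5, "deaf": 0.5},
    "wikipedia": {"ear": 0.5, "noise": 0.5},
}
collection_weights = {"medical": 3.0, "wikipedia": 1.0}
mixed = mixture_query_model(["hearing", "loss"], relevance_models,
                            collection_weights)
```

Because each component is a probability distribution and the interpolation weights sum to one, the mixed model is again a distribution over terms.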

Retrieval Models: Expansion Collections
  Used for building relevance models

Retrieval Models: Expansion Collections
  MeSH: a medical thesaurus containing relations between medical concepts, represented in a tree structure
  Other collections contain full-text articles/webpages
  Expansion methods
    Medical thesaurus-based expansion
    General expansion

Medical Thesaurus-based Expansion: Four steps
  Concept identification: use the PubMed e-utility to identify MeSH concepts in the query
  Concept expansion: expand a detected MeSH concept by its entry terms and its descendant nodes down to level l in the MeSH trees
  Concept weighting: for each MeSH concept, estimate a weight p for each expansion term e using a PubMed query log
  Concept aggregation: merge the lists of expansion terms for each concept into one final expansion list

Medical Thesaurus-based Expansion: Concept expansion & weighting
  Example concept: Hearing Loss
    Entry terms and related MeSH nodes shown: Deafness; Deaf-Blind Disorders; Hearing Loss, High-Frequency; Presbycusis; Hypoacusis; Hearing Impairment; Hearing Loss, Sensorineural; Usher Syndromes
  Weighting: based on the number of users whose queries contain e_i in query log G [2]
  Modeling term proximity: #uw16(#1(hearing loss) sensorineural)
  Phrasal expansion term: #1(usher syndromes)
  [2] J. R. Herskovic, L. Y. Tanaka, W. R. Hersh, and E. V. Bernstam, "A day in the life of PubMed: Analysis of a typical day's query log," JAMIA, vol. 14, no. 2, pp. 212-220, 2007.
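The four steps can be sketched on toy data as follows. Everything here is an assumption made for illustration: the tree arrangement and user counts are invented (not checked against MeSH or any real query log), concept identification is replaced by a simple substring match rather than the PubMed e-utility, weights are normalized user counts, and the rule that turns a term into an operator expression is inferred only from the two examples given above.

```python
# Toy stand-ins (hypothetical structure and counts, for illustration only).
MESH_CHILDREN = {
    "Hearing Loss": ["Deafness", "Hearing Loss, High-Frequency",
                     "Hearing Loss, Sensorineural"],
    "Deafness": ["Deaf-Blind Disorders"],
    "Hearing Loss, Sensorineural": ["Presbycusis", "Usher Syndromes"],
}
ENTRY_TERMS = {"Hearing Loss": ["Hypoacusis", "Hearing Impairment"]}
USERS_WITH_TERM = {
    "Hypoacusis": 2, "Hearing Impairment": 8, "Deafness": 30,
    "Hearing Loss, High-Frequency": 0, "Hearing Loss, Sensorineural": 10,
    "Deaf-Blind Disorders": 1, "Presbycusis": 5, "Usher Syndromes": 4,
}


def identify_concepts(query):
    """Step 1 stand-in: substring match instead of the PubMed e-utility."""
    return [c for c in MESH_CHILDREN if c.lower() in query.lower()]


def descendants(concept, max_level):
    """Step 2: collect descendant nodes down to max_level below the concept."""
    found, frontier = [], [concept]
    for _ in range(max_level):
        frontier = [child for node in frontier
                    for child in MESH_CHILDREN.get(node, [])]
        found.extend(frontier)
    return found


def expand_and_weight(concept, max_level):
    """Steps 2 and 3: expansion terms weighted by normalized user counts."""
    terms = ENTRY_TERMS.get(concept, []) + descendants(concept, max_level)
    counts = {t: USERS_WITH_TERM.get(t, 0) for t in terms}
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: c / total for t, c in counts.items() if c > 0}


def mesh_expansion(query, level=1):
    """Step 4: merge the per-concept expansion lists into one list."""
    merged = {}
    for concept in identify_concepts(query):
        for term, weight in expand_and_weight(concept, level).items():
            merged[term] = merged.get(term, 0.0) + weight
    return merged


def to_indri(term):
    """Render a term as a phrase (#1) or proximity (#uw16) expression."""
    head, *qualifiers = term.lower().split(", ")
    words = head.split()
    head_expr = words[0] if len(words) == 1 else "#1(" + " ".join(words) + ")"
    if not qualifiers:
        return head_expr
    return "#uw16(" + head_expr + " " + " ".join(qualifiers) + ")"


weights = mesh_expansion("patients with hearing loss", level=1)
```

Raising `level` pulls in deeper nodes of the toy tree, which mirrors the expansion level l described in the steps above.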

General Expansion: Three steps
  Initial retrieval in an expansion collection using the original query Q
  Take the k top-ranked documents
  Select m good terms for query expansion
  [Expanded query example]
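The three steps can be sketched as a small pseudo-relevance-feedback routine. The scoring and term-selection rules below are deliberately simple stand-ins (query-term counts for the initial retrieval, summed relative frequency for term selection) and are assumptions, as are the stopword list and the toy documents; the slides do not specify how "good" terms are chosen.

```python
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "in", "is", "to", "with"}


def expand_query(query_terms, expansion_docs, k=2, m=3):
    """Return the original query terms followed by m expansion terms."""
    # Step 1: initial retrieval in the expansion collection (toy scoring:
    # number of query-term occurrences in each document).
    def score(doc):
        tf = Counter(doc)
        return sum(tf[q] for q in query_terms)

    ranked = sorted(expansion_docs, key=score, reverse=True)

    # Step 2: keep the k top-ranked documents.
    top_docs = ranked[:k]

    # Step 3: select m candidate terms by summed relative frequency,
    # skipping stopwords and terms already in the query.
    candidates = Counter()
    for doc in top_docs:
        for w, count in Counter(doc).items():
            if w not in query_terms and w not in STOPWORDS:
                candidates[w] += count / len(doc)
    expansion_terms = [w for w, _ in candidates.most_common(m)]
    return list(query_terms) + expansion_terms


# Hypothetical expansion collection.
expansion_docs = [
    "sensorineural hearing loss is damage to the cochlear nerve".split(),
    "cochlear implants treat severe hearing loss".split(),
    "weight loss and diet".split(),
    "knee replacement surgery".split(),
]
expanded = expand_query(["hearing", "loss"], expansion_docs, k=2, m=3)
```

In this toy run "cochlear" is the strongest candidate because it appears in both top-ranked documents, while "diet" is never considered because its document falls outside the top k.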

Query Expansion: Adaptive Collection Weighting
  Estimate the effectiveness of an expansion collection by measuring the similarity between Q_E and Q_Q, the smoothed unigram language models built for the expansion query E and the original query Q, respectively, based on the top retrieved documents from the target collection
  Jensen-Shannon divergence (JSD) as the similarity measure
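A minimal sketch of this idea follows. The Jensen-Shannon divergence itself is standard; how the divergence is turned into a collection weight is not stated on the slide, so the mapping used here (similarity = 1 - JSD with base-2 logarithms, normalized across collections) is an assumption, as are the function names and the toy language models.

```python
import math


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two term distributions."""
    vocab = set(p) | set(q)
    mid = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        # KL(a || mid); terms with zero probability in a contribute nothing.
        return sum(a[w] * math.log2(a[w] / mid[w])
                   for w in a if a[w] > 0.0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def collection_weights(original_lm, expanded_lms):
    """Weight each expansion collection by its similarity to the original.

    original_lm: language model built from documents retrieved by Q.
    expanded_lms: collection name -> language model built from documents
    retrieved by the query expanded with that collection.
    """
    sims = {name: 1.0 - jsd(original_lm, lm)
            for name, lm in expanded_lms.items()}
    total = sum(sims.values())
    if total == 0.0:
        # No collection resembles the original: fall back to uniform weights.
        return {name: 1.0 / len(sims) for name in sims}
    return {name: s / total for name, s in sims.items()}


# Hypothetical smoothed unigram models.
original_lm = {"hearing": 0.5, "loss": 0.3, "ear": 0.2}
expanded_lms = {
    "medical": {"hearing": 0.45, "loss": 0.3, "ear": 0.15, "cochlear": 0.1},
    "web": {"hearing": 0.1, "loss": 0.5, "weight": 0.3, "diet": 0.1},
}
weights = collection_weights(original_lm, expanded_lms)
```

With base-2 logarithms the divergence lies between 0 (identical models) and 1 (disjoint vocabularies), which makes 1 - JSD a convenient bounded similarity.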

Summary of Retrieval Models
  Three statistical IR models
    Query likelihood model (baseline)
    Relevance model
    Mixture of relevance models
  Two novel weighting methods
    Medical thesaurus-based expansion
    Adaptive collection weighting method

Evaluation Measures
  P10: precision at rank 10, which measures the proportion of relevant documents among the top 10 retrieved
  MAP: mean average precision, which provides a single-figure measure of quality across recall levels
  bpref: computes a preference relation of whether judged relevant documents are retrieved ahead of judged irrelevant documents
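The three measures can be sketched for a single topic as below. The bpref variant follows the commonly used formulation in which each retrieved relevant document is penalized by the fraction of judged non-relevant documents ranked above it, capped at R and normalized by min(R, N); that choice, the function names, and the toy judgments are assumptions for illustration. MAP is the mean of the per-topic average precision values.

```python
def precision_at_k(ranking, relevant, k=10):
    """Proportion of relevant documents among the top k retrieved."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at each relevant document's rank."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(per_topic_ap):
    """MAP: average precision averaged over topics."""
    return sum(per_topic_ap) / len(per_topic_ap)


def bpref(ranking, relevant, nonrelevant):
    """Preference-based measure that ignores unjudged documents."""
    num_rel, num_nonrel = len(relevant), len(nonrelevant)
    if num_rel == 0:
        return 0.0
    nonrel_seen, total = 0, 0.0
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_seen += 1
        elif doc in relevant:
            if num_nonrel == 0:
                total += 1.0
            else:
                penalty = min(nonrel_seen, num_rel) / min(num_rel, num_nonrel)
                total += 1.0 - penalty
    return total / num_rel


# Toy topic: d3 is unjudged.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d4"}
nonrelevant = {"d2", "d5"}

p5 = precision_at_k(ranking, relevant, k=5)
ap = average_precision(ranking, relevant)
bp = bpref(ranking, relevant, nonrelevant)
```

Note that the unjudged document d3 lowers precision and average precision for d4 but has no effect on bpref, which is the reason bpref is often reported when judgments are incomplete.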

Results and Analysis
  MeSH expansion
  Single expansion
  Multiple expansion

MeSH Expansion
  5-fold cross-validation results
  Summary
    Our expansion term weighting method brings significant improvement over all unweighted versions as well as the baseline: nearly 12% improvement over the baseline, and 5-7% over the unweighted versions.
    Increasing the expansion level l only slightly improves retrieval effectiveness.

Single Expansion
  5-fold cross-validation results
  Collection size: ImageCLEF < Medical < Genomics < Wikipedia < ClueWeb09

Single Expansion: Summary
  Significantly better than the baseline
  Expansion effectiveness
    Quality (i.e., content similarity to the target collection)
    Collection size
  MeSH expansion
    Relies on a controlled vocabulary
    The expansion terms derived are not as diversified as those from a general expansion collection
    However, it rarely introduces bad expansion terms

Multiple Expansion
  5-fold cross-validation results
  Summary
    Using multiple expansion collections is better than using just a single collection
    The adaptive collection weighting method significantly improves the MAP score

Cross-Validation Results: Overall performance

  Systems                                MAP            P10     bpref
  Baseline                               0.353          0.506   0.469
  MedSearch (WMP + 3 medical features)   0.457 (+30%)   0.612   0.583
  NLMManual                              0.507          0.658   0.727

  Summary
    MedSearch improves the baseline by about 30% on MAP
    Performance is close to that of a manual system

Conclusions
  Two novel weighting methods
    Medical concept weighting
    Expansion collection weighting
    Both significantly improve retrieval effectiveness
  Expansion collections
    Comparison & analysis
    Insights on selecting good expansion collections
  A health record search system
    Improves a strong baseline by about 30% on MAP
    Presents a promising overall performance when compared with a manual system doing the same task

Q & A Thanks!