Improving Health Records Search Using Multiple Query Expansion Collections
Dongqing Zhu and Ben Carterette
Information Retrieval Lab, University of Delaware
Outline Introduction Retrieval Task & Data Retrieval Models Results & Analysis Conclusion
Introduction
Medical Record: a longitudinal record of patient health information generated by one or more encounters in any care delivery setting.
Adoption of EMRs: fewer than 10% of hospitals had a fully integrated system as of 2006. However, the CDC reported that the EMR adoption rate had risen steadily to 48.3% by the end of 2009.
Electronic Medical Records
Applications: support clinical studies & improve quality of care
Examples:
To assess the efficacy of antidepressants in treating back pain in adults
To examine the benefits and harms of aspirin chemoprevention
To study the relationship between HCV and kidney disease
All of these require gathering evidence from a sample group of patients who all have a particular condition, drawing on information retrieval, natural language processing, etc. There are a few challenges, though.
Challenges
Clinical narratives are full of synonymy: epigastric pain after eating & postprandial stomach discomfort; tobacco user & smoker.
Abbreviations are ambiguous: PCP can stand for the drug phencyclidine, the disease Pneumocystis carinii pneumonia, or an individual, the primary care physician.
Usage of words: Yang et al. [1] analyzed a query log of EMERSE over the course of 4 years and found that the coverage of EHR query terms by a meta-dictionary is much lower than the usual 85-90% coverage of Web queries by English dictionaries. Thus, they suggested looking beyond medical ontologies to enhance medical information retrieval.
[1] L. Yang, Q. Mei, K. Zheng, and D. Hanauer, "Query log analysis of an electronic health record search engine," in AMIA Annual Symposium Proceedings, 2011, pp. 915-924.
Information Retrieval for EMR Search
Medical Records track: organized at the 2011 & 2012 Text REtrieval Conference (TREC), which is co-sponsored by the National Institute of Standards and Technology (NIST) and the U.S. Department of Defense (DoD).
Goal: to foster research on providing content-based access to the free-text fields of electronic medical records.
Task: an ad hoc search task for retrieving relevant patient visits to the hospital, which models the real-world task of finding a population over which comparative effectiveness studies can be done.
Data
TREC official test collection
Corpus: > 100,000 reports from ~18,000 patient hospital visits, collected over a month, from the University of Pittsburgh BLULab NLP Repository.
Topics: 35 topics (i.e., queries) developed by physicians. Sample topics: Patients with ductal carcinoma in situ (DCIS); Adults who received a coronary stent during an admission; Women with osteopenia.
Relevance judgments: collected from physicians.
A Sample EMR
Report types: Radiology, History and Physical, Consults, Emergency Room, Progress Notes, Discharge Summaries, Operative Reports, Surgical Pathology Reports, Cardiology Reports.
Diagnoses: CONTUSION OF HIP; OTHER PERSISTENT MENTAL DISORDERS DUE TO CONDITIONS CLASSIFIED ELSEWHERE.
Excerpt: "This is a **AGE[50]-year-old male patient. EXAMINATION PERFORMED: PELVIS AP VIEW ONLY **DATE[Feb 04 07] 1343 HOURS. CLINICAL HISTORY: Fall. FINDINGS: Frontal views of the pelvis and specific oblique views of the right hip show no fractures or dislocations. There is a protrusio acetabulum on the right with chronic deformity of the acetabular margin and adjacent femoral head with joint space narrowing. Vessel calcification is evident. He has had no fever or chills ... My signature below is attestation that I have interpreted this/these examination(s) and agree with the findings as noted above. ..."
Three medical features: 1. Code Expansion 2. Negation Removal 3. Age/Gender Filtering
Retrieval Models Our focus -- statistical IR models Query likelihood model (baseline) Relevance model Mixture of relevance models Two novel weighting methods Medical thesaurus-based expansion Adaptive collection weighting method
Retrieval Models
Baseline model: query likelihood model with Dirichlet smoothing, assuming full independence between query terms (the document D generates each query term q1, q2, q3 independently).
Example: Q = "hearing loss" (q1 = hearing, q2 = loss). Document language model for D: (hearing, p = 0.0018), (loss, p = 0.0031), (health, p = 0.0004), (temperature, p = 0.002). Document excerpt: "... states that they did have a Miracle Ear representative check into the hearing aids ..."
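The slide does not reproduce the scoring formula; the standard form of the Dirichlet-smoothed query likelihood score, which we assume is what the baseline uses, is:

```latex
P(Q \mid D) = \prod_{q \in Q} \frac{tf(q, D) + \mu \, P(q \mid C)}{|D| + \mu}
```

where tf(q, D) is the frequency of q in D, P(q | C) is the collection language model, and the smoothing parameter mu is tuned empirically.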
Retrieval Models
Lavrenko's relevance model: a model R over vocabulary terms w1, w2, w3, ... estimated by maximum likelihood from an expansion collection C.
Original query terms: hearing, loss. Expansion query terms: cochlear, ear, deaf, noise, sensorineural, fechter, binaural, efferent, monaural, loud, coch, ...
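For reference, one common way to estimate the relevance model from the top-ranked documents of the expansion collection C (a sketch of the usual RM1 estimate, not necessarily the exact variant used here):

```latex
P(w \mid R) \;\propto\; \sum_{D \in C_{top}} P(w \mid D) \prod_{q \in Q} P(q \mid D)
```

where C_top is the set of top-ranked documents retrieved from C by the original query Q.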
Retrieval Models
Mixture of relevance models: construct a relevance model from each of multiple expansion collections, combine them as a weighted mixture, and linearly interpolate the result with the maximum-likelihood estimate of the original query.
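A sketch of the general form of such a mixture, assuming collection weights w_i that sum to one and an interpolation parameter lambda (the estimation of the w_i is described on later slides):

```latex
P(w \mid Q) = \lambda \, P_{ML}(w \mid Q) + (1 - \lambda) \sum_{i} w_i \, P(w \mid R_i)
```

where R_i is the relevance model built from the i-th expansion collection.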
Retrieval Models
Expansion collections: used for building relevance models
Retrieval Models
Expansion collections: MeSH --- a medical thesaurus containing relations between medical concepts represented in a tree structure; the other collections contain full-text articles/webpages.
Expansion methods: medical thesaurus-based expansion; general expansion
Medical Thesaurus-based Expansion
Four steps:
Concept identification: use the PubMed e-utility to identify MeSH concepts in the query
Concept expansion: expand a detected MeSH concept with its entry terms and descendant nodes down to level l in the MeSH trees
Concept weighting: for each MeSH concept, estimate a weight p(e) for each expansion term e using a PubMed query log
Concept aggregation: merge the lists of expansion terms for each concept into one final expansion list
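A minimal sketch of this four-step pipeline on toy in-memory data; the MeSH lookup, PubMed e-utility call, and query-log counts are replaced here by hypothetical dictionaries, so this only illustrates the control flow, not the actual implementation.

```python
# Hypothetical stand-ins for the PubMed e-utility, the MeSH trees, and the query log G.
detected_concepts = ["Hearing Loss"]                        # step 1: concept identification
mesh_expansion = {                                          # entry terms + descendants down to level l
    "Hearing Loss": ["hearing loss", "hypoacusis", "deafness",
                     "hearing loss, sensorineural", "presbycusis"],
}
query_log_users = {"hearing loss": 900, "deafness": 400,    # users whose queries contain the term
                   "hypoacusis": 20, "presbycusis": 60,
                   "hearing loss, sensorineural": 120}

def expand_concept(concept):
    """Steps 2-3: expand one MeSH concept and weight each term by query-log popularity."""
    terms = mesh_expansion.get(concept, [])
    total = sum(query_log_users.get(t, 1) for t in terms)
    return {t: query_log_users.get(t, 1) / total for t in terms}

def aggregate(concepts):
    """Step 4: merge the per-concept expansion lists into one weighted expansion list."""
    merged = {}
    for c in concepts:
        for term, weight in expand_concept(c).items():
            merged[term] = merged.get(term, 0.0) + weight
    z = sum(merged.values())                                # renormalize so weights sum to one
    return {t: w / z for t, w in merged.items()}

print(aggregate(detected_concepts))
```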
Medical Thesaurus-based Expansion
Concept expansion & weighting
Example for the MeSH concept Hearing Loss: descendant nodes include Deafness, Deaf-Blind Disorders, Hearing Loss, High-Frequency, Presbycusis, Hearing Loss, Sensorineural, and Usher Syndromes; entry terms include Hypoacusis and Hearing Impairment.
Each expansion term e_i is weighted by the number of users whose queries contain e_i in the PubMed query log G [2].
Modeling term proximity: #uw16(#1(hearing loss) sensorineural). Phrasal expansion term: #1(usher syndromes).
[2] J. R. Herskovic, L. Y. Tanaka, W. R. Hersh, and E. V. Bernstam, "A day in the life of PubMed: Analysis of a typical day's query log," JAMIA, vol. 14, no. 2, pp. 212-220, 2007.
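One natural normalization of those query-log counts into expansion-term weights (the exact form used in the system is an assumption here):

```latex
p(e_i) = \frac{n(e_i, G)}{\sum_{j} n(e_j, G)}
```

where n(e_i, G) is the number of users whose queries contain e_i in the PubMed query log G.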
General Expansion
Three steps:
Initial retrieval in an expansion collection using the original query Q
Take the k top-ranked documents
Select m good terms for query expansion
Expanded query example (see the sketch below)
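A minimal sketch of the three steps as a standard pseudo-relevance-feedback loop; scoring candidate terms by summed document score is an assumption for illustration, not necessarily the exact selection criterion used in the paper.

```python
from collections import Counter

def general_expansion(query_terms, ranked_docs, k=10, m=20, stopwords=frozenset()):
    """ranked_docs: list of (score, tokenized_text) pairs from an initial retrieval
    in the expansion collection, best document first."""
    top_k = ranked_docs[:k]                        # step 2: take the k top-ranked documents
    candidates = Counter()
    for score, tokens in top_k:                    # step 3: score candidate expansion terms
        for t in tokens:
            if t not in stopwords and t not in query_terms:
                candidates[t] += score             # weight term occurrences by document score
    expansion_terms = [t for t, _ in candidates.most_common(m)]
    return list(query_terms) + expansion_terms     # expanded query

# Toy usage with two hypothetical retrieved documents.
docs = [(2.1, ["hearing", "loss", "cochlear", "implant"]),
        (1.7, ["sensorineural", "hearing", "loss", "audiogram"])]
print(general_expansion(["hearing", "loss"], docs, k=2, m=3))
```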
Query Expansion: Adaptive Collection Weighting
Estimate the effectiveness of an expansion collection by measuring the similarity between Q_E and Q_Q, the smoothed unigram language models built from the top documents retrieved from the target collection by the expansion query E and by the original query Q, respectively.
Jensen-Shannon divergence (JSD) is used as the similarity measure.
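The Jensen-Shannon divergence between the two language models takes its standard form; a lower divergence indicates a more effective expansion collection, which then receives a higher mixture weight (the exact mapping from divergence to weight is the paper's adaptive method and is not reproduced here):

```latex
JSD(Q_E \,\|\, Q_Q) = \tfrac{1}{2} D_{KL}(Q_E \,\|\, M) + \tfrac{1}{2} D_{KL}(Q_Q \,\|\, M),
\qquad M = \tfrac{1}{2}(Q_E + Q_Q)
```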
Summary of Retrieval Models Three statistical IR models Query likelihood model (baseline) Relevance model Mixture of relevance models Two novel weighting methods Medical thesaurus-based expansion Adaptive collection weighting method
Evaluation Measures P10 precision at rank 10, which measures the proportion of relevant documents among the top 10 retrieved MAP mean average precision, which provides a single-figure measure of quality across recall levels bpref computes a preference relation of whether judged relevant documents are retrieved ahead of judged irrelevant documents
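For concreteness, a small sketch of how P@10 and average precision are computed from a ranked list and a set of judged-relevant documents (bpref is omitted); these are the standard definitions, not code from the system.

```python
def precision_at_10(ranking, relevant):
    """Fraction of the top 10 retrieved documents that are relevant."""
    return sum(1 for d in ranking[:10] if d in relevant) / 10

def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each retrieved relevant document."""
    hits, precisions = 0, []
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

# MAP is the mean of average_precision over all topics.
ranking = ["d3", "d7", "d1", "d9"]
relevant = {"d3", "d9", "d5"}
print(precision_at_10(ranking, relevant), average_precision(ranking, relevant))
```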
Results and Analysis MeSH expansion Single expansion Multiple expansion
MeSH Expansion
5-fold cross-validation results
Summary: our expansion-term weighting method brings significant improvement over the unweighted versions as well as the baseline: nearly 12% improvement over the baseline and 5-7% over the unweighted version. Increasing the expansion level l only slightly improves retrieval effectiveness.
Single Expansion
5-fold cross-validation results
Collection size: ImageCLEF < Medical < Genomics < Wikipedia < ClueWeb09
Single Expansion
Summary: significantly better than the baseline. Expansion effectiveness depends on collection quality (i.e., content similarity to the target collection) and collection size.
MeSH expansion relies on a controlled vocabulary, so its expansion terms are not as diversified as those from a general expansion collection; however, it rarely introduces bad expansion terms.
Multiple Expansion
5-fold cross-validation results
Summary: using multiple expansion collections outperforms using just a single collection; the adaptive collection weighting method significantly improves the MAP score.
Cross-Validation Results
Overall performance:

Systems                                 MAP            P10     bpref
Baseline                                0.353          0.506   0.469
MedSearch (WMP + 3 medical features)    0.457 (+30%)   0.612   0.583
NLMManual                               0.507          0.658   0.727

Summary: MedSearch improves the baseline by about 30% on MAP, with performance close to a manual system.
Conclusions
Two novel weighting methods: medical concept weighting and expansion collection weighting; both significantly improve retrieval effectiveness.
Expansion collections: comparison & analysis provide insights on selecting good expansion collections.
A health record search system that improves a strong baseline by about 30% on MAP and shows promising overall performance compared with a manual system doing the same task.
Q & A Thanks!