How To Retrieve Similar Cases From A Medical Record


Retrieval of Similar Electronic Health Records Using UMLS Concept Graphs

Laura Plaza and Alberto Díaz
Universidad Complutense de Madrid
C/Profesor José García Santesmases, s/n, Madrid 28040, Spain

Abstract. Physicians often use information from previous clinical cases in their decision-making process. However, the large number of patient records available in hospitals makes an exhaustive search unfeasible. We propose a method for the retrieval of similar clinical cases, based on mapping the text onto UMLS concepts and representing the patient records as semantic graphs. The method also deals with the problems of negation detection and concept identification in clinical free text. To evaluate the approach, an evaluation collection has been developed. The results show that our method correlates well with the expert judgments and remarkably outperforms the traditional term-vector space model.

Keywords: Similar Case Retrieval, Graph Theory, Electronic Health Record, UMLS.

1 Introduction

In their daily work, physicians often face complex clinical cases and need to refer to similar patient cases encountered before, especially when atypical cases are presented. In fact, as stated in [1], previous knowledge about the problem has proved to be one of the most influential factors in the decision-making process. However, as shown in [2], the excessive time required to find this knowledge may undermine its usefulness if no effective access technologies are provided.

When dealing with medical records, the information is mainly stored as free text. This text is considerably more difficult to analyze than more formal texts, such as textbooks or scientific papers, since it exhibits unique sublanguage characteristics (e.g. verbless sentences, lack of punctuation and spelling errors). Moreover, negation detection plays an important role in understanding the meaning of medical information.
The task of retrieving similar medical records may be considered a particular case of Information Retrieval (IR), stated as follows: given a reference record, retrieve other records from the clinical database that are similar to it. Deciding the criteria for assessing the similarity between records is the most important and difficult issue in this task. In this paper, we propose a method for the retrieval of similar Electronic Health Records (EHRs), based on mapping the text onto UMLS concepts and representing the patient records as semantic graphs, where the vertices are UMLS concepts and the edges represent is-a relations between them. The method also deals with the problems of negation detection and concept identification in clinical free text.

C.J. Hopfe et al. (Eds.): NLDB 2010, LNCS 6177. © Springer-Verlag Berlin Heidelberg 2010

2 Background

2.1 Related Work

Information Retrieval in medicine dates back to the mid-1960s. Early approaches used words to index the documents in the corpus. However, although term-based indexing using a vector space model is simple and powerful, concept-based indexing using controlled terminologies can improve IR performance. A pioneering work on medical IR that makes use of these resources is SAPHIRE [3], whose aim was to develop methods for indexing and retrieving medical documents from bibliographic databases.

Most NLP work in the biomedical domain has focused on journal articles rather than clinical free text. A recent initiative for the development of NLP for medical records is the CLEF forum, in particular the ImageCLEF track¹, whose goal is to retrieve the images most relevant to a predefined set of queries, based on their clinical case descriptions. The best results in the 2005 and 2006 editions came from Lacoste et al. [4], who applied semantic indexing to identify key concepts in certain UMLS semantic categories. Other systems that competed in the task used other resources, such as MeSH, to expand the queries [5].

Apart from these few approaches, most medical IR systems do not use any external resource but base their decisions on the terms in the text, and only a few of them use ad hoc lists or manual tagging to determine which of these terms are symptoms, diseases or other relevant features [6]. Ontologies and controlled vocabularies may avoid the need to produce and maintain these lists, while capturing the semantics of the text and the relations between its terms.
2.2 The Electronic Health Record and the Domain Peculiarities

Clinical records present unique attributes that must be taken into account in any NLP system. First, their structure and content vary with the needs of the audience: a record mixes highly structured information with idiosyncratic narrative text, and sometimes includes images. Medical records also differ greatly in size, from a few lines to several pages. In short, any information relevant to the decision process can be part of the medical record, so finding a predictable retrieval pattern is very difficult, if not impossible.

Second, negation detection is a fundamental issue, since a negation can invert the meaning of a text. Negation in natural language can be extremely subtle, but medical language is much more restricted and negations are expected to be more direct [7]. A variety of approaches have addressed negation in medical texts [8,9].

¹ ImageCLEF medical retrieval task

Third, the peculiarities of the terminology and of physicians' writing practices make concept detection a very ambitious task [10]. The first challenge is the problem of synonyms and homonyms. Another handicap is the presence of neologisms. Finally, elisions and abbreviations complicate the automatic disambiguation of medical text.

2.3 The Use of UMLS for Concept Annotation

One of the most popular biomedical terminologies in NLP applications is the Unified Medical Language System (UMLS) [11]. UMLS consists of three main components: the Specialist Lexicon, the Metathesaurus and the Semantic Network. The Metathesaurus comprises a collection of biomedical concepts derived from more than 100 vocabulary sources, together with the relations among them. The Semantic Network consists of a set of categories (semantic types) that provides a categorization of the concepts in the Metathesaurus. Using UMLS for concept annotation presents two main advantages: first, it lists more than … entries for ambiguous terms (which attenuates the problems of synonymy and homonymy); second, it contains numerous entries for elisions and abbreviations.

In order to map the text onto UMLS concepts, the MetaMap Transfer tool (MMTx) is used. MetaMap [12] maps biomedical free-form text to Metathesaurus concepts with a high level of accuracy [13].

3 A Method for Automatic Retrieval of Similar EHRs

In this section, a concept graph-based method for retrieving similar EHRs is presented. It consists of four steps, each discussed in the following subsections. To clarify how the algorithm works, the following radiology report from the CMC-NLP corpus² is used as a working example:

CLINICAL HISTORY: Eleven years old with ALL, bone narrow transplant on Jan.2, now with three day history of cough.
IMPRESSION: No focal pneumonia. Likely chronic changes at the left lung base. Mild anterior wedging of the thoracic vertebral bodies.
3.1 Extraction of UMLS Concepts

In this step, the text of each EHR in the database is mapped onto UMLS concepts and semantic types using MetaMap. In order to understand what information is relevant for the retrieval of similar clinical cases, a hospital physician was consulted. The decision seems to depend on multiple criteria (e.g. the physician's specialty and particular concerns). However, as general guidelines, two clinical records can be considered similar if: (1) the same symptom or sign is presented (e.g. fever or 5 kg weight loss), (2) the patients have received the same diagnosis (e.g. bacterial pneumonia), (3) the same test or procedure is reported (e.g. cerebral NMR or endoscopy biopsy), or (4) the same medicament has been administered (e.g. clopidogrel). Therefore, according to the domain expert's guidelines, the attributes relevant to the task are symptoms or signs, diseases, procedures and medicaments.

² Computational Medicine Center's 2007 Medical Natural Language Processing Challenge (CMC-NLP 2007)

Identifying which concepts in a medical record correspond to each of these categories is not trivial. However, the UMLS Semantic Network includes very useful information for mapping the concepts in the patient records to these categories, so only concepts from a subset of UMLS semantic types are considered. Table 1 shows these semantic types along with their mapping to the categories above. A further category, body parts, has been added, since body parts are often involved in the descriptions of procedures and diseases (e.g. fractured rib).

Table 1. Relevant UMLS semantic types
Category: UMLS semantic types
Symptoms and signs: Sign or Symptom; Finding
Diseases: Disease or Syndrome; Pathologic Function
Procedures: Therapeutic or Preventive Procedure; Diagnostic Procedure
Body parts: Body Location or Region; Body Part, Organ, or Organ Component
Medicaments: Pharmacologic Substance

3.2 Negation Detection

The aim of this step is to detect negated concepts in the EHRs since, according to the domain expert, absent symptoms or diseases are not relevant for the task. Since negations within medical records usually take a small number of forms, a simple lexical scanner based on regular expressions is used. Four negation classes have been defined. Table 2 shows the lexical patterns used to detect them, along with some examples of their occurrences in the corpus. In the patterns, "concept" stands for a previously identified UMLS concept.
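Steps 3.1 and 3.2 together amount to a filter over MetaMap's output. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes MetaMap's output has already been reduced to hypothetical (concept text, semantic type) pairs, encodes the Table 1 mapping as a dictionary, and approximates the four negation classes of Table 2 with regular expressions in which the optional adjective/noun slot is any single word. The category keys and the function names `negated_concepts` and `categorise` are illustrative choices.

```python
import re
from collections import defaultdict

# Table 1: relevant UMLS semantic types -> category (category keys are illustrative).
SEMANTIC_TYPE_TO_CATEGORY = {
    "Sign or Symptom": "symptoms_and_signs",
    "Finding": "symptoms_and_signs",
    "Disease or Syndrome": "diseases",
    "Pathologic Function": "diseases",
    "Therapeutic or Preventive Procedure": "procedures",
    "Diagnostic Procedure": "procedures",
    "Body Location or Region": "body_parts",
    "Body Part, Organ, or Organ Component": "body_parts",
    "Pharmacologic Substance": "medicaments",
}

# Table 2 as regex templates. "{C}" is later replaced by an alternation of the
# concept strings found in the record; "(?:\w+\s+)?" approximates the optional
# adjective/noun slot with any single word (an assumption of this sketch).
TAIL = r"(?P<span>{C}(?:\s+or\s+{C})*)"
NEGATION_TEMPLATES = [
    r"\b(?:no|without|rule out)\s+(?:\w+\s+)?" + TAIL,
    r"\b(?:no|without)\s+\w+\s+of\s+" + TAIL,
    r"\bevaluate for\s+(?:\w+\s+)?" + TAIL,
    r"\b(?:lack|absence)\s+of\s+(?:\w+\s+)?" + TAIL,
]


def negated_concepts(text, concepts):
    """Return the (lower-cased) concepts that occur inside a negation pattern."""
    lowered = text.lower()
    # Longest names first, so that e.g. "foreign body" is preferred over "foreign".
    names = sorted({c.lower() for c in concepts}, key=len, reverse=True)
    if not names:
        return set()
    concept_re = r"(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    negated = set()
    for template in NEGATION_TEMPLATES:
        pattern = re.compile(template.replace("{C}", concept_re))
        for match in pattern.finditer(lowered):
            span = match.group("span")  # e.g. "fever or cough"
            negated.update(
                n for n in names if re.search(r"\b" + re.escape(n) + r"\b", span)
            )
    return negated


def categorise(text, mapped_concepts):
    """Group the affirmed concepts of the Table 1 semantic types by category.

    `mapped_concepts` is a list of (concept_text, semantic_type) pairs, a
    hypothetical stand-in for MetaMap output.
    """
    relevant = [(c, t) for c, t in mapped_concepts if t in SEMANTIC_TYPE_TO_CATEGORY]
    negated = negated_concepts(text, [c for c, _ in relevant])
    by_category = defaultdict(list)
    for concept, sem_type in relevant:
        if concept.lower() not in negated:  # absent findings are discarded
            by_category[SEMANTIC_TYPE_TO_CATEGORY[sem_type]].append(concept)
    return dict(by_category)


report = ("Eleven years old with ALL, now with three day history of cough. "
          "IMPRESSION: No focal pneumonia. Evaluate for foreign body.")
mapped = [("cough", "Sign or Symptom"), ("pneumonia", "Disease or Syndrome"),
          ("foreign body", "Finding"), ("three day", "Temporal Concept")]
print(categorise(report, mapped))  # {'symptoms_and_signs': ['cough']}
```

Here "pneumonia" and "foreign body" are dropped as negated, and "three day" is dropped because its (hypothetical) semantic type is not in Table 1. A single-word slot is a deliberately crude reading of "adj?" and "noun"; a real system would use part-of-speech information for these slots.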
Table 2. Negation classes (lexical pattern: examples)
1. (no | without | rule out) + adj? + concept + (or concept)*: "Without focal scarring", "Rule out fever or cough"
2. (no | without) + noun + of + concept + (or concept)*: "No signs of tuberculosis", "No evidence of hydroureter"
3. evaluate for + (noun | adj)? + concept + (or concept)*: "Evaluate for foreign body", "Evaluate for abnormalities"
4. (lack | absence) of + (noun | adj)? + concept + (or concept)*: "Lack of kyphosis", "Absence of heart murmur"

3.3 Semantic Graph Representation

The next step consists of creating a graph-based representation for each EHR in the database. First, the concepts identified in the previous steps are retrieved from the UMLS Metathesaurus along with their complete hierarchy of hypernyms. Second, all concept hierarchies for each category are merged, building a single graph for each category in the EHR, where the vertices represent distinct concepts in the text and the edges represent semantic (is-a) relations. Finally, each edge is assigned a weight using equation (1), where α is the set of all the ancestors of concept A in the hierarchy, including A, and β is the set of all the ancestors of concept B, including B:

\[ \mathrm{weight}(A, B) = \frac{|\alpha \cap \beta|}{|\alpha \cup \beta|} \qquad (1) \]

This formula assigns a greater weight to an edge the more specific the concepts it links are. Fig. 1 shows these graphs for the example record; the edges of one of the graphs are labeled with their weights. Note how the acronym ALL (Acute Lymphocytic Leukemia) is expanded by MetaMap.

Fig. 1. An example of EHR semantic representation

3.4 Computing the Similarity between EHRs

The purpose of the last step is to compute the similarity between the graph-based representations of two patient records. To this end, a non-democratic vote mechanism similar to that proposed in [14] is used. Given two graphs A and B, where the similarity of A to B is to be measured, each concept of A that is present in B casts a vote equal to the weight of that concept in graph A, and 0 otherwise. The votes of all concepts in A are then summed, and the result is normalized to the interval [0, maximum similarity]. Finally, the two patient records are compared by computing this similarity between their graphs for each category and adding the partial results to obtain a global similarity value.

4 Evaluation Methodology

To the authors' knowledge, no corpus of relevance judgments of medical record similarity is publicly available. For this reason, a collection of radiology reports from the Cincinnati Children's Hospital Medical Center's Department of Radiology is used. This corpus was designed for the CMC-NLP 2007 ICD-9-CM categorization task. To evaluate the method, 50 reports were drawn from this corpus using stratified sampling, so that for each ICD category a number of reports proportional to its size is selected. To avoid very small categories, only ten categories were used. We refer to this set as our test collection. Next, a subset of 20 reports was separated from the test collection, again using stratified sampling. We refer to this set as our query collection.

Two hospital physicians were asked to select, for each record in the query collection, the most similar ones within the test collection. To measure inter-judge agreement, the Kappa statistic [15] was calculated. An average kappa value equal to … was obtained, which indicates substantial agreement (0.61 ≤ κ ≤ 0.80).
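Returning to the method itself, Sections 3.3 and 3.4 can be summarised in the minimal Python sketch below. It is illustrative rather than the authors' code: a toy `parents` dictionary stands in for the UMLS Metathesaurus is-a hierarchy, and because the text does not say how a concept's weight is derived from the edge weights or exactly how the vote is normalised, the sketch assumes (hypothetically) that a concept weighs the sum of its incident edge weights and that the vote is divided by the total weight of graph A. All function names are illustrative.

```python
def ancestors(concept, parents):
    """All hypernyms of `concept`, found transitively; the concept itself is excluded."""
    seen, stack = set(), list(parents.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def build_graph(concepts, parents):
    """Merge the hypernym hierarchies of `concepts` into one weighted graph.

    Returns {(child, parent): weight} with weight(A, B) = |α ∩ β| / |α ∪ β|
    (equation (1)), where α and β are the ancestor sets of A and B plus A and B.
    """
    nodes = set(concepts)
    for concept in concepts:
        nodes |= ancestors(concept, parents)
    edges = {}
    for child in nodes:
        alpha = ancestors(child, parents) | {child}
        for parent in parents.get(child, []):
            beta = ancestors(parent, parents) | {parent}
            edges[(child, parent)] = len(alpha & beta) / len(alpha | beta)
    return edges


def concept_weights(edges):
    """ASSUMPTION: a concept's weight is the sum of the weights of its incident edges."""
    weights = {}
    for (child, parent), w in edges.items():
        weights[child] = weights.get(child, 0.0) + w
        weights[parent] = weights.get(parent, 0.0) + w
    return weights


def graph_similarity(edges_a, edges_b):
    """Non-democratic vote: each concept of A also present in B votes with its weight in A.

    ASSUMPTION: the votes are normalised by the total weight of A, giving [0, 1].
    Concepts without any hypernym edge carry no weight in this sketch.
    """
    weights_a = concept_weights(edges_a)
    nodes_b = set(concept_weights(edges_b))
    total = sum(weights_a.values())
    if total == 0:
        return 0.0
    return sum(w for c, w in weights_a.items() if c in nodes_b) / total


def record_similarity(graphs_a, graphs_b):
    """Global similarity: the per-category similarities added together."""
    return sum(graph_similarity(g, graphs_b.get(cat, {})) for cat, g in graphs_a.items())


# Toy is-a hierarchy (hypothetical, NOT taken from UMLS).
parents = {
    "bacterial pneumonia": ["pneumonia"],
    "viral pneumonia": ["pneumonia"],
    "pneumonia": ["lung disease"],
    "lung disease": ["disease"],
}
a = {"diseases": build_graph(["bacterial pneumonia"], parents)}
b = {"diseases": build_graph(["viral pneumonia"], parents)}
print(a["diseases"][("bacterial pneumonia", "pneumonia")])  # 0.75
print(a["diseases"][("lung disease", "disease")])  # 0.5
print(round(record_similarity(a, b), 3))  # 0.804
```

The deepest edge weighs 0.75 and the edge next to the root only 0.5, which is the behaviour equation (1) is meant to produce. The measure is asymmetric, since it scores how much of A's weight is covered by B, in line with the text's "similarity of A to B".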
The evaluation is done by comparing the EHRs retrieved by the system with those selected by the experts and calculating precision and recall. As the method's output is a ranking of EHRs, two evaluation approaches are used: first, the number of records to retrieve is set to 3 and to 5; second, the average precision over all recall levels is calculated [16]. Since the relevance judgments differ across judges, two experiments are presented: first, only the overlapping judgments (the intersection) are considered; second, the union of the judgments is used.

5 Results and Discussion

In order to verify whether the proposed method yields better results than the classic term vector space model, we compare the precision and recall obtained by both methods. We first examine the results of our algorithm when 3- and 5-document cutoffs are used (Table 3: Union-3, Intersection-3, Union-5, Intersection-5). Note that in this setting the order of the documents within the cutoff is irrelevant. Our method significantly outperforms the vector space model in both precision and recall, with both the union and the intersection of the relevance judgments. It can also be observed that when the 5-document cutoff is used, the precision of our method decreases considerably. The reason is that the system is forced to retrieve 5 documents when the experts selected only … documents per query on average. In contrast, precision in the vector space model increases slightly with the number of documents retrieved, because a good number of the relevant documents it retrieves are ranked not in the top 3 positions but in positions 4 and 5. As expected, recall increases with the number of documents retrieved in both systems.

Table 3. Results of the evaluation and comparison with the vector space model. Columns: Precision, Recall and F-measure for the graph-based and for the term-based method. Rows: Union and Intersection for each evaluation setting.

We next examine retrieval performance when precision at all levels of recall is considered (Table 3: Union, Intersection), so that the position of the relevant documents in the ranking now matters. Once again, our algorithm obtains a significant improvement in precision over the term-based approach.

6 Conclusion and Future Work

In this paper, a novel approach to the automatic retrieval of similar EHRs has been presented. The method represents a medical record as a set of semantic graphs built from UMLS concepts and relations, which gives a richer representation than the one provided by traditional term-based models. The method achieves relatively high and well-balanced precision and recall, which indicates that, even though some relevant records are not ranked in the top positions, most retrieved documents are relevant. However, the intermediate results of the method have shown that the indexing is not sufficiently exhaustive, as UMLS occasionally fails to recover relevant concepts, especially when they are expressed in shortened forms.
Another important impairment to concept identification comes from the spelling errors that frequently occur in clinical records. Future work will test the method on a different evaluation collection, ideally with longer medical records structured in sections, so that the position of a concept within these sections can help decide whether it is relevant for indexing. Furthermore, a user-oriented evaluation is planned, which will attempt to assess the system from cost-effectiveness and benefit points of view.

Acknowledgements

This research has been partially funded by the Spanish Ministerio de Ciencia e Innovación (TIN C03-01) and by the Spanish Ministerio de Industria, Turismo y Comercio (TIS).

References

1. Gorman, P., Helfand, M.: Information seeking in primary care: how physicians choose which clinical questions to pursue and which to leave unanswered. Medical Decision Making 15 (1995)
2. Ely, J., Osheroff, J., Ebell, M., Chambliss, M., Vinson, D., Stevermer, J., Pifer, E.: Obstacles to answering doctors' questions about patient care with evidence: qualitative study. British Medical Journal 324 (2002)
3. Hersh, W., Hickam, D.: Information retrieval in medicine: The SAPHIRE experience. Journal of the American Society for Information Science 46 (1995)
4. Lacoste, C., Lim, J., Chevallet, J., Le, D.: Medical-image retrieval based on knowledge-assisted text and image indexing. IEEE Transactions on Circuits and Systems for Video Technology 17 (2007)
5. Navarro, S., Llopis, F., Muñoz, R.: Different Multimodal Approaches using IR-n in ImageCLEFphoto. In: On-line Working Notes CLEF (2008)
6. Kwiatkowska, M., Atkins, S.: Case representation and retrieval in the diagnosis and treatment of obstructive sleep apnea: A semio-fuzzy approach. In: Proceedings of the 7th ECCBR (2004)
7. Gindl, S., Kaiser, K., Miksch, S.: Syntactical negation detection in clinical practice guidelines (2008)
8. Mutalik, A., Deshpande, A., Nadkarni, P.: Use of general-purpose negation detection to augment concept indexing of medical documents: a quantitative study using the UMLS. JAMIA 8 (2001)
9. Morante, R., Liekens, A., Daelemans, W.: Learning the scope of negation in biomedical texts. In: Proceedings of the EMNLP Conference (2008)
10. Nadkarni, P.: Information retrieval in medicine: overview and applications. Journal of Postgraduate Medicine 46 (2000)
11. Nelson, S., Powell, T., Humphreys, B.: The Unified Medical Language System (UMLS) Project. In: Kent, A., Hall, C.M. (eds.) Encyclopedia of Library and Information Science. Marcel Dekker, Inc., New York (2002)
12. Aronson, A.: Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In: Proceedings of the AMIA Annual Symposium (2001)
13. Pratt, W., Yetisgen-Yildiz, M.: A study of biomedical concept identification: MetaMap vs. people. In: Proceedings of the AMIA Annual Symposium (2003)
14. Yoo, I., Hu, X., Song, I.Y.: A coherent graph-based semantic clustering and summarization approach for biomedical literature and a new summarization evaluation method. BMC Bioinformatics 8(9) (2007)
15. Cohen, J.: A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20 (1960)
16. Salton, G.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)