Pierre Zweigenbaum (Ed.), Consortium Menelas. DIAM INSERM U.194 and SIM, Service d'informatique Medicale, Assistance Publique { H^opitaux de Paris



Similar documents
Document Management in Healthcare: Presentation of the DOME project

RRSS - Rating Reviews Support System purpose built for movies recommendation

Building a Text Corpus for Representing the Variety of Medical Language

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

Enriched document. MLP results. Document-centered EMR. segment and categorize text contents. Annotated corpus for NLP development

TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments

Standardization of Components, Products and Processes with Data Mining

Complementarity of Reminder-based and On-demand Decision Support According to Clinical Case Complexity

Practical Implementation of a Bridge between Legacy EHR System and a Clinical Research Environment

Scalable End-User Access to Big Data HELLENIC REPUBLIC National and Kapodistrian University of Athens

Towards Interoperability of Information Sources within a Hospital Intranet

CLINICAL INFORMATICS STUDY GUIDE MATERIALS EXAMINATION CONTENT OUTLINE OFFICE OF THE BOARD. 111 West Jackson, Suite 1340 Chicago, Illinois 60604

Ontology-based Archetype Interoperability and Management

A Knowledge Based Approach to Support Learning Technical Terminology *

Interpretative framework of chronic disease management to guide textual guideline GEM-encoding

Comprendium Translator System Overview

RESULTS OF HOSPITAL SURVEY

Volume 2, Issue 12, December 2014 International Journal of Advance Research in Computer Science and Management Studies

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

TRANSFoRm: Vision of a learning healthcare system

Semantic annotation of requirements for automatic UML class diagram generation

Extracted Templates. Postgres database: results

An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System

Visualizing WordNet Structure

Natural Language Dialogue in a Virtual Assistant Interface

The American Academy of Ophthalmology Adopts SNOMED CT as its Official Clinical Terminology

Overview of MT techniques. Malek Boualem (FT)

A Case Study of Question Answering in Automatic Tourism Service Packaging

How To Understand The Difference Between Terminology And Ontology

Isabelle Debourges, Sylvie Guilloré-Billot, Christel Vrain

Semantic Search in Portals using Ontologies

Overview of the TACITUS Project

Extracting Business. Value From CAD. Model Data. Transformation. Sreeram Bhaskara The Boeing Company. Sridhar Natarajan Tata Consultancy Services Ltd.

Classification of Natural Language Interfaces to Databases based on the Architectures

Medical-Miner at TREC 2011 Medical Records Track

Odile BOIZARD EDUCATION

Design of a Multi Dimensional Database for the Archimed DataWarehouse

COCOVILA Compiler-Compiler for Visual Languages

Ontology and automatic code generation on modeling and simulation

Semantic interoperability of dual-model EHR clinical standards

CpSc810 Goddard Notes Chapter 7. Expert Systems

Search and Information Retrieval

INSTITUT FEMTO-ST. Webservices Platform for Tourism (WICHAI)

Interactions et collaboration dans les Learning Games immersifs.

Tool Support for Software Variability Management and Product Derivation in Software Product Lines

CHALLENGES AND APPROACHES FOR KNOWLEDGE MANAGEMENT. Jean-Louis ERMINE CEA/UTT

WordNet Website Development And Deployment Using Content Management Approach

Standards in health informatics

Comparing Support Vector Machines, Recurrent Networks and Finite State Transducers for Classifying Spoken Utterances

Conceptual Data Model for a Central Patient Database

Technology Watch process in context: Information Systems (SI), Economic Intelligence (EI) and Knowledge Management (KM)

ezdi s semantics-enhanced linguistic, NLP, and ML approach for health informatics

Context Capture in Software Development

Training Management System for Aircraft Engineering: indexing and retrieval of Corporate Learning Object

Umbrella: A New Component-Based Software Development Model

Dhydro: a generic environment developed to edit and access multilingual terminological data on the Internet

A Case Retrieval Method for Knowledge-Based Software Process Tailoring Using Structural Similarity

Reusable Knowledge-based Components for Building Software. Applications: A Knowledge Modelling Approach

Adaptation of the ACO heuristic for sequencing learning activities

Servers and online bibliographic databases in developing countries: the African reality

SPRING SCHOOL. Empirical methods in Usage-Based Linguistics

Interdisciplinary Object-Oriented Domain Analysis. for Electronic Medical Records. Jorn Bodemann

MODELLING, AUTHORING AND PUBLISHING THE DOCUMENT ANALYSIS LEARNING OBJECT

Common Criteria For Information Technology Security Evaluation

A Quagmire of Terminology: Verification & Validation, Testing, and Evaluation*

Project VIDE Challenges of Executable Modelling of Business Applications

Ontological Representations of Software Patterns

MULTIFUNCTIONAL DICTIONARIES

Activity Mining for Discovering Software Process Models

A Chart Parsing implementation in Answer Set Programming

Exploring Electronic Medical Record (EMR) Using an Information Retrieval Perspective IEEE ICOCI 2006

SOCIS: Scene of Crime Information System - IGR Review Report

The Use of Terminological Knowledge Bases in Software Localisation

SCHOOL OF COMPUTER STUDIES RESEARCH REPORT SERIES

Ph.D. in Computer Science

Get the most value from your surveys with text analysis

LABERINTO at ImageCLEF 2011 Medical Image Retrieval Task

Special Topics in Computer Science

Electronic Health Record (EHR) Standards Survey

Diagnostic Imaging Requisition Quality When Using an Electronic Medical Record: a Before-After Study

MERGING ONTOLOGIES AND OBJECT-ORIENTED TECHNOLOGIES FOR SOFTWARE DEVELOPMENT

Improving Decision Making in Software Product Lines Product Plan Management

LIUM s Statistical Machine Translation System for IWSLT 2010

Extending Contemporary Decision Support System Designs

PS engine. Execution

Section des Unités de recherche. Evaluation report. Research unit : Troubles du comportement alimentaire de l adolescent. University Paris 11

KSE Comp. support for the writing process 2 1

UML Representation Proposal for XTT Rule Design Method

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Universal. Event. Product. Computer. 1 warehouse.

Generalizing Messages Digests

9th International Conference on Terminology and Artificial Intelligence

Brochure version 1 February 2015

How To Create A Decision Support System For A Patient Care System

interactive automatic (rules) automatic (patterns) interactive REP ENVIRONMENT KERNEL

Toward Quantitative Process Management With Exploratory Data Analysis

STATISTICAL DATABASES: THE REFERENCE ENVIRONMENT AND THREE LAYERS PROPOSED BY EUROSTAT

72. Ontology Driven Knowledge Discovery Process: a proposal to integrate Ontology Engineering and KDD

INFORMATION TECHNOLOGY PROGRAM

A Conceptual Approach to an Open Hospital Information System

Transcription:

Menelas: An Access System for Medical Records using Natural Language Pierre Zweigenbaum (Ed.), Consortium Menelas DIAM INSERM U.194 and SIM, Service d'informatique Medicale, Assistance Publique { H^opitaux de Paris 91, boulevard de l'h^opital, F-75634 Paris Cedex 13, France pz@biomath.jussieu.fr Abstract The overall goal of Menelas is to provide better access to the information contained in natural language patient discharge summaries, through the design and implementation of a pilot system able to access medical reports through natural languages. A rst, experimental version of the Menelas indexing prototype for French has been assembled. Its function is to encode free text PDSs into both an internal representation and ICD-9-CM nomenclature codes. A preliminary evaluation shows the potential for reasonable coverage and precision. The Menelas prototype will be enhanced and extended into a pilot system which will be tested in two hospital sites. Keywords: Natural Language Processing; Patient Record; Knowledge-Based Systems; Information Retrieval; ICD-9-CM; Conceptual Graphs. Correspondence: Pierre Zweigenbaum SIM / INSERM U.194, 91, boulevard de l'h^opital F-75634 Paris Cedex 13, France tel: (+33) 1 45 83 67 28 fax: (+33) 1 45 86 80 68 Submitted to Computer Methods and Programs in Biomedicine, Special Issue on AIM. The issue should also contain information on the consortium. 1

1 Introduction The dramatic expansion of health services over the past decades has created an information explosion for the health profession and a growing demand for detailed record-keeping. In response to this, paper charts are being slowly replaced by computerized medical records developed within Hospital Information Systems. The latter were developed primarily to handle numerically based information. However, anumber of applications such as ongoing patient care, evaluation of health care and clinical research require information which is predominantly available in narrative patient discharge summaries (PDSs). This narrative is a widely available, low cost source of reliable information. However, with the current technology, there is no way to directly access the information contained in a free text format: a new sort of system is thus required to analyse PDS texts using natural language processing techniques. 2 Background The automatic analysis of medical reports has been widely studied [1, 2, 3]. Whereas most early methods were essentially syntax-based, recent approaches focus on the semantic representation of medical texts [4, 5,6]. They enable a deeper level of understanding, including the representation of information that was implicit in the texts but is evident for the target readers [6]. This project builds on previous work performed by the dierent partners either in the aim Exploratory Action [7] or independently on the analysis of French, English and Dutch [8, 9, 10]. 2

3 Design considerations The overall goal of Menelas is to provide better access to the information contained in PDSs through the following services: The management sta of the medical unit may consult its own electronic activity board. The clinical sta may also access the stored PDSs to retrieve a set of patients having specic characteristics, which allows some form of case based reasoning. The same facility allows a limited test of research hypotheses, on the basis of the available patient sample. Nomenclature codes are produced automatically for each PDS. This contributes to one of the aim Tasks, \data standardisation, classication and encoding." These services will be implemented as modular subsystems, which can be realised gradually. The test medical domain is coronary diseases. The basis of the project is the design and implementation of a pilot system able to analyse the contents of medical reports and to store them in a database as a set of conceptual structures which represent their \meaning". These structures, which may be seen as the core representation of the narrative, may then be consulted to retrieve specic information contained in a PDS. An index is also produced according to the ICD-9-CM nomenclature. The twelve partners of Menelas represent three linguistic groups: French, English and Dutch. Language-dependent components are connected to the rest of the system through well-specied interfaces in order to facilitate extension to other natural languages. 3

Menelas adopts a knowledge-based approach to natural language understanding, and relies on a large body of linguistic and medical knowledge to perform its task. All semantic and domain knowledge is expressed in a common, general representation called conceptual graphs [11], which has already been used in medical systems [12, 4, 5]. 4 System description Menelas is organised around three main systems: The document indexing system analyses a PDS and stores it in a database as a set of conceptual structures. It also produces the nomenclature indexes. This process can be run in batch mode. The consultation system allows physicians and administration sta to access PDSs by their contents through a user-friendly interface. The administrator system enables the customization of the knowledge base of the system in order to set up a new application or to maintain an existing application. These systems are implemented in prolog, lisp, c++ and c, and run on Unix (currently, on SUN); the consultation system user interface runs on a Personal Computer with Windows. 5 Status report The rst phase of the project is now terminating, and a prototype version of most Menelas components is available. The indexing prototype, which constitutes the core of the Menelas prototype, has been integrated for French. 4

The indexing system follows the standard division of natural language processing systems: morpho-syntactic, semantic and pragmatic analysers; the latter is language-independent. The performance of the prototype was explored in two dimensions. Along one dimension, a subset of the functions of the indexing system for English were tested on a whole 475 English PDS corpus from the Royal Victoria Hospitals in Belfast. This evaluated the breadth of coverage that can be reached: about 60% of the sentences in the corpus could receive a full syntactic parse. A similar experiment on56french PDSs (H^opital de la Timone, Marseilles, and Pitie-Salp^etriere, Paris) showed a comparable result of 65% parsed sentences. Along the other dimension, we set up a method for evaluating globally the near-whole set of functions of the indexing system for French. The protocol is inspired from the third Message Understanding Conference [13, 14]. The principle of this evaluation consists in comparing the behaviour of the system with the one of a human reader, disposing of the same texts, and performing the basic tasks of Menelas: information retrieval and nomenclature code generation. The rst phase of the project mainly involved developing and tuning this evaluation procedure, with a restricted three PDS sample from the French PDS corpus. The procedure allowed us to monitor the progress of system performance on these PDSs while the lexicons and knowledge bases were developed. This gave an idea of the depth and precision of treatment that can be performed by the system, and of the time spent per PDS on knowledge development. Given appropriate knowledge, the system could correctly identify a set of 11 specic information items in a test PDS, as 5

well as the 6 ICD-9-CM codes that a human coder had assigned to this PDS. During the second phase of the project, the procedure will be applied to a larger number of texts. 6 Lessons learned Both sides of the evaluation have brought insights into the methodologies to apply in order to obtain better coverage and precision. On the one hand, even if we estimate that the percentage of fully parsed sentences could at best be raised to a ceiling of 80%, recovery methods that can process additional sentences will be useful. In the current design, partial parses from the French morpho-syntactic analyser can already be passed on to the semantic and pragmatic components, which can make sense of them independently. On the other hand, what needs to be enhanced now is mainly the quality and volume of the lexicons and knowledge bases. The knowledge development process has proven to be time-consuming and error-prone, and a more precise methodology is being set up. Tools will also be needed to facilitate this process. The evaluation procedure is an important asset; two complementary directions are being explored: the development of consistencychecking tools, and the semi-automated reuse of available on-line knowledge resources [15, 16, 17]. 7 Future plans The rest of the project is divided in two sections. Phase 2 will complete and enhance components, knowledge bases and integration (rst half of 1994). Emphasis will be put on the above-mentioned points. Phase 3 will evaluate 6

the resulting pilot system with users in two hospital sites (second half of 1994). Acknowledgements We would like to thank the members of the aim team of the European Commission, in particular Luciano Beolchi and Jens P. Christensen, for their help in starting and driving this project. A Information about deliverables Deliverable #9 on the rst version of the Menelas \Linguistic and Medical Knowledge Bases" is available from the author. References [1] Naomi Sager, Carol Friedman, and Margaret S. Lyman, editors. Medical Information Processing Computer Management of Narrative Data. Addison Wesley, Reading, Mass., 1987. [2] J. R. Scherrer, R. A. C^ote, and S. H. Mandil, editors. Computerised Natural Medical Language Processing for Knowledge Engineering, Amsterdam, 1989. North-Holland. [3] F. Borst, N. Sager, N.-T. Nhan, Y. Su, M. Lyman, L.-J. Tick, C. Revillard, E. Chi, and J.-R. Scherrer. Analyse automatique de comptes rendus d'hospitalisation. In P. Degoulet, J.-C. Stephan, A. Venot, and P.-J. Yvon, editors, Informatique et Gestion des Unites de Soins 7

Comptes Rendus du Colloque AIM-IF, Informatique et Sante, chapter 5, pages 246{256. Springer-Verlag, Paris, 1989. [4] R.H. Baud, A.M. Rassinoux, and J.R. Scherrer. Natural language processing and semantical representation of medical texts. Methods of Information in Medicine, 31:117{125, 1992. [5] Martin Schroder. Knowledge-based processing of medical language: A language engineering approach. In Proceedings of GWAI'92, Bonn, D., September 1992. [6] Marc Cavazza, Laurent Dore, and Pierre Zweigenbaum. Model-based natural language understanding in medicine. In K. C. Lun, P. Degoulet, T. Piemme, and O. Rienho, editors, Proc MEDINFO 92, pages 1356{ 1361, Amsterdam, 1992. North Holland. [7] Pierre Zweigenbaum, Marc Cavazza, Laurent Dore, Jacques Bouaud, and David Sedlock. Natural language processing of patient discharge summaries (NLPAD) extraction prototype. In Jaap Noothoven van Goor and Jens Pihlkjr Christensen, editors, AIM Reference Book, pages 277{286. IOS Press, Amsterdam, 1992. [8] A. Berard-Dugourd, J. Fargues, M.-C. Landau, and J.-P. Rogala. Un systeme d'analyse de texte et de question/reponse base sur les graphes conceptuels. In P. Degoulet, J.-C. Stephan, A. Venot, and P.-J. Yvon, editors, Informatique et Gestion des Unites de Soins Comptes Rendus du Colloque AIM-IF, Informatique et Sante, chapter 5, pages 223{233. Springer-Verlag, Paris, 1989. [9] C. Grover, J. Carroll, and T. Briscoe. The Alvey Natural Language Tools Grammar. University of Cambridge, Computer Laboratory, 8

Cambridge, 1992. 4th Release. [10] Peter Spyns and Geert Adriaens. Applying and improving the restriction grammar approach for Dutch patient discharge summaries. In Antonio Zampolli, editor, Proceedings of the 14 th COLING, pages 1264{ 1268, Nantes, France, July 23{28 1992. [11] John F. Sowa. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, London, 1984. [12] K.E. Campbell and M.A. Musen. Representation of clinical data using SNOMED III and conceptual graphs. In SCAMC [18], pages 354{358. [13] Nigel Strang. An evaluation methodology for Menelas. Rapport de DEA, Informatique Medicale, Universite Paris VI, 1993. [14] Wendy Lehnert and Beth Sundheim. A performance evaluation of text-analysis technologies. Articial Intelligence Magazine, (Fall):81{ 94, 1991. [15] B L Humphrey and DABLindberg. Building the Unied Medical Language System. In Proc of SCAMC'89, pages 475{480. IEEE, 1989. [16] Francoise Volot, Pierre Zweigenbaum, Bruno Bachimont, Mohamed Ben Sad, Jacques Bouaud, Marius Fieschi, and Jean-Francois Boisvieux. Structuration and acquisition of medical knowledge: Using UMLS in the Conceptual Graph formalism. InProc of SCAMC'93. Mc Graw Hill, 1993. [17] George A. Miller. Wordnet: An on-line lexical database. International Journal of Lexicography, 3(4), 1990. [18] Proc of SCAMC'92. Mc Graw Hill, 1992. 9