Management of and Access to Virtual Electronic Health Records

M. Springmann 1, L. Bischofs 2, P. Fischer 3, H.-J. Schek 4, H. Schuldt 1, U. Steffens 2, R. Vogl 5

1 Database and Information Systems Group, University of Basel, Switzerland
2 OFFIS Oldenburg, Germany
3 Institute of Information Systems, ETH Zurich, Switzerland
4 University of Konstanz, Germany
5 HITT - health information technologies tirol, Innsbruck, Austria

Contact author:

Abstract
Digital Libraries (DLs) in ehealth are composed of electronic artefacts that are generated and owned by different healthcare providers. A major characteristic of ehealth DLs is that information is under the control of the organization where data has been produced. The electronic health record of patients therefore consists of a set of distributed artefacts and cannot be materialized for organizational reasons. Rather, the electronic patient record is a virtual entity. The virtual integration of an electronic patient record is achieved by combining services provided by specialized application systems into processes. This paper reports, from an application point of view, on national and European attempts to standardize electronic health records. From a technical perspective, the paper addresses how services can be made available in a distributed way, how distributed P2P infrastructures can be evaluated, and how novel content-based access can be provided for multimedia electronic health records.
Categories and Subject Descriptors
H.3.7 [Digital Libraries]: Systems issues; H.3.3 [Information Search and Retrieval]: Search process; H.2.8 [Database Applications]: Image Databases; H.3.4 [Systems and Software]: Distributed Systems; J.3 [Life and Medical Sciences]: Medical information systems

General Terms
Design, Experimentation, Reliability, Standardization

Keywords
ehealth, electronic health records, distributed P2P infrastructures, management and access

1 Introduction
ehealth Digital Libraries contain electronic artefacts that are generated by different healthcare providers. An important observation is that this information is not stored at one central instance but under the control of the organization where data has been produced. The electronic health record of patients therefore consists of a set of distributed artefacts and cannot be materialized for organizational reasons. Rather, the electronic patient record is a virtual entity and has to be generated by composing the required artefacts each time it is accessed. The virtual integration of an electronic patient record is achieved by combining services provided by specialized application systems into processes. A process to access a virtual electronic health record encompasses all the services needed to locate the different
artefacts, to make data from the different healthcare providers available, to perform the format conversions needed, and to present the result to a user. This requires an infrastructure that is highly dependable and reliable. Physicians must be given the guarantee that the system is always available. Moreover, the infrastructure has to allow for transparent access to distributed data, to provide a high degree of scalability, and to efficiently schedule access to computationally intensive services by applying sophisticated load balancing strategies. Physicians need information immediately in order to make vital decisions. Long response times due to a high system load cannot be tolerated. An example is content-based similarity search across a potentially large set of documents, which requires the availability of several building blocks, some of them highly computationally intensive, that have to be combined into processes.

This paper gives an overview of various aspects of the management of and access to virtual electronic health records and of research that has been carried out in this field within the last three years. The paper is organized as follows: Chapter 2 describes a novel ehealth IT infrastructure which allows access to health records across collaborating institutions. Chapter 3 explains how complex processes spanning several institutions can be modelled and executed and thus perform all actions required to convert, aggregate or anonymize data of the virtual electronic health records. Chapter 4 shows how similarity search technology can be used to enhance retrieval of electronic health records. Since security and permissions are essential for data as sensitive as health records, Chapter 5 gives an outlook on how the behaviour of such distributed peer-to-peer systems can be simulated. Chapter 6 summarizes the paper and gives a conclusion and outlook.
2
Establishing electronic patient records (EPR) in hospitals has been the central issue for hospital IT management in the past 10 years. In the industrialized countries, very good coverage with hospital IT systems for the documentation of patient-related data, and their integration to form a common EPR, has now been reached (for the European situation, see the HINE study 2004, as presented in the proceedings of the EU high level conference ehealth2006: Rising to the challenges of ehealth across Europe's regions). Currently, the main issue for the European countries, as called for by the European ehealth action plan of 2004, is the establishment of national Electronic Health Record systems that allow for the standardized exchange of electronic health data between healthcare providers with the vision of European interoperability.

In view of this, the Austrian Ministry of Health (MoH) has initiated the Austrian ehealth Initiative in early a stakeholders group which delivered the Austrian ehealth strategy to the Austrian Ministry of Health at the end of HITT and its project partners from the project [Schabetsberger et alii 2006] contributed actively and influentially to the formulation of the strategy and to subsequent consultations with the MoH in the course of 2006, ensuring that the elaborate digital library ideas for distributed archive systems and high-level security mechanisms were maintained in that strategy and implemented in a subsequent feasibility study commissioned by the MoH. In November 2006 the team finalized a reference implementation [Vogl et alii 2006] of its distributed secure architecture (see Figure 1) for shared electronic health records, with special focus on secure web services with service- and user-role-based access control (RBAC) relying on the SECTET toolkit developed by the project partners at the Institute for Software Engineering at Innsbruck University.
Pilot projects utilizing this architecture for topical electronic health records applications (e. g., online access to health record data between the hospital trusts in Vienna
(KAV) and Tirol (TILAK); a haemophilia register for quality assurance and epidemiological studies; an autistic children therapy register; a portal for report access by a group of general practitioners) have been commissioned and are currently being implemented. Certification of compliance with IHE (Integrating the Healthcare Enterprise) profiles is being prepared for the Connectathon event of IHE Europe in April 2007 in Berlin. Several contributions to international conferences and scientific journals were made in 2006, helping to disseminate digital library concepts in the healthcare setting.

Figure 1: The architecture and its utilization for access to multimedia documents across healthcare organizations. The functional components are realized as web services (AN: access node; DMDI: distributed meta data index; DR: document repository; PL: patient lookup; PLI: patient lookup index; GI: global index; PEP: policy enforcement point). Adoption of the architecture for compliance with the XDS (cross enterprise document sharing) integration profiles of the influential IHE industry initiative is under way.

3 XL System
XL is a system which provides an environment for the definition, composition, and deployment of Web services on a set of different platforms. XL Web services are expressed in the XL language. XL is specifically designed for processing XML data, the most common data representation within the Web service domain. The following bullet points characterize XL:
• The XL language uses the XQuery/XSLT 2.0 types throughout the whole system.
• XL uses XQuery expressions (including XML Updates) for processing XML.
• XML message passing in XL is based on the XML SOAP (Web Services) standard.
• XL includes a persistent XML store.
• A full implementation of XL has been available since There are a number of demo applications: e.g., an auction system, a book store, and a sophisticated SmartHome demo.
• XL is operational on all Java 1.4 (and higher) platforms. There is a special, limited version of XL that runs on Java 1.3 and PDAs.
• There is also comprehensive tool support: an Eclipse plug-in for XL which includes syntax-driven editing, debugging, and simple deployment.

3.1 XL in the context of Virtual Electronic Health Records
Virtual electronic health records need to deal with distributed artefacts of medical data (from different healthcare providers, health insurances, public authorities and the patients themselves) that cannot be materialized due to organizational reasons or privacy concerns. Therefore, the data needed for a certain user or user group needs to be integrated on the fly from the individual, specific data providers. More specifically, a system providing virtual electronic health records needs to fulfil the following functional requirements:
• locate the different artefacts
• make data from the different healthcare providers available
• perform the format conversions needed
• aggregate/anonymize the data
• present the result to a user

XL can contribute solutions to those requirements if the system is realized in the context of web services. The strengths of XL are in the areas of high-level format conversions, aggregation/anonymization and presentation of results. Locating services that provide relevant artefacts is possible using either standard web service techniques (UDDI) or additional services that provide a more content-oriented approach to services.
XL might also be used to make data from the individual providers available as web services, but needs to be complemented by additional software for low-level format conversion and feature extraction (more efficient implementations are possible in lower-level languages). In detail, XL can provide the following services:
• Filtering, transforming and integrating medical records from various sources that are available as web services/XML
  o Full support for all WS standards
  o Full XQuery support
  o Possible to call code for operations outside XL (e.g. image search)
• Presenting results as web pages on different devices
  o Present results of a service as HTML, XHTML
  o Built into the system, easy to activate
• Orchestrating and composing services (e.g. running workflows defined by BPEL)
• Quick and easy development of new services based on open standards
• Deployable on any platform supporting at least Java 1.3, including
  o application servers and gateways to existing systems
  o personal computers of physicians, scientific personnel as well as patients
  o embedded devices, PDAs and eventually mobile phones

As a demonstrator for the usage in T1.6, the rapid development of a web service to integrate some medical data will be shown. The service demonstrated here collects information about the patient age for cases of a given illness from two health providers which make that data available as web services. The data transfer format and the call parameters are different, however. The service retrieves all the necessary information from all the services and transforms it into a common result document that only contains the age information for each case (presumably for reasons of anonymity).
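The aggregation performed by such a service can also be illustrated outside XL. The following Python fragment is a minimal sketch only and is not the XL demonstrator: the two provider responses are hard-coded XML strings standing in for web service calls, and all element and attribute names (hospitalizations, report, diagnosis, age, cases) as well as the function name find_illness are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical response of provider 1. This provider accepts the illness as a
# call parameter, so it already returns only matching hospitalizations.
HOSPITAL_1_RESPONSE = """
<hospitalizations illness="influenza">
  <hospitalization><patient name="A. Example"/><age>34</age></hospitalization>
  <hospitalization><patient name="B. Example"/><age>71</age></hospitalization>
</hospitalizations>
"""

# Hypothetical response of provider 2. This provider only returns all cases
# after a given date and has no strict schema: reports may sit at any depth.
PROVIDER_2_RESPONSE = """
<records since="2000-01-01">
  <report><diagnosis>influenza</diagnosis><age>52</age></report>
  <ward name="W1">
    <report><diagnosis>fracture</diagnosis><age>19</age></report>
    <report><diagnosis>influenza</diagnosis><age>8</age></report>
  </ward>
</records>
"""


def find_illness(illness, hospital_1_xml, provider_2_xml):
    """Return a <cases> element that holds only the age of each matching case."""
    cases = ET.Element("cases", {"illness": illness})

    # Provider 1: every returned hospitalization already matches the illness.
    for hospitalization in ET.fromstring(hospital_1_xml).iter("hospitalization"):
        ET.SubElement(cases, "case").text = hospitalization.findtext("age")

    # Provider 2: inspect reports anywhere in the document and keep those
    # whose diagnosis corresponds to the requested illness.
    for report in ET.fromstring(provider_2_xml).iter("report"):
        if report.findtext("diagnosis") == illness:
            ET.SubElement(cases, "case").text = report.findtext("age")

    return cases


result = find_illness("influenza", HOSPITAL_1_RESPONSE, PROVIDER_2_RESPONSE)
ages = [case.text for case in result.findall("case")]
print(ET.tostring(result, encoding="unicode"))
```

Because only the age element of each case is copied into the result, identifying details such as the patient name never reach the common result document.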
To keep the demonstrator short and concise, no advanced features of web service interaction and data manipulation are shown here. They can easily be added to the example when required. The development is done using the XL platform and the development environment XLipse, which provides XL development, testing and debugging inside Eclipse.

Figure 2: Screen-shot of the XL system

To start developing, a new XL project needs to be created, following the usual Eclipse approach. A file with the name find_illness.xl is added to this project. The relevant service and operation signature are added (lines 1 and 3), supported by content assist to avoid mistakes. Inside the operation, the variable for the case list is initialized as an XML fragment. The first web service (a hospital) is called; this one supports a parameter to specify the illness. The result is stored in the variable $hospital_1. The age information for each of the hospitalizations in that variable is now added to the case list. The second provider only allows querying all cases after a given date, so all cases after a date in the past are retrieved into $health_provider_2. Since this provider also does not have a strict schema, we retrieve all reports of illnesses (regardless of their position in the document) and keep those that have a diagnosis corresponding to the requested illness. The age information for those cases is then added to the case list. Finally, the case list is returned.

4 Interactive Similarity Search for Electronic Health Records
Electronic health records have become a very important patient-centred entity in health care management.
They represent complex documents that comprise a wide diversity of patient-related information, such as basic administrative data (patient's name, address, date of birth, name of physicians, hospitals, etc.), billing details (name of insurance, billing codes, etc.), and a variety of medical information (symptoms, diagnoses, treatment, medication, X-ray images, etc.). Most of these details are structurally and temporally interrelated, like a (1) particular treatment and (2) medication ordered by a (3) physician on the basis of an (4) established diagnosis derived from the patient's (5) reported symptoms at a (6) certain date. Each item is of a dedicated media type, like structured alphanumeric data (e.g. patient name, address, billing information, laboratory values, documentation codes, etc.), semi-structured text (symptoms description, physician's notes, etc.),
images (X-ray, CT, MRI, images for documentation in dermatology, etc.), video (endoscopy, sonography, etc.), time series (cardiograms, EEG, respiration, pulse rate, blood pressure, etc.), and possibly others, like DNA sequence data.

State-of-the-art medical information systems aggregate massive amounts of data in patient records. The exploitation of this data, however, is mainly restricted to accounting and billing purposes, leaving aside most medical information. In particular, similarity search based on complex document matching is out of scope for today's medical information systems. We believe that efficient and effective matching of patient records forms the foundation for a large variety of applications. Data mining applications that rest upon our patient record matching approach will foster clinical research and enhanced therapy in general. Specifically, our proposal allows for effective comparison of similar cases to (1) shorten necessary treatments, (2) improve diagnoses, and (3) ensure the proper medication to improve patients' well-being and reduce cost.

This can be achieved by no longer relying entirely on exact matching, but applying the paradigm of similarity search to the content of electronic health records. In particular, health records at present contain a vast number of medical images, stored in electronic form, that capture information about the patient's health and the progress of treatments. This information is only used indirectly for retrieval, in the form of the metadata stored in the DICOM header in a PACS system or the diagnosis of a radiologist. State-of-the-art image retrieval technology can assist retrieval by using the medical images themselves for search in addition to traditional search approaches. One example of such a retrieval system is ISIS.
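Independent of any particular system, the idea of using image content in addition to traditional text search can be sketched in a few lines of Python. The fragment below is a toy illustration and does not describe how ISIS works internally: the three-value feature vectors, the term lists, the record identifiers and the weighted-sum fusion with the parameter visual_weight are all assumptions chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two feature vectors (1.0 means identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def text_similarity(query_terms, doc_terms):
    """Jaccard overlap of two term sets, a simple stand-in for text retrieval."""
    q, d = set(query_terms), set(doc_terms)
    return len(q & d) / len(q | d) if q | d else 0.0


def rank(query, records, visual_weight=0.5):
    """Rank record ids by a weighted sum of visual and textual similarity."""
    scored = []
    for record in records:
        visual = cosine_similarity(query["features"], record["features"])
        textual = text_similarity(query["terms"], record["terms"])
        score = visual_weight * visual + (1.0 - visual_weight) * textual
        scored.append((score, record["id"]))
    return [record_id for _, record_id in sorted(scored, reverse=True)]


# Hypothetical collection: tiny grey-value histograms plus annotation terms.
records = [
    {"id": "xray-001", "features": [0.7, 0.2, 0.1],
     "terms": ["chest", "xray", "pneumonia"]},
    {"id": "xray-002", "features": [0.1, 0.2, 0.7],
     "terms": ["chest", "xray", "fracture"]},
    {"id": "ct-003", "features": [0.6, 0.3, 0.1],
     "terms": ["head", "ct"]},
]
query = {"features": [0.7, 0.2, 0.1], "terms": ["chest", "pneumonia"]}

print(rank(query, records))                     # combined ranking
print(rank(query, records, visual_weight=0.0))  # text only
print(rank(query, records, visual_weight=1.0))  # visual only
```

In this toy collection the text-only and the visual-only rankings disagree on the order of the second and third record, which illustrates why the two sources of information complement each other.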
ISIS stands for Interactive SImilarity Search. It is a prototype application for information retrieval in multimedia collections that was built at ETH Zürich [Mlivoncic, Schuler and Türker 2004] and has been extended at UMIT and the University of Basel. It supports content-based retrieval of images, audio and video content, and the combination of any of these media types with sophisticated text retrieval [Springmann 2006].

Figure 3: ISIS Demonstrator

A set of more than 50,000 medical images, which amount to roughly 4.65 gigabytes of data, has been inserted and indexed with the ISIS system. The files originate from four different collections. All of them have been used for the Medical Image Retrieval Challenge of the Cross Language Evaluation Forum 2005 (ImageCLEFmed 2005, [Clough 2006]). The results of our experiments as well as the
overall experience described for ImageCLEFmed show that combined retrieval using visual and textual information improves the retrieval quality compared to queries using only one source of information, either visual or textual. A variation of this setting is the automatic classification of medical images. This approach is also addressed as a task in ImageCLEF. One goal of this approach in clinical practice is to be able to check the plausibility of, e.g., DICOM header information and other kinds of automatically or manually entered metadata in order to reduce errors. The recent activities performed at UNIBAS in this area focused on improving the retrieval quality and also the retrieval time for such classification tasks using the ImageCLEF benchmark.

5 easim
ehealth Digital Libraries connect a number of different collaborating institutions which store different parts of virtual electronic health records. Which part of a record is stored by which institution is determined by the institution's role in its patients' medical care. From a digital library point of view, this calls for flexible support by a system architecture which enables the combination of distributed artefacts against the background of different organizational and topical contexts, offering simple and fast access to the physicians in charge on the one hand and, on the other hand, guaranteeing autonomy and, in particular, confidentiality to the institutions producing and owning specific health record content. [Bischofs and Steffens 2004] introduces a new approach for a class of peer-to-peer systems based on organisation-oriented super-peer architectures, which are able to map organisational structures to peer-to-peer systems and which are therefore especially suitable for the problem at hand. However, the selection of adequate search methods for such new, not yet existing systems is difficult.
One approach to arrive at adequate metrics for search behaviour in such systems is the preliminary simulation of search methods. For this purpose, the easim simulator has been developed [Bischofs et alii 2006]. easim is a time-discrete, event-based peer-to-peer simulation tool based on the generic simulation framework DESMO-J. Unlike other peer-to-peer simulators, which primarily focus on concrete systems and do not address the underlying architectural styles, easim follows a layered approach considering three different layers. At the lowest layer, the physical layer, the physical hosts are connected. The middle layer is called the virtual layer and describes the logical interconnection of the (super-)peers. In the upper layer, the relationships among organisational units can be investigated. All adjacent layers are interconnected, such that each element of a higher layer is hosted on an element of the layer underneath. Each element is divided into a stateless part for modelling the behaviour and a stateful part for modelling element-specific information like neighbourhood, cache and resources. The independent modelling of the state and behaviour of a peer enables the easy replacement of search and routing behaviour in different simulation runs. For ease of use, element behaviour can be assigned to a set of elements. Each of the three layers can be simulated separately. However, it is also possible to simulate two or three layers in combination.

easim's user interface consists of a number of wizards assisting the user throughout the whole simulation process. Hence, users can easily configure the tool at different levels of abstraction. Predefined metrics can be activated and visualized directly at simulation time. For displaying simulation results, easim makes use of the JFreeChart library. Results are shown immediately at runtime for a number of selected metrics.
Figure 4 contains an example of a simulation result displaying six different metrics concerning a search in a peer-to-peer network.
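The separation of a stateful part and a replaceable stateless behaviour, driven by a time-discrete event queue, can be sketched as follows. This Python fragment is an illustration under stated assumptions and is not the easim or DESMO-J API: the class PeerState, the behaviour functions flooding and first_neighbour_only, the five-peer topology, the resource name record-42 and the three collected metrics are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class PeerState:
    """Stateful part of a peer: neighbourhood, resources, handled queries."""
    name: str
    neighbours: list = field(default_factory=list)
    resources: set = field(default_factory=set)
    seen: set = field(default_factory=set)


def flooding(state, key, ttl):
    """Stateless behaviour: forward to every neighbour while TTL remains."""
    return [(n, ttl - 1) for n in state.neighbours] if ttl > 0 else []


def first_neighbour_only(state, key, ttl):
    """Alternative stateless behaviour: forward to the first neighbour only."""
    if ttl > 0 and state.neighbours:
        return [(state.neighbours[0], ttl - 1)]
    return []


def build_network():
    """Hypothetical topology; the searched resource is stored on peer E."""
    links = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
             "D": ["B", "C", "E"], "E": ["D"]}
    peers = {name: PeerState(name, list(nbrs)) for name, nbrs in links.items()}
    peers["E"].resources.add("record-42")
    return peers


def simulate(peers, behaviour, start, key, ttl, hop_delay=1):
    """Process query deliveries in time order and collect simple metrics."""
    metrics = {"messages": 0, "hits": [], "first_hit_time": None}
    events = [(0, 0, start, ttl)]  # (time, sequence number, peer, ttl)
    sequence = 1                   # unique tie-breaker for simultaneous events
    while events:
        time, _, name, ttl_left = heapq.heappop(events)
        state = peers[name]
        if key in state.seen:
            continue               # this peer has already handled the query
        state.seen.add(key)
        if key in state.resources:
            metrics["hits"].append(name)
            if metrics["first_hit_time"] is None:
                metrics["first_hit_time"] = time
        for neighbour, next_ttl in behaviour(state, key, ttl_left):
            heapq.heappush(events, (time + hop_delay, sequence, neighbour, next_ttl))
            sequence += 1
            metrics["messages"] += 1
    return metrics


# Same network state, two different behaviours in two simulation runs.
print(simulate(build_network(), flooding, "A", "record-42", ttl=3))
print(simulate(build_network(), first_neighbour_only, "A", "record-42", ttl=3))
```

Because the behaviour is passed in as a plain function, a different search method can be evaluated on the same topology without touching the peer state, which mirrors the replacement of search and routing behaviour between simulation runs described above.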
Figure 4: easim simulator

Future work for easim is concerned with both conceptual and technical aspects. On the technical level, we are currently developing a new integrated modelling and simulation environment for peer-to-peer systems based on Eclipse in order to support our framework with additional tools. The most distinctive feature will be a modelling component providing means for specifying architectural styles with a state-chart-like notation. The aim is to simplify the creation of new architectures and to support fast modifications to existing ones. It will be possible to define the behaviour and the properties of the simulated nodes in a scenario. This includes the possibility of dynamically changing the network structure during a simulation run, which is not yet feasible with easim's current implementation. Future work on the evaluation of peer-to-peer architectural styles will emphasize the simulation of architectural changes at runtime, based on our existing experience with different simulation systems. The characteristics of various architectural styles will be further elaborated and investigated.

6 Conclusions and Outlook
Recent developments, for instance in medical imaging, have led to significant improvements in data acquisition, with large increases in the quantity (but also in the quality) of medical data. Similarly, in health monitoring, the proliferation of sensor technology has strongly facilitated the continuous generation of vast amounts of physiological data of patients. In addition to this information, more traditional structured data on patients and treatments needs to be managed. In most cases, data is owned by the healthcare organization where it has been created and thus needs to be stored under the organization's control. All this information is part of the electronic health records of a patient and is thus included in ehealth Digital Libraries.
A major characteristic of ehealth DLs is that information is under the control of the organization where data has been produced. The electronic health record
of patients therefore consists of a set of distributed artefacts and cannot be materialized for organizational reasons. Rather, the electronic patient record is a virtual entity. The virtual integration of an electronic patient record is achieved by combining services provided by specialized application systems into processes.

In this paper, we have addressed several major issues in the management of and access to virtual electronic health records. First, this includes platforms for making data and services available in a distributed environment while taking security and privacy into account. Second, it addresses support for efficient and effective search in these virtual electronic health records. The concrete activities reported in this paper have been the subject of joint work within the task "Management of and Access to Virtual Electronic Health Records" of the DELOS network of excellence.

References

Bischofs L., Steffens U Organisation-oriented super-peer networks for digital libraries. In M. Agosti, H.-J. Schek and C. Türker (eds.), Digital Library Architectures: Peer-to-Peer, Grid, and Service-Orientation. Proceedings of the 6th Thematic Workshop of the EU Network of Excellence DELOS. Revised Selected Papers. S. Margherita di Pula (Italy), June Lecture Notes in Computer Science Vol Berlin-Heidelberg: Springer :

Bischofs L., Giesecke S., Gottschalk M., Hasselbring W., Warns T., Willer S Comparative Evaluation of Dependability Characteristics for Peer-to-Peer Architectural Styles by Simulation. In Journal of Systems and Software 79 (10) :

Clough P., Müller H., Deselaers T., Grubinger M., Lehmann T., Jensen J., Hersh W The CLEF 2005 Cross-Language Image Retrieval Track. In Accessing Multilingual Information Repositories. Revised Selected Papers from the 5th Workshop of the Cross-Language Evaluation Forum (CLEF 2005). Vienna (Austria), September Lecture Notes in Computer Science Vol Berlin-Heidelberg: Springer.

Florescu D., Grünhagen A., Kossmann D XL: a platform for Web Services. In Proceedings of First Biennial Conference on Innovative Data Systems Research (CIDR 2003). Asilomar (CA, USA), 5-8 January

Mlivoncic M., Schuler C., Türker C Hyperdatabase Infrastructure for Management and Search of Multimedia Collections. In M. Agosti, H.-J. Schek and C. Türker (eds.), Digital Library Architectures: Peer-to-Peer, Grid, and Service-Orientation. Proceedings of the 6th Thematic Workshop of the EU Network of Excellence DELOS. Revised Selected Papers. S. Margherita di Pula (Italy), June Lecture Notes in Computer Science Vol Berlin-Heidelberg: Springer.

Schabetsberger T., Ammenwerth E., Breu R., Hoerbst A., Goebel G., Penz R., Schindelwig K., Toth H., Vogl R., Wozak F E-Health Approach to Link-up the Actors in the Health Care System of Austria. In A. Hasman, R. Haux, J. van der Lei, E. De Clercq, F.H.R. France (eds.), Ubiquity: Technologies for Better Health in Aging Societies. Proceedings of MIE2006. Amsterdam (Netherlands), August Amsterdam: IOS Press. Studies in Health Technology and Informatics. Vol. 124

Springmann M A Novel Approach for Compound Document Matching. In B. Wilson, J. Borbinha (eds.), Bulletin of the IEEE Technical Committee on Digital Libraries (TCDL), Vol. 2 (2),

Vogl R., Wozak F., Breu M., Penz R., Schabetsberger T., Wurz M Architecture for a Distributed National Electronic Health Record System in Austria. In Proceedings of EuroPACS 2006, Trondheim (Norway), June 2006.