Structuring Unstructured Clinical Narratives in OpenMRS with Medical Concept Extraction
- Bernard Foster
- 10 years ago
Ryan M. Eshleman, Hui Yang, and Barry Levine
Department of Computer Science, San Francisco State University, 1600 Holloway Avenue, San Francisco, CA, USA

Abstract
We have developed a new software module for the open source Electronic Medical Record (EMR) system OpenMRS that analyzes unstructured clinical narratives. The module uses Named Entity Recognition (NER) to deliver concise, interactive summaries of clinical notes organized by semantic type. To choose an NER system, we performed an extensive empirical evaluation of four NER systems on textual clinical narratives and full-text biomedical journal articles. The four systems under evaluation are MetaMap, cTAKES, BANNER, and MGrep. We also studied several ensemble approaches built on these four systems to exploit their collective strengths. Evaluations were performed on manually annotated patient discharge summaries from the Informatics for Integrating Biology and the Bedside (I2B2) group and on the CRAFT corpus. The main results are: (1) BANNER significantly outperforms the other three systems on the I2B2 dataset, with F1 values in the range of …, in contrast to … for the other systems; and (2) surprisingly, combining BANNER with any combination of the other three systems tends to degrade performance by … in F1 on the I2B2 dataset. Based on these results, we have developed a BANNER-based NER module for OpenMRS that recognizes semantic concepts including problems, tests, and treatments. The module works with OpenMRS versions 1.9.x and 2.x. Its user interface presents concise summaries of clinical notes and allows the user to filter, search, and view the context of the concepts. We have also developed a companion web application to train the BANNER model on data from the OpenMRS database.
Finally, the module and its source code are available at ….

Index Terms: OpenMRS, Named Entity Recognition (NER), Clinical Narratives

I. INTRODUCTION
In today's fast-paced clinical settings, healthcare providers must be able to access and process patients' records efficiently and effectively to achieve good patient outcomes. While the advent of the Electronic Medical Record (EMR) has dramatically improved providers' access to patient information, the EMR often constrains the form of patient data and offers weak support for unstructured data such as the text of clinical notes. One study found that unstructured data accounts for 80 percent of the patient data held by a medical organization [1]. This mismatch between the kind of data the EMR is designed for (encoded data) and the format of actual patient data is problematic. At best, it increases the time required to review a patient's records. At worst, it can lead a provider to overlook important data because of time constraints or information overload [2]. Moreover, a study conducted in [3] identified patient notes as a primary source of information when reviewing patient history. Any improvement in the ability to process and present unstructured information in an EMR therefore has the potential to improve patient care.
Named Entity Recognition (NER) is the task of identifying spans of unstructured text that belong to predefined conceptual categories. It is an active area of research in the Natural Language Processing (NLP) community, which has produced many techniques and tools that may help address the problems posed by unstructured text. Many of these techniques have been published in the research literature, and open source implementations have been released for use by the community [29][30][31][32].
These tools, however, tend to be tuned for a specific application domain. Because style, language usage, and vocabulary differ across domains (e.g., biomedical journal articles vs. clinical notes), a tool's performance may not transfer well to another domain. Therefore, to use one of these tools in a real-world application, one must first identify, among many existing tools, an appropriate one or a combination of several. This task is neither trivial nor avoidable. Additionally, while the community has developed tools that perform strongly on a number of NLP tasks, the impact of these achievements tends to remain within the research community, and it remains a challenge to translate this progress into real-world use cases that improve the capabilities of the EMRs in use today.
OpenMRS [4] is a free, enterprise-scale, open source EMR deployed worldwide with the aim of improving health care delivery in under-resourced areas. To address some of the problems with unstructured data, we have developed a new module that brings recent progress in NER research to OpenMRS. The module uses an NER system to extract concepts from textual patient notes in three semantic categories: Problem, Test, and Treatment. Table 2 in the Methods section describes these entity classes. The extracted concepts provide snapshot summaries of patient notes that support fast synthesis of patient history, and an interactive interface makes it easy to locate information of interest.
The remainder of this paper is structured as follows. The Background and Related Work section describes five biomedical NER tools and their underlying algorithms, as well as several open source EMRs, including OpenMRS. The Methods section describes the NER system evaluation process. The Evaluation Results section presents the main results. Next, the Implementation section gives an overview of OpenMRS and the two components of our system: the EMR module and the standalone training application. The final section concludes this article and identifies future directions.

II. BACKGROUND AND RELATED WORK
A. Applications of NLP in Point of Care Tools
Demner-Fushman et al. reviewed many potential applications of NLP in clinical decision support and other EMR functions [24]. These include semi-automatic coding of admitting diagnoses and ICD-9 codes [25], automatically extracting interpretations from lung scan reports, and monitoring diagnostic performance by radiologists [26]. These applications, however, are still being researched, and deployments in practice have not been widely reported.
Hirsch et al. describe HARVEST, a longitudinal, problem-oriented patient history summarization tool that operates on top of existing EMR systems [23]. HARVEST aggregates data from clinical notes as well as patient visits (for example, ICD-9 billing codes) into patient summaries. The application provides an interactive visualization of the problems identified in the patient history and textual patient notes. Our work resembles HARVEST in many ways. One main difference is that our system is open source and focuses on structuring clinical notes. We also extend the set of entity classes of interest to include tests and treatments.
B. Named Entity Recognition Systems
We surveyed the literature on current NER systems to identify those best suited to our task. In choosing systems for further evaluation, we considered their reported results and their target text domains. Table 1 presents an overview of our survey.

Table 1: Five NER systems: their target application domains and the highest precision/recall reported in the literature.
System    Max Prec.   Max Recall   Target Domain
cTAKES    …           …            Clinical narratives
MetaMap   …           …            Biomedical literature
ABNER     …           …            Biomedical literature
BANNER    …           …            Biomedical literature
MGrep     …           …            Variable

Apache cTAKES [5] implements an analysis pipeline that ends with noun-phrase chunking and a dictionary lookup. Its original target corpus came from the Mayo Clinic EMR system. It is open source, implemented in Java, and built on the Unstructured Information Management Architecture (UIMA) [28].
MetaMap [6] was developed by the National Library of Medicine and uses a pipeline similar to that of cTAKES, leading up to a dictionary lookup of terms in the Unified Medical Language System (UMLS) Metathesaurus. MetaMap provides a web HTTP API in addition to its source code. It is implemented in C and Prolog.
BANNER is built on the Conditional Random Fields (CRF) implementation included in the MALLET machine learning toolkit [8] [27]. BANNER is not distributed with a trained model, so users must supply a training corpus and train the model themselves. BANNER is implemented in Java, and its source code is freely available.
MGrep [12] is distributed by the University of Michigan as a binary executable that maps dictionary terms to text. For our evaluations, we supplied it with the UMLS dictionary. Details of its underlying mapping mechanism have not been published at this time.
C. Open Source EMRs
Many free, open source EMRs are available, providing various services and functionality. VistA [19] was developed by the US Department of Veterans Affairs to support its network of VA hospitals throughout the US. OpenEMR [21] is a widely adopted open source EMR with over 4,000 downloads per month. OpenMRS was designed with modularity and customization in mind. It was built as a medical informatics core platform on which additional functionality is developed in the form of modules.
An OpenMRS installation generally comes with a core set of modules that provide basic features; the installation can then be extended with modules from a growing, community-driven module library. OpenMRS is deployed around the world, including in Kenya, South Africa, Rwanda, India, China, Haiti, Pakistan, and the Philippines [33]. The AMPATH [28] implementation in Kenya serves over 50,000 HIV patients.

III. METHODS
A. Evaluation Corpora
The four NER systems under evaluation were tested on two biomedical datasets. The first dataset, provided by the Informatics for Integrating Biology and the Bedside (I2B2) group [13], consists of 425 de-identified patient discharge summaries. The summaries are manually annotated for mentions of entities in the classes Problem, Treatment, and Test. Table 2 describes these entity classes; full definitions can be found in [37].

Table 2: Description of entity classes
Problem: Phrases that contain observations made by patients or clinicians about the patient's body or mind that are thought to be abnormal or caused by a disease.
Treatment: Phrases that describe procedures, interventions, and substances given to a patient in an effort to resolve a medical problem.
Test: Phrases that describe procedures, panels, and measures that are done to a patient or a body fluid or sample in order to discover, rule out, or find more information about a medical problem.

The second dataset is the Colorado Richly Annotated Full Text Corpus (CRAFT) [14]. CRAFT consists of 67 full-text biomedical journal articles annotated with mentions of concepts from seven biomedical ontologies: Chemical Entities of Biological Interest, Cell Ontology, Entrez Gene, Gene Ontology, NCBI Taxonomy, Protein Ontology, and Sequence Ontology. Table 3 gives further details of the two datasets.
Table 3: Details of the corpora used in the evaluations
Corpus   # Docs   # Sentences   # Concepts   Concept Types
I2B2     425      ~43,000       ~47,000      Problem, Treatment, Test
CRAFT    67       ~21,000       ~100,000     7 biomedical ontologies
B. Evaluation Criteria and Metrics
We developed an evaluation platform to integrate the heterogeneous output formats of the four systems. We tested against three span-based annotation-matching criteria: Exact Match, Single Boundary, and Any Overlap. Given the gold standard annotation "medial 1 cm mass above her knee", the annotation "mass above her knee" matches under the Single Boundary and Any Overlap criteria, whereas "mass" matches only under Any Overlap. In addition to span-based matching, we also evaluated concept class correctness: if the text "blood pressure" is labeled a Problem in the gold standard but the NER system labels it a Test, the annotation does not satisfy the concept class correctness criterion. The standard metrics of Precision (TP / (TP + FP)), Recall (TP / (TP + FN)), and F1 (the harmonic mean of Precision and Recall) were recorded under each of these matching criteria. To ensure a fair comparison, all NER systems were tested on 395 of the 425 I2B2 documents; the remaining 30 documents were used to train the BANNER CRF model. Details of the BANNER learning curve are provided in the Evaluation Results section. Performance on individual entity classes was also recorded.

Table 4: Six ensemble formations of the four systems
BN-MM-ev: BANNER and MetaMap equal voting. All labels from each system are included in the final result.
BN_MM_MG_CT: Equal voting by all systems. All labels from all systems are included.
MM_MG_CT_Unan: Only labels unanimously agreed upon by MetaMap, MGrep, and cTAKES are included.
BN-sub-1: BANNER subordinate voting. All BANNER labels are included; labels unanimously agreed upon by MetaMap and MGrep are also included.
BN-sub-2: All BANNER annotations are included; additional annotations are included by majority vote among MetaMap, cTAKES, and MGrep.
BN-sub-3: All BANNER annotations are included; annotations unanimously agreed upon by MetaMap, MGrep, and cTAKES are also included.

Based on the results of the individual evaluations, the systems were combined in various formations to construct ensemble systems. These ensembles fell into four categories: (1) Equal Voting, where all annotations by all members of the ensemble are included in the final result; (2) Two-out-of-Three Voting, where an annotation is included if two of the three members identified it; (3) Three-out-of-Four Voting, similar to category 2 but with four members and an inclusion threshold of three votes; and (4) Subordinate Voting, where one master system has all of its annotations included and the subordinate systems vote, either unanimously or by majority, on any additional annotations. Because BANNER is not distributed with a trained model and must be trained for each use case, we also report its learning curve, which plots F1 score against the size of the training corpus.

IV. EVALUATION RESULTS
A. Performance of Individual Systems on I2B2
We evaluated the four systems across six categories (the three span-matching criteria, each with and without type matching) on the I2B2 dataset. Due to space constraints, the full set of results can be found at …; two representative results are presented in Tables 5 and 6. The largest value in each column is shown in bold. These results show that BANNER performs best.

Table 5. No Type Matching, Single Boundary
System    Precision   Recall   F1
cTAKES    …           …        …
MetaMap   …           …        …
BANNER    …           …        …
MGrep     …           …        …

Table 6. Type Matching, Single Boundary
System    Precision   Recall   F1
cTAKES    …           …        …
MetaMap   …           …        …
BANNER    …           …        …
MGrep     …           …        …

B. Performance by Entity Type on I2B2
The annotations in the I2B2 dataset were broken down into concept type groups. For each group, we evaluated the distribution of annotations produced by the NER systems. The pie chart labeled "MetaMap Tests Exact Match" shows the distribution of labels MetaMap assigned to the annotations labeled Test in the gold standard: 18% of the gold standard annotations were correctly identified, 4% were incorrectly labeled as Treatments, and 78% were not identified at all, making them false negatives. Figure 1 shows these results.

Figure 1: Performance breakdown by entity type.
Figure 2: Evaluation results for MetaMap, MGrep, and BANNER (two different models) on the CRAFT dataset.

C. Performance of Ensemble Methods on I2B2
The large number of combinations of voting schemes and NER systems precludes presenting all of the results, so we present some highlights. Table 4 above details the ensemble formations, and Tables 7 and 8 show highlights of their performance. A full list of results can be found in the web supplement at ….

Table 7. No Type Matching, Single Boundary
Ensemble        Precision   Recall   F1
BN-MM-ev        …           …        …
BN_MM_MG_CT     …           …        …
MM_MG_CT_Unan   …           …        …
BN-sub-1        …           …        …
BN-sub-2        …           …        …
BN-sub-3        …           …        …

Table 8. Type Matching, Single Boundary
Ensemble        Precision   Recall   F1
BN-MM-ev        …           …        …
BN_MM_MG_CT     …           …        …
MM_MG_CT_Unan   …           …        …
BN-sub-1        …           …        …
BN-sub-2        …           …        …
BN-sub-3        …           …        …

From these results, we make the following observations. As one would expect, the ensemble that required MetaMap, MGrep, and cTAKES to agree unanimously showed the highest precision in four of the six categories, likely because it has one of the most restrictive inclusion thresholds. Similarly, the ensemble with the least restrictive inclusion threshold, which includes all labels from all systems, shows the highest recall across the board. In all categories, the highest F1 score among the ensembles was achieved by the ensemble that included the unanimous votes of MetaMap, MGrep, and cTAKES along with all labels produced by BANNER. Comparing ensemble performance against BANNER alone, we find that BANNER alone achieves a higher F1 score in four of the six categories; the two categories in which it does not perform best are the two with the least restrictive boundary-matching criterion (Any Overlap).
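The matching criteria, metrics, and voting schemes above can be made concrete with a short sketch. The following Python is a minimal illustration under stated assumptions, not the authors' evaluation platform: annotations are assumed to be half-open character spans with a label, a prediction is assumed to count as a true positive if it matches any gold annotation, and ensemble agreement is assumed to mean an identical span and label. All names (Annotation, spans_match, score, vote, subordinate_vote) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset where the span begins
    end: int    # character offset just past the span's last character
    label: str  # entity class, e.g. "Problem", "Treatment", "Test"


class Criterion(Enum):
    EXACT = "exact"
    SINGLE_BOUNDARY = "single_boundary"
    ANY_OVERLAP = "any_overlap"


def spans_match(gold: Annotation, pred: Annotation, criterion: Criterion) -> bool:
    """Span-only comparison under one of the three matching criteria."""
    if criterion is Criterion.EXACT:
        return gold.start == pred.start and gold.end == pred.end
    if criterion is Criterion.SINGLE_BOUNDARY:
        # At least one boundary coincides (for non-empty spans this implies overlap).
        return gold.start == pred.start or gold.end == pred.end
    # ANY_OVERLAP: the half-open intervals [start, end) intersect.
    return pred.start < gold.end and gold.start < pred.end


def score(gold, pred, criterion, type_matching):
    """Return (precision, recall, F1) for lists of gold and predicted annotations."""
    def ok(g, p):
        return spans_match(g, p, criterion) and (not type_matching or g.label == p.label)

    # Assumption: a prediction is a true positive if it matches any gold annotation,
    # and a gold annotation is recalled if any prediction matches it.
    tp_pred = sum(1 for p in pred if any(ok(g, p) for g in gold))
    tp_gold = sum(1 for g in gold if any(ok(g, p) for p in pred))
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def vote(systems, threshold):
    """Keep annotations produced by at least `threshold` of the given systems.

    Agreement means an identical span and label (an assumption of this sketch)."""
    counts = Counter(a for system in systems for a in set(system))
    return {a for a, n in counts.items() if n >= threshold}


def subordinate_vote(master, subordinates, threshold):
    """All of the master system's annotations, plus any the subordinates agree on."""
    return set(master) | vote(subordinates, threshold)
```

With this sketch, Equal Voting corresponds to vote(outputs, 1), Two-out-of-Three Voting to vote(three_outputs, 2), and BN-sub-2 to subordinate_vote(banner, [metamap, ctakes, mgrep], 2). Spans built from "medial 1 cm mass above her knee" reproduce the Single Boundary and Any Overlap behavior described in Section III.B.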
Because of BANNER's consistently high performance on its own compared with the ensembles, we chose to proceed with module development using BANNER as a standalone system.

D. CRAFT Data Set Evaluations
We evaluated MetaMap, MGrep, and BANNER on the CRAFT dataset using the same matching criteria as for the I2B2 set. Because the BANNER model must be trained for each use case, we trained a model on the CRAFT corpus; however, BANNER's demanding training process, coupled with the density and variety of annotations in the CRAFT set, limited the number of documents we could use for training. We reached the limits of our resources when training on 9 CRAFT documents. We evaluated BANNER with this 9-document model as well as with the model previously trained on the I2B2 corpus. Figure 2 shows the results of these evaluations. MetaMap has the best overall F1 score in all categories and consistently strong performance relative to the other systems. BANNER trained on nine CRAFT documents shows the best precision when type matching is not required. When type matching is required, both BANNER models perform very poorly.
We plotted the learning curve of BANNER's performance on I2B2 data to observe the effect of training set size on performance. Figure 3 shows these results. As expected, BANNER's performance on the test set improves as the training set grows, and it appears to converge toward its performance on the training set at some asymptote. This figure gives us confidence that we could improve BANNER's performance on the CRAFT dataset if we had the resources to train it on a larger set.

Figure 3: BANNER learning curve: precision, recall, and F1 scores on the training set and the test set as the size of the training set increases.

V. IMPLEMENTATION
In this section, we describe the implementation of the notes-processing module in OpenMRS. The module consists of two main components: (1) an NER pipeline for processing clinical notes, together with an intuitive user interface for visualizing and navigating its results; and (2) a web application for retraining the underlying BANNER model on new texts.

Figure 4: Screenshot from the notes-processing module after the user clicks on the word "Demerol" in the word cloud. A clinical note is rendered with all concepts colored by type.

A. OpenMRS Clinical Notes-Processing Module
The OpenMRS core system is designed around a modular architecture. It provides a service layer that serves as the gateway to the underlying data model, and all functionality is composed of modules that are built on top of and extend this service layer.
The Concept Dictionary is the fundamental building block of every OpenMRS implementation [15]. It contains names, codes, and attributes for every data point or observation made in the system. Concepts include medical tests, drugs, results, symptoms, etc. The base OpenMRS implementation generally ships with a default Concept Dictionary that provides mappings to international standards such as ICD-10 [16], SNOMED CT [17], and RxNorm [18]. These concepts can be further grouped into concept classes, which our module leverages. For example, the concept class Diagnosis contains concepts such as asthma, gallstones, and rickets.
Following this modular design philosophy, we provide an NER API that performs the text analysis, along with an extension to the service layer API that persists and retrieves results. The module's user interface and functionality are built on these two components.
The NER function exposed through the API is a two-step analysis that ties together implementation-specific concept classifications and the BANNER analysis: the first step performs string-based mapping using the OpenMRS Concept Dictionary, and the second step runs BANNER. For the first step, because our module tags entities with three entity types while an OpenMRS implementation often has many more concept classes, we let the user provide a many-to-one mapping from OpenMRS concept classes to entity types. Table 9 shows the default concept class mapping adopted in our module.
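The two-step sequence can be sketched as follows, using the Table 9 defaults. This is an illustrative Python sketch, not the module's implementation: the concept list is a hypothetical stand-in for a Concept Dictionary query, ner_tagger is a hypothetical callable standing in for BANNER, whole-word case-insensitive matching is an assumed reading of "simple string-based matching", and treating any overlap as a collision (resolved in favor of step 1, as the module does) is also an assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Entity:
    start: int
    end: int
    text: str
    entity_type: str  # "Problem", "Treatment" or "Test"


# Default many-to-one mapping from OpenMRS concept class to entity type (Table 9).
DEFAULT_CLASS_MAP = {
    "Diagnosis": "Problem",
    "Symptom": "Problem",
    "Symptom/finding": "Problem",
    "Drug": "Treatment",
    "Procedure": "Treatment",
    "Test": "Test",
}


def dictionary_step(note: str, concepts: Iterable[tuple[str, str]],
                    class_map: dict[str, str]) -> list[Entity]:
    """Step 1: case-insensitive, whole-word matching of concept names.

    `concepts` holds hypothetical (concept name, concept class) pairs."""
    found = []
    for name, concept_class in concepts:
        entity_type = class_map.get(concept_class)
        if entity_type is None:
            continue  # concept class is not mapped to any entity type
        # No word character may directly precede or follow the match.
        pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
        for m in pattern.finditer(note):
            found.append(Entity(m.start(), m.end(), m.group(0), entity_type))
    return found


def overlaps(a: Entity, b: Entity) -> bool:
    return a.start < b.end and b.start < a.end


def tag_note(note: str, concepts: Iterable[tuple[str, str]],
             ner_tagger: Callable[[str], list[Entity]],
             class_map: dict[str, str] = DEFAULT_CLASS_MAP) -> list[Entity]:
    """Two-step tagging; when the steps collide, the dictionary result wins."""
    first = dictionary_step(note, concepts, class_map)
    # Step 2: statistical NER; drop any entity that overlaps a step-1 entity.
    second = [e for e in ner_tagger(note) if not any(overlaps(e, f) for f in first)]
    return sorted(first + second, key=lambda e: (e.start, e.end))
```

Passing a different class_map dictionary mirrors the user-supplied concept class mapping; concepts whose class is absent from the mapping are simply skipped.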
Given a mapping, our module uses simple string matching to find mentions of OpenMRS concepts and maps them to the corresponding entity type used in our module (e.g., Problem).

Table 9: OpenMRS concept classes to entity type mappings
Entity Type   OpenMRS Concept Classes
Problem       Diagnosis, Symptom, Symptom/finding
Treatment     Drug, Procedure
Test          Test

In the second step, BANNER performs NER on the text. BANNER allows us to identify entities that may not be explicitly described in the Concept Dictionary. When entities identified in the two steps collide, the first step takes precedence because of its direct link to the OpenMRS implementation.

Figure 5: Screenshot of the Language Model Trainer application.

Our module captures and analyzes Visit Notes as they are recorded, using the Aspect Oriented Programming support provided by the Spring framework [36]. The results of the analysis are persisted in the database for later visualization and retrieval.
The module provides an interactive interface for browsing and visualizing Visit Notes, and the entities they contain, on a per-patient basis. The UI consists of four major components: the Word Cloud (top section of Figure 4), the Document-Entity Browser (lower left), the Document Rendering pane (lower right), and the Navigation History (just below the top section). The Word Cloud presents the most frequently identified entities in the patient's Visit Notes. Words are color-coded by entity type, and font size is determined by frequency. Clicking an entity in the Word Cloud filters the Document-Entity Browser to the documents containing that entity and highlights the entity. The Document-Entity Browser lists the Visit Notes and the entities they contain, with the ability to filter by entity type. Clicking an entity in this browser renders the visit note in the Document Rendering pane, highlights the entity of interest, and scrolls the note to it.
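The Word Cloud sizing and the entity filter described above can be sketched as follows. This is a minimal Python illustration, not the module's code: the function names, the lower-casing of entity text, the linear scaling of font size with frequency, the 12 to 36 point size range, and the top-30 cutoff are all assumptions.

```python
from collections import Counter


def word_cloud(entities, min_size=12, max_size=36, top_n=30):
    """Return (entity, type, font size) triples for the most frequent entities.

    `entities` is an iterable of (entity text, entity type) pairs gathered from
    one patient's Visit Notes; font size grows linearly with frequency."""
    counts = Counter((text.lower(), etype) for text, etype in entities)
    top = counts.most_common(top_n)
    if not top:
        return []
    most, least = top[0][1], top[-1][1]
    spread = (most - least) or 1  # avoid dividing by zero when all counts are equal
    return [
        (text, etype, min_size + (n - least) * (max_size - min_size) // spread)
        for (text, etype), n in top
    ]


def notes_containing(notes, entity_text, entity_type=None):
    """IDs of notes that mention `entity_text`, optionally restricted to one type.

    `notes` maps a note ID to the (entity text, entity type) pairs found in it."""
    wanted = entity_text.lower()
    return [
        note_id
        for note_id, found in notes.items()
        if any(t.lower() == wanted and (entity_type is None or et == entity_type)
               for t, et in found)
    ]
```

Clicking "Demerol" in the cloud would then correspond to notes_containing(notes, "Demerol"), and the browser's type tabs to the optional entity_type argument.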
The Navigation History maintains a bread-crumb trail of previously examined entities so that the user can quickly retrace their path through the data.
Figure 4 shows an example use case. The user navigates to the module's main page and is presented with the Word Cloud and the Document-Entity Browser, which shows the entities found in the patient's documents (in this case, the Treatment tab is selected to show all treatment entities). The user clicks on "Demerol" in the Word Cloud, and all documents containing the entity Demerol are shown in the Document-Entity Browser. The user then chooses a document by clicking on "Demerol" in the Document-Entity Browser entry for the document of interest, and the document is rendered in the Document Rendering pane, centered on the occurrence of Demerol. The user can also use the search box to search the entity list directly for a desired entity.
To assess the effect of our module on a running OpenMRS implementation, we measured execution times and system resource usage under various loads. Graphs and details of these measurements can be found at …. The results show that execution time is linear in the number of documents and that, in a running OpenMRS system, a document takes under one second to tag.

B. BANNER Language Model Training Application
The module is distributed with a generic BANNER model trained on the I2B2 corpus. We have developed a companion web application that allows the user to train a new BANNER model on Visit Notes recorded and annotated in the OpenMRS module. This training application provides an interface for the user to amend and add annotations to the Visit Note corpus (see Figure 5).
By design, the application sits outside the OpenMRS system and has only indirect access to the database: it accesses visit notes and annotations through a ReST [38] endpoint exposed by the OpenMRS module. To train a new model, the user navigates to the training web application and provides the URL of the OpenMRS implementation. OpenMRS then supplies the Visit Notes and their annotations to the training application through this endpoint. The user reviews the annotations and corrects any mistakes the current model may have made. Once the annotations have been corrected, they are passed to BANNER to generate a new model, which the user downloads and then uploads into the OpenMRS module. The module can maintain several models, which the user can use interchangeably.

VI. CONCLUSIONS AND FUTURE DIRECTIONS
In this paper, we conducted an extensive empirical evaluation of four Named Entity Recognition systems and used the results to guide the development of a point-of-care patient note summary and navigation tool for the open source EMR OpenMRS. Our tool is built on the open source NER system BANNER, and our evaluations identified some limitations: we observed F1 scores in the .80s, which indicates good, but not perfect, identification of named entities. We have provided a companion language model training application with the intent of mitigating this issue. There are many directions this project could take once we receive more feedback from end users in the field, for example, user interface improvements informed by usability feedback.

References
[1] C. Moore, "Diving into Data," InfoWorld (October 25, 2002), feundata_1.html.
[2] Singh H, Spitzmueller C, Petersen NJ, et al. Information overload and missed test results in electronic health record-based settings. JAMA Intern Med. 2013;173:
[3] Hirsch, Jamie S., et al. "HARVEST, a longitudinal patient record summarizer." Journal of the American Medical Informatics Association 22.2 (2015):
[4]
[5] Savova, Guergana K., et al. "Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications." Journal of the American Medical Informatics Association 17.5 (2010):
[6] Aronson, Alan R., and François-Michel Lang. "An overview of MetaMap: historical perspective and recent advances." Journal of the American Medical Informatics Association 17.3 (2010):
[7] Settles, Burr. "ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text." Bioinformatics (2005):
[8] McCallum, Andrew K. "MALLET: A Machine Learning for Language Toolkit." (2002).
[9] Kim, J. et al. (2003) GENIA corpus: a semantically annotated corpus for bio-textmining. Bioinformatics, 19(Suppl. 1), i180-i182.
[10] Yeh, A., Hirschman, L., Morgan, A. and Colosimo, M. (2004) Task 1A: gene-related name mention finding evaluation. In Proceedings of the Critical Assessment of Information Extraction Systems in Biology (BioCreAtIvE) Workshop, Granada, Spain.
[11] Leaman, Robert, and Graciela Gonzalez. "BANNER: an executable survey of advances in biomedical named entity recognition." Pacific Symposium on Biocomputing. Vol
[12] Dai, M., N. H. Shah, and W. Xuan. "An efficient solution for mapping free text to ontology terms." AMIA Summit on Translational Bioinformatics, San Francisco, CA (2008).
[13] Uzuner Ö., Goldstein I, Luo Y, Kohane I. "Identifying patient smoking status from medical discharge records." J Am Med Inform Assoc. 2008; 15(1)
[14] Bada, M., Eckert, M., Evans, D., Garcia, K., Shipley, K., Sitnikov, D., Baumgartner Jr., W. A., Cohen, K. B., Verspoor, K., Blake, J. A., and Hunter, L. E. Concept Annotation in the CRAFT Corpus. BMC Bioinformatics Jul 9;13:161. doi: / [PubMed: ]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22] Prokosch HU, McDonald CJ. The effect of computer reminders on the quality of care and resource use. In: Prokosch HU, Dudeck J, editors. Hospital information systems: design and development characteristics; impact and future architecture. Elsevier Science; p
[23] Hirsch, Jamie S., et al. "HARVEST, a longitudinal patient record summarizer." Journal of the American Medical Informatics Association 22.2 (2015):
[24] Demner-Fushman, Dina, Wendy W. Chapman, and Clement J. McDonald. "What Can Natural Language Processing Do for Clinical Decision Support?" Journal of Biomedical Informatics 42.5 (2009): PMC. Web. 29 June
[25] Haug PJ, Christensen L, Gundersen M, Clemons B, Koehler S, Bauer K. A natural language parsing system for encoding admitting diagnoses. Proc AMIA Annu Fall Symp 1997:
[26] Friedlin FJ, McDonald CJ. A software tool for removing patient identifying information from clinical documents. J Am Med Inform Assoc 2008;15(5):
[27] Lafferty, J., McCallum, A., Pereira, F. (2001). "Conditional random fields: Probabilistic models for segmenting and labeling sequence data." Proc. 18th International Conf. on Machine Learning. Morgan Kaufmann. pp
[28] Ferrucci, David, and Adam Lally. "UIMA: an architectural approach to unstructured information processing in the corporate research environment." Natural Language Engineering (2004):
[29]
[30]
[31]
[32]
[33]
[34] spring.io
[35] hibernate.org
[36]
[37] Guideline.pdf
[38] Fielding, Roy T., and Richard N. Taylor. "Principled design of the modern Web architecture." ACM Transactions on Internet Technology (TOIT) 2.2 (2002): 115-1
Practical Implementation of a Bridge between Legacy EHR System and a Clinical Research Environment
Cross-Border Challenges in Informatics with a Focus on Disease Surveillance and Utilising Big-Data L. Stoicu-Tivadar et al. (Eds.) 2014 The authors. This article is published online with Open Access by
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
Recognizing and Encoding Disorder Concepts in Clinical Text using Machine Learning and Vector Space Model *
Recognizing and Encoding Disorder Concepts in Clinical Text using Machine Learning and Vector Space Model * Buzhou Tang 1,2, Yonghui Wu 1, Min Jiang 1, Joshua C. Denny 3, and Hua Xu 1,* 1 School of Biomedical
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
The Prolog Interface to the Unstructured Information Management Architecture
The Prolog Interface to the Unstructured Information Management Architecture Paul Fodor 1, Adam Lally 2, David Ferrucci 2 1 Stony Brook University, Stony Brook, NY 11794, USA, [email protected] 2 IBM
Find the signal in the noise
Find the signal in the noise Electronic Health Records: The challenge The adoption of Electronic Health Records (EHRs) in the USA is rapidly increasing, due to the Health Information Technology and Clinical
IBM Watson and Medical Records Text Analytics HIMSS Presentation
IBM Watson and Medical Records Text Analytics HIMSS Presentation Thomas Giles, IBM Industry Solutions - Healthcare Randall Wilcox, IBM Industry Solutions - Emerging Technology jstart The Next Grand Challenge
Software Architecture Document
Software Architecture Document Natural Language Processing Cell Version 1.0 Natural Language Processing Cell Software Architecture Document Version 1.0 1 1. Table of Contents 1. Table of Contents... 2
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Jan Paralic, Peter Smatana Technical University of Kosice, Slovakia Center for
Text Mining for Health Care and Medicine. Sophia Ananiadou Director National Centre for Text Mining www.nactem.ac.uk
Text Mining for Health Care and Medicine Sophia Ananiadou Director National Centre for Text Mining www.nactem.ac.uk The Need for Text Mining MEDLINE 2005: ~14M 2009: ~18M Overwhelming information in textual,
Extraction and Visualization of Protein-Protein Interactions from PubMed
Extraction and Visualization of Protein-Protein Interactions from PubMed Ulf Leser Knowledge Management in Bioinformatics Humboldt-Universität Berlin Finding Relevant Knowledge Find information about Much
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
Electronic Health Record (EHR) Standards Survey
Electronic Health Record (EHR) Standards Survey Compiled by: Simona Cohen, Amnon Shabo Date: August 1st, 2001 This report is a short survey about the main emerging standards that relate to EHR - Electronic
POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition
POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics
Putting IBM Watson to Work In Healthcare
Martin S. Kohn, MD, MS, FACEP, FACPE Chief Medical Scientist, Care Delivery Systems IBM Research [email protected] Putting IBM Watson to Work In Healthcare 2 SB 1275 Medical data in an electronic or
Travis Goodwin & Sanda Harabagiu
Automatic Generation of a Qualified Medical Knowledge Graph and its Usage for Retrieving Patient Cohorts from Electronic Medical Records Travis Goodwin & Sanda Harabagiu Human Language Technology Research
Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis
, 23-25 October, 2013, San Francisco, USA Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis John David Elijah Sandig, Ruby Mae Somoba, Ma. Beth Concepcion and Bobby D. Gerardo,
Arti Tyagi Sunita Choudhary
Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Web Usage Mining
Exploration and Visualization of Post-Market Data
Exploration and Visualization of Post-Market Data Jianying Hu, PhD Joint work with David Gotz, Shahram Ebadollahi, Jimeng Sun, Fei Wang, Marianthi Markatou Healthcare Analytics Research IBM T.J. Watson
ezdi s semantics-enhanced linguistic, NLP, and ML approach for health informatics
ezdi s semantics-enhanced linguistic, NLP, and ML approach for health informatics Raxit Goswami*, Neil Shah* and Amit Sheth*, ** ezdi Inc, Louisville, KY and Ahmedabad, India. ** Kno.e.sis-Wright State
Extracting Clinical entities and their assertions from Chinese Electronic Medical Records Based on Machine Learning
3rd International Conference on Materials Engineering, Manufacturing Technology and Control (ICMEMTC 2016) Extracting Clinical entities and their assertions from Chinese Electronic Medical Records Based
Cerner i2b2 User s s Guide and Frequently Asked Questions. v1.3
User s s Guide and v1.3 Contents General Information... 3 Q: What is i2b2?... 3 Q: How is i2b2 populated?... 3 Q: How often is i2b2 updated?... 3 Q: What data is not in our i2b2?... 3 Q: Can individual
Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis
Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,
Interactive Information Visualization of Trend Information
Interactive Information Visualization of Trend Information Yasufumi Takama Takashi Yamada Tokyo Metropolitan University 6-6 Asahigaoka, Hino, Tokyo 191-0065, Japan [email protected] Abstract This paper
Building a Spanish MMTx by using Automatic Translation and Biomedical Ontologies
Building a Spanish MMTx by using Automatic Translation and Biomedical Ontologies Francisco Carrero 1, José Carlos Cortizo 1,2, José María Gómez 3 1 Universidad Europea de Madrid, C/Tajo s/n, Villaviciosa
The i2b2 Hive and the Clinical Research Chart
The i2b2 Hive and the Clinical Research Chart Henry Chueh Shawn Murphy The i2b2 Hive is centered around two concepts. The first concept is the existence of services provided by applications that are wrapped
Presenting data: how to convey information most effectively Centre of Research Excellence in Patient Safety 20 Feb 2015
Presenting data: how to convey information most effectively Centre of Research Excellence in Patient Safety 20 Feb 2015 Biomedical Informatics: helping visualization from molecules to population Dr. Guillermo
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
Big Data Analytics- Innovations at the Edge
Big Data Analytics- Innovations at the Edge Brian Reed Chief Technologist Healthcare Four Dimensions of Big Data 2 The changing Big Data landscape Annual Growth ~100% Machine Data 90% of Information Human
Mining Signatures in Healthcare Data Based on Event Sequences and its Applications
Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Siddhanth Gokarapu 1, J. Laxmi Narayana 2 1 Student, Computer Science & Engineering-Department, JNTU Hyderabad India 1
ScreenMatch: Providing Context to Software Translators by Displaying Screenshots
ScreenMatch: Providing Context to Software Translators by Displaying Screenshots Geza Kovacs MIT CSAIL 32 Vassar St, Cambridge MA 02139 USA [email protected] Abstract Translators often encounter ambiguous
Industry 4.0 and Big Data
Industry 4.0 and Big Data Marek Obitko, [email protected] Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and
Clinical Mapping (CMAP) Draft for Public Comment
Integrating the Healthcare Enterprise 5 IHE Patient Care Coordination Technical Framework Supplement 10 Clinical Mapping (CMAP) 15 Draft for Public Comment 20 Date: June 1, 2015 Author: PCC Technical Committee
Use Cases for Argonaut Project. Version 1.1
Page 1 Use Cases for Argonaut Project Version 1.1 July 31, 2015 Page 2 Revision History Date Version Number Summary of Changes 7/31/15 V 1.1 Modifications to use case 5, responsive to needs for clarification
Open-EMR Usability Evaluation Report Clinical Reporting and Patient Portal
Open-EMR Usability Evaluation Report Clinical Reporting and Patient Portal By Kollu Ravi And Michael Owino Spring 2013 Introduction Open-EMR is a freely available Electronic Medical Records software application
Micro blogs Oriented Word Segmentation System
Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,
The American Academy of Ophthalmology Adopts SNOMED CT as its Official Clinical Terminology
The American Academy of Ophthalmology Adopts SNOMED CT as its Official Clinical Terminology H. Dunbar Hoskins, Jr., M.D., P. Lloyd Hildebrand, M.D., Flora Lum, M.D. The road towards broad adoption of electronic
LinkZoo: A linked data platform for collaborative management of heterogeneous resources
LinkZoo: A linked data platform for collaborative management of heterogeneous resources Marios Meimaris, George Alexiou, George Papastefanatos Institute for the Management of Information Systems, Research
Predicting the Stock Market with News Articles
Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
TDAQ Analytics Dashboard
14 October 2010 ATL-DAQ-SLIDE-2010-397 TDAQ Analytics Dashboard A real time analytics web application Outline Messages in the ATLAS TDAQ infrastructure Importance of analysis A dashboard approach Architecture
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
SESSION DEPENDENT DE-IDENTIFICATION OF ELECTRONIC MEDICAL RECORDS
SESSION DEPENDENT DE-IDENTIFICATION OF ELECTRONIC MEDICAL RECORDS A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Bachelor of Science with Honors Research Distinction in Electrical
Swirl. Multiplayer Gaming Simplified. CS4512 Systems Analysis and Design. Assignment 1 2010. Marque Browne 0814547. Manuel Honegger - 0837997
1 Swirl Multiplayer Gaming Simplified CS4512 Systems Analysis and Design Assignment 1 2010 Marque Browne 0814547 Manuel Honegger - 0837997 Kieran O' Brien 0866946 2 BLANK MARKING SCHEME 3 TABLE OF CONTENTS
RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
Building a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
ADVANCING MEASUREMENT OF PATIENT- CENTERED OUTCOMES AND QUALITY METRICS WITH ELECTRONIC HEALTH RECORDS
ADVANCING MEASUREMENT OF PATIENT- CENTERED OUTCOMES AND QUALITY METRICS WITH ELECTRONIC HEALTH RECORDS Tina Hernandez-Boussard, PhD, MPH, MS Director, Surgical Health Services Research Unit Assistant Professor
#jenkinsconf. Jenkins as a Scientific Data and Image Processing Platform. Jenkins User Conference Boston #jenkinsconf
Jenkins as a Scientific Data and Image Processing Platform Ioannis K. Moutsatsos, Ph.D., M.SE. Novartis Institutes for Biomedical Research www.novartis.com June 18, 2014 #jenkinsconf Life Sciences are
Ernestina Menasalvas Universidad Politécnica de Madrid
Ernestina Menasalvas Universidad Politécnica de Madrid EECA Cluster networking event RITA 12th november 2014, Baku Sectors/Domains Big Data Value Source Public administration EUR 150 billion to EUR 300
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
María Elena Alvarado gnoss.com* [email protected] Susana López-Sola gnoss.com* [email protected]
Linked Data based applications for Learning Analytics Research: faceted searches, enriched contexts, graph browsing and dynamic graphic visualisation of data Ricardo Alonso Maturana gnoss.com *Piqueras
TEXT-FILLED STACKED AREA GRAPHS Martin Kraus
Martin Kraus Text can add a significant amount of detail and value to an information visualization. In particular, it can integrate more of the data that a visualization is based on, and it can also integrate
Ontology construction on a cloud computing platform
Ontology construction on a cloud computing platform Exposé for a Bachelor's thesis in Computer science - Knowledge management in bioinformatics Tobias Heintz 1 Motivation 1.1 Introduction PhenomicDB is
An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials
ehealth Beyond the Horizon Get IT There S.K. Andersen et al. (Eds.) IOS Press, 2008 2008 Organizing Committee of MIE 2008. All rights reserved. 3 An Ontology Based Method to Solve Query Identifier Heterogeneity
Intelligent Tools For A Productive Radiologist Workflow: How Machine Learning Enriches Hanging Protocols
GE Healthcare Intelligent Tools For A Productive Radiologist Workflow: How Machine Learning Enriches Hanging Protocols Authors: Tianyi Wang Information Scientist Machine Learning Lab Software Science &
WHITE PAPER. QualityAnalytics. Bridging Clinical Documentation and Quality of Care
WHITE PAPER QualityAnalytics Bridging Clinical Documentation and Quality of Care 2 EXECUTIVE SUMMARY The US Healthcare system is undergoing a gradual, but steady transformation. At the center of this transformation
Instructions for data-entry and data-analysis using Epi Info
Instructions for data-entry and data-analysis using Epi Info After collecting data using the tools for evaluation and feedback available in the Hand Hygiene Implementation Toolkit (available at http://www.who.int/gpsc/5may/tools
Tutorial for proteome data analysis using the Perseus software platform
Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information
Improving EHR Semantic Interoperability Future Vision and Challenges
Improving EHR Semantic Interoperability Future Vision and Challenges Catalina MARTÍNEZ-COSTA a,1 Dipak KALRA b, Stefan SCHULZ a a IMI,Medical University of Graz, Austria b CHIME, University College London,
Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov
Data Integration Lectures 16 & 17 Lectures Outline Goals for Data Integration Homogeneous data integration time series data (Filkov et al. 2002) Heterogeneous data integration microarray + sequence microarray
TRANSFoRm: Vision of a learning healthcare system
TRANSFoRm: Vision of a learning healthcare system Vasa Curcin, Imperial College London Theo Arvanitis, University of Birmingham Derek Corrigan, Royal College of Surgeons Ireland TRANSFoRm is partially
Predicting Chief Complaints at Triage Time in the Emergency Department
Predicting Chief Complaints at Triage Time in the Emergency Department Yacine Jernite, Yoni Halpern New York University New York, NY {jernite,halpern}@cs.nyu.edu Steven Horng Beth Israel Deaconess Medical
Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities
Vendor briefing Business Intelligence and Analytics Platforms Gartner 15 capabilities April, 2013 gaddsoftware.com Table of content 1. Introduction... 3 2. Vendor briefings questions and answers... 3 2.1.
Understanding Web personalization with Web Usage Mining and its Application: Recommender System
Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,
METHODS IN MEDICAL INFORMATICS
Chapman & Hall/CRC Mathematical and Computational Biology Series METHODS IN MEDICAL INFORMATICS Fundamentals of Healthcare Programming in Perln Pythoni and Ruby Jules J- Berman TECHNISCHE INFORMATION SBIBLIOTHEK
Team Members: Christopher Copper Philip Eittreim Jeremiah Jekich Andrew Reisdorph. Client: Brian Krzys
Team Members: Christopher Copper Philip Eittreim Jeremiah Jekich Andrew Reisdorph Client: Brian Krzys June 17, 2014 Introduction Newmont Mining is a resource extraction company with a research and development
2 AIMS: an Agent-based Intelligent Tool for Informational Support
Aroyo, L. & Dicheva, D. (2000). Domain and user knowledge in a web-based courseware engineering course, knowledge-based software engineering. In T. Hruska, M. Hashimoto (Eds.) Joint Conference knowledge-based
