Recognizing and Encoding Disorder Concepts in Clinical Text using Machine Learning and Vector Space Model *
Buzhou Tang 1,2, Yonghui Wu 1, Min Jiang 1, Joshua C. Denny 3, and Hua Xu 1,*

1 School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
2 Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
3 Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA

{buzhou.tang, yonghui.wu, min.jiang, hua.xu}@uth.tmc.edu, [email protected]

Abstract. The ShARe/CLEF eHealth Evaluation Lab (SHEL) organized a challenge on natural language processing (NLP) and information retrieval (IR) in the medical domain in 2013. The first task of the 2013 ShARe/CLEF challenge was to extract disorder mention spans and their associated UMLS (Unified Medical Language System) concept unique identifiers (CUIs). We participated in Task 1 and developed a clinical disorder recognition and encoding system. The proposed system consists of two components: a machine learning-based approach to recognize disorder entities and a vector space model-based method to encode disorders to UMLS CUIs. The challenge organizers manually annotated disorder entities and corresponding UMLS CUIs in 298 clinical notes, of which 199 notes were used for training and 99 for testing. Evaluation on the test data set showed that our system achieved the best F-measure for entity recognition (ranked first) and ranked third for UMLS CUI encoding, indicating the promise of the proposed approaches.

Keywords: medical language processing, natural language processing, named entity recognition, UMLS encoding, clinical concept extraction, conditional random fields, structured support vector machines, vector space model.
* Corresponding author

1 Introduction

Clinical natural language processing (NLP) has received great attention in recent years because it is critical for unlocking the information embedded in clinical documents, which enables the secondary use of electronic health record (EHR) data for clinical and translational research. Clinical concept extraction, which recognizes clinically relevant entities (e.g., diseases, drugs, labs) in text and maps them to identifiers in standard vocabularies (e.g., the Concept Unique Identifier (CUI) defined in the Unified Medical Language System (UMLS) [1]), is one of the fundamental tasks in clinical NLP research.

Many systems have been developed to extract clinical concepts from various types of clinical notes in the last two decades. Earlier studies mainly focused on building symbolic NLP systems that rely heavily on domain knowledge (e.g., medical vocabularies). Representative systems include MedLEE [2], SymText/MPlus [3][4], MetaMap [5], KnowledgeMap [6], cTAKES [7], and HiTEX [8]. In the past few years, with the increasing availability of annotated clinical corpora, researchers have started to investigate the use of machine learning algorithms in clinical entity recognition. The Center for Informatics for Integrating Biology & the Bedside (i2b2) has organized several clinical NLP challenges to promote research in this field. In 2009, the i2b2 NLP challenge was to recognize medication-related concepts. Rule-based, machine learning-based, and hybrid methods were developed by over twenty participating teams [9]. In the 2010 i2b2 clinical NLP challenge, the organizers expanded the clinical concepts from medications to problems, tests, and treatments. Most systems in this challenge were primarily based on machine learning algorithms, likely due to the availability of large annotated datasets [10].

In 2013, the ShARe/CLEF eHealth Evaluation Lab (SHEL) organized three shared tasks on natural language processing (NLP) and information retrieval (IR): 1) clinical disorder extraction and encoding to Systematized Nomenclature Of Medicine Clinical Terms (SNOMED-CT), 2) acronym/abbreviation identification, and 3) retrieval of web pages based on queries generated when reading the clinical reports. Task 1 on clinical disorder extraction is similar to the 2010 i2b2 challenge on clinical problem extraction.
However, there are two major differences between these two tasks: 1) the ShARe/CLEF task allowed disjoint entities, while the 2010 i2b2 clinical problem extraction task only dealt with entities of consecutive words; and 2) the ShARe/CLEF task required mapping disorder entities to SNOMED-CT (using UMLS CUIs), which was not required in the 2010 i2b2 challenge. In this paper, we describe our system for Task 1 of the 2013 ShARe/CLEF challenge. Our system consists of a machine learning-based approach for disorder entity recognition and a Vector Space Model (VSM)-based method for mapping extracted entities to SNOMED-CT codes. Evaluation by the organizers showed that our system was top-ranked among all participating teams.

2 Methods

Fig. 1 shows the overall architecture of our system for the first task of the ShARe/CLEF eHealth 2013 shared task. It is an end-to-end system with two components: disorder entity recognition and encoding. The first component consists of five modules. As the clinical narratives supplied by the organizers were not well formatted, we developed rule-based modules that first detect sentence boundaries and tokenize the sentences of each note, and finally align the preprocessed note back to the original one. The other modules are presented in detail in the following sections.
Fig. 1. The overall architecture of our disorder concept extraction system for the first task of the ShARe/CLEF eHealth 2013 shared task.

2.1 Dataset

The organizers collected 298 notes from different clinical encounters, including radiology reports, discharge summaries, and ECG/ECHO reports. For each note, disorder entities were annotated based on a pre-defined guideline and then mapped to SNOMED-CT concepts represented by UMLS CUIs. If a disorder entity could not be found in SNOMED-CT, it was marked as CUI-less. The data set was divided into two parts: a training set of 199 notes used for system development, and a test set of 99 notes used for evaluating systems. In the training set, 5811 disorder entities were annotated and mapped to 1007 unique CUIs or CUI-less. The test set contained 5340 disorder entities with 795 CUIs or CUI-less. Table 1 shows the counts of entities and CUIs in the training and test datasets.

Table 1. Statistics of the dataset. Columns: Dataset, Type, #Note, #Mention, #CUI-less. Rows: All, ECHO, RADIOLOGY, DISCHARGE, and ECG notes, listed separately for the training set and the test set.

2.2 Disorder entity recognition

In machine learning-based named entity recognition (NER) systems, annotated data are typically converted into the BIO format, where each word is assigned one of three labels: B means the beginning of an entity, I means inside an entity, and O means outside of an entity. The NER problem is thus converted into a classification problem: assign one of the three labels to each word. As mentioned previously, one challenge of this task is that some disorder mentions (>10%) were disjoint. These cannot be handled directly by the traditional BIO approach, which only works on entities of consecutive words. Therefore, we developed different strategies for consecutive entities and disjoint entities. For consecutive disorder entities, we labeled words using traditional BIO tags. For disjoint entities, we created two additional sets of tags: 1) D{B, I} was used to label disjoint entity words that are not shared by multiple concepts (called non-head entities); and 2) H{B, I} was used to label head words that belong to two or more disjoint concepts (called head entities). Fig. 2 shows some examples of labeling disjoint disorder entities using our new tag sets. In this approach, we need to assign one of seven labels {B, I, O, DB, DI, HB, HI} to each word. When converting labeled words to entities, we defined a few simple rules. For example, one rule for head words is: for each disjoint head entity, combine it with every non-head entity to form the final disorder entities.

Sentence 1: The left atrium is dilated.
Encoding: The/O left/DB atrium/DI is/O dilated/DB ./O

Sentence 2: The aortic root and ascending aorta are moderately dilated.
Encoding: The/O aortic/DB root/DI and/O ascending/DB aorta/DI are/O moderately/O dilated/HB ./O

Fig. 2. Examples of tagging for disjoint disorder entities.

We investigated two machine learning algorithms for disorder entity recognition. One is Conditional Random Fields (CRFs), a representative sequence labeling algorithm that is well suited to the NER problem. The other is Structural Support Vector Machines (SSVMs), proposed by Tsochantaridis et al. [23] in 2005 for structured data such as trees and sequences. It is an SVM-based discriminative algorithm for structured prediction. SSVMs therefore combine the advantages of both CRFs and SVMs and are suitable for sequence labeling problems as well. CRFsuite and SVMhmm were used as the implementations of CRF and SSVM, respectively.
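The tag-to-entity conversion described in Section 2.2 can be illustrated with a minimal sketch. The paper only states the head-word rule explicitly; the function name `decode_entities`, the handling of sentences without a head entity (all non-head segments are merged into one entity, which is consistent with the two Fig. 2 examples), and the treatment of malformed tag sequences are assumptions made for illustration, not the authors' implementation.

```python
def decode_entities(tokens, tags):
    """Convert one tagged sentence into a list of disorder entity strings.

    Tags come from {B, I, O, DB, DI, HB, HI}. Hypothetical helper, not the
    authors' code.
    """
    consecutive, non_head, head = [], [], []
    buckets = {"": consecutive, "D": non_head, "H": head}
    current, current_kind = None, None

    for i, tag in enumerate(tags):
        if tag == "O":
            current, current_kind = None, None
            continue
        kind, position = tag[:-1], tag[-1]  # "DB" -> ("D", "B"); "I" -> ("", "I")
        if position == "B" or kind != current_kind:
            current = [i]                   # open a new segment of this kind
            buckets[kind].append(current)
            current_kind = kind
        else:
            current.append(i)               # extend the currently open segment

    entities = list(consecutive)
    disjoint = non_head + head
    if head and non_head:
        # Stated rule: combine each head entity with every non-head entity.
        for h in head:
            for d in non_head:
                entities.append(sorted(h + d))
    elif disjoint:
        # Assumption: without a head, all disjoint segments form one entity.
        entities.append(sorted(i for seg in disjoint for i in seg))

    return [" ".join(tokens[i] for i in idx) for idx in entities]


sentence_1 = ["The", "left", "atrium", "is", "dilated", "."]
tags_1 = ["O", "DB", "DI", "O", "DB", "O"]

sentence_2 = ["The", "aortic", "root", "and", "ascending", "aorta",
              "are", "moderately", "dilated", "."]
tags_2 = ["O", "DB", "DI", "O", "DB", "DI", "O", "O", "HB", "O"]

print(decode_entities(sentence_1, tags_1))
print(decode_entities(sentence_2, tags_2))
```

Under these assumptions, Sentence 1 yields a single entity made of "left atrium" and "dilated", while Sentence 2 yields two entities that share the head word "dilated". The same merging assumption also means that two unrelated disjoint entities in one sentence would be fused into one.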
For features, we used bag-of-words, part-of-speech (POS) tags from the Stanford tagger, type of notes, section information, word representations from Brown clustering [11] and random indexing [12], and semantic categories of words based on UMLS [1] lookup, MetaMap [5], or cTAKES [7] outputs. Most of the features were the same as those used in our previous systems for medical concept recognition [13][14][15][16].

2.3 Disorder entity encoding

We treated disorder entity encoding as a ranking problem, where each recognized disorder entity was considered a query and the candidate terms in UMLS were considered documents. The Vector Space Model (VSM) was used in this task. The process consists of two steps: 1) generate candidate CUIs from UMLS; and 2) rank the candidate CUIs and take the top-ranked CUI as the system's output. We applied the following criterion to select candidate CUIs from UMLS for a given disorder entity: the corresponding terms of a candidate CUI should contain all words in the disorder entity (except stop words). For each candidate CUI, we created a vector containing its words, weighted by term frequency-inverse document frequency (tf-idf) derived from all UMLS/SNOMED-CT terms. The cosine similarity between the disorder entity vector and each candidate CUI vector was calculated and used to rank the candidate CUIs. The top-ranked CUI was then selected as the CUI of the entity. In order to leverage the training data, we further built a limited VSM-based encoding system using only the CUIs/terms and entities that occurred in the training set, instead of the entire UMLS. When processing the test set, we first determined whether an entity occurred in the training set. If it did, we used the limited VSM-based encoding system to predict the corresponding CUI. Otherwise, we used the general VSM-based encoding system built on the entire UMLS.

2.4 Experiments and Evaluation

Our system was developed and trained using the training set (199 notes) and was evaluated using the test set (99 notes). All parameters of CRF and SSVM were optimized by 10-fold cross-validation on the training dataset.
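The candidate generation and ranking steps of Section 2.3 can be sketched as follows. This is a toy illustration under stated assumptions rather than the authors' implementation: the term list `TERMS`, the identifiers `CUI_A` to `CUI_D`, the stop-word list, the smoothed idf formula, and the choice to return "CUI-less" when no candidate is found are all hypothetical. A real system would draw its terms from UMLS/SNOMED-CT.

```python
import math
from collections import Counter

STOP_WORDS = {"of", "the", "in", "on", "and"}  # hypothetical stop-word list

# Hypothetical (identifier, term) pairs standing in for UMLS/SNOMED-CT terms.
TERMS = [
    ("CUI_A", "dilated left atrium"),
    ("CUI_B", "severely dilated left atrium chamber"),
    ("CUI_C", "dilated aortic root"),
    ("CUI_D", "pain in the chest"),
]


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_idf(terms):
    """Inverse document frequency over all terms (the smoothing is an assumption)."""
    doc_freq = Counter()
    for _, term in terms:
        doc_freq.update(set(tokenize(term)))
    n = len(terms)
    return {w: math.log(n / c) + 1.0 for w, c in doc_freq.items()}


def vectorize(words, idf):
    """Sparse tf-idf vector as a word -> weight dictionary."""
    tf = Counter(words)
    return {w: tf[w] * idf.get(w, 0.0) for w in tf}


def cosine(u, v):
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def encode(entity, terms, idf):
    """Return the top-ranked identifier, or "CUI-less" if no candidate exists."""
    query = tokenize(entity)
    query_vec = vectorize(query, idf)
    best_cui, best_score = "CUI-less", 0.0
    for cui, term in terms:
        words = tokenize(term)
        if not set(query) <= set(words):  # candidate must contain all entity words
            continue
        score = cosine(query_vec, vectorize(words, idf))
        if score > best_score:
            best_cui, best_score = cui, score
    return best_cui


def encode_two_stage(entity, seen_entities, limited_terms, full_terms):
    """Use the training-set-only term list for entities seen in training."""
    terms = limited_terms if entity.lower() in seen_entities else full_terms
    return encode(entity, terms, build_idf(terms))


idf = build_idf(TERMS)
print(encode("left atrium dilated", TERMS, idf))
print(encode("fever", TERMS, idf))
```

Because every candidate contains all query words, a candidate's cosine score drops as it accumulates extra words, so the shortest matching term tends to rank first in this sketch.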
The performance of disorder entity recognition was evaluated by precision, recall, and F-measure in both strict and relaxed modes. In strict mode, a concept is correctly recognized if and only if its starting and ending offsets are exactly the same as those of a disorder mention in the gold standard. In relaxed mode, a disorder mention is correctly recognized as long as it overlaps with any disorder mention in the gold standard. For encoding to SNOMED-CT, all participating systems were evaluated using accuracy only, in strict and relaxed modes, as defined in [17][18].

3 Results

Table 2 shows the best performance of our system in the ShARe/CLEF eHealth 2013 shared task 1 as reported by the organizers, where Pre, Rec, F, and Acc denote precision, recall, F-measure, and accuracy, respectively. For disorder entity recognition, the SSVM-based system outperformed the CRF-based system, achieving the best F-measures under both the strict and the relaxed criteria and ranking first in the challenge. For SNOMED encoding, our system achieved a best accuracy of 0.514, ranking third in the challenge.

Table 2. The performance of our system for the ShARe/CLEF eHealth 2013 shared task 1. Columns: Task, then Pre, Rec, F, and Acc under the strict criterion and under the relaxed criterion. Rows: Task 1a (entity recognition) and Task 1b (SNOMED encoding).

4 Discussion

Although a number of existing clinical NLP systems such as MedLEE [2], MetaMap [5], KnowledgeMap [6], and cTAKES [7] can extract clinical concepts and map them to UMLS CUIs, it is difficult to compare the performance of these systems because there is a lack of publicly available corpora annotated with UMLS CUIs. The 2013 ShARe/CLEF eHealth shared task 1 provides such a benchmark dataset for clinical concept recognition and encoding, which is a significant contribution to clinical NLP research. Furthermore, the accuracy achieved by the best system in the challenge on encoding SNOMED concepts indicates that it is still very challenging to develop general clinical NLP systems that can accurately recognize and encode clinical disorders to standard terminologies.

In this study, we developed a clinical disorder recognition and encoding system that combines a machine learning-based approach for entity recognition with a VSM-based approach for UMLS concept mapping. Our system was top-ranked among all participating teams, indicating the promise of the proposed approaches. However, there is still much room for improvement. First, our proposed method for disjoint entity recognition has limitations.
For example, if a sentence has multiple disjoint entities, our current simple rule-based strategies cannot resolve the ambiguity and produce wrong combinations of disorder entities, as shown in Fig. 3. The given sentence contains two disorder entities, "blood on his tongue" and "pupils pinpoint", which are represented by blood/DB on/DB his/DI tongue/DI and pupils/DB pinpoint/DB respectively, but they are parsed into one disorder entity, "blood on his tongue pupils pinpoint", by our strategies. Thus, more sophisticated methods for disjoint concept recognition should be investigated in the future. In addition, our VSM-based method to map entities to UMLS CUIs is not optimal. Compared with the top-ranked team on UMLS CUI mapping, our system achieved better performance on entity recognition but lower accuracy on CUI mapping, indicating the weakness of our encoding step. A few possible directions for further improvement are: 1) use other types of information as features for building vectors, such as context, type of notes, and section information; 2) explore other ranking algorithms such as Support Vector Machines [19]; and 3) implement word sense disambiguation algorithms for ambiguous entities.

Fig. 3. Examples of entity parsing errors.

5 Conclusions

We developed a clinical disorder recognition and encoding system that consists of a machine learning-based approach to recognize disorder entities and a vector space model-based method to encode disorders to UMLS CUIs. Our entry based on this system was top-ranked in the 2013 ShARe/CLEF eHealth shared task 1, indicating the promise of our approaches. However, more investigation is needed in order to achieve satisfactory performance on extracting and encoding medical concepts in clinical text.

Acknowledgments

This study is supported in part by grants from NLM R01LM010681, the Office of the National Coordinator for Health Information Technology, NCI 1R01CA141307, and NIGMS 1R01GM. We also thank the ShARe/CLEF eHealth shared task 2013 organizers, who were funded by the United States National Institutes of Health with grant R01GM090187.

References

[1] "Unified Medical Language System (UMLS) - Home." [Online]. [Accessed: 22-May-2013].
[2] C. Friedman, P. O. Alderson, J. H. Austin, J. J. Cimino, and S. B. Johnson, "A general natural-language text processor for clinical radiology," J Am Med Inform Assoc, vol. 1, no. 2.
[3] S. B. Koehler, "SymText: a natural language understanding system for encoding free text medical data," University of Utah.
[4] L. M. Christensen, P. J. Haug, and M. Fiszman, "MPLUS: a probabilistic medical language understanding system," in Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain - Volume 3, Stroudsburg, PA, USA, 2002.
[5] A. R. Aronson and F.-M. Lang, "An overview of MetaMap: historical perspective and recent advances," J Am Med Inform Assoc, vol. 17, no. 3, May 2010.
[6] J. C. Denny, P. R. Irani, F. H. Wehbe, J. D. Smithers, and A. Spickard, "The KnowledgeMap Project: Development of a Concept-Based Medical School Curriculum Database," AMIA Annu Symp Proc, vol. 2003.
[7] G. K. Savova, J. J. Masanz, P. V. Ogren, J. Zheng, S. Sohn, K. C. Kipper-Schuler, and C. G. Chute, "Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications," J Am Med Inform Assoc, vol. 17, no. 5, Sep.
[8] Q. T. Zeng, S. Goryachev, S. Weiss, M. Sordo, S. N. Murphy, and R. Lazarus, "Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system," BMC Med Inform Decis Mak, vol. 6, p. 30.
[9] O. Uzuner, I. Solti, and E. Cadag, "Extracting medication information from clinical text," J Am Med Inform Assoc, vol. 17, no. 5, Oct.
[10] Ö. Uzuner, B. R. South, S. Shen, and S. L. DuVall, "2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text," J Am Med Inform Assoc, vol. 18, no. 5, Oct.
[11] P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. D. Pietra, and J. C. Lai, "Class-Based n-gram Models of Natural Language," Computational Linguistics, vol. 18.
[12] K. Lund and C. Burgess, "Producing high-dimensional semantic spaces from lexical co-occurrence," Behavior Research Methods, Instruments, & Computers, vol. 28, no. 2, Jun.
[13] M. Jiang, Y. Chen, M. Liu, S. T. Rosenbloom, S. Mani, J. C. Denny, and H. Xu, "A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries," J Am Med Inform Assoc, vol. 18, no. 5, Oct.
[14] B. Tang, H. Cao, Y. Wu, M. Jiang, and H. Xu, "Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features," BMC Med Inform Decis Mak, vol. 13 Suppl 1, p. S1.
[15] B. Tang, H. Cao, Y. Wu, M. Jiang, and H. Xu, "Clinical entity recognition using structural support vector machines with rich features," in Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics, New York, NY, USA, 2012.
[16] B. Tang, Y. Wu, M. Jiang, Y. Chen, J. C. Denny, and H. Xu, "A hybrid system for temporal information extraction from clinical text," J Am Med Inform Assoc, Apr.
[17] H. Suominen, S. Salantera, S. Sanna, et al., "Overview of the ShARe/CLEF eHealth Evaluation Lab 2013," presented at the Proceedings of CLEF 2013, 2013, to appear.
[18] W. Chapman, G. Savova, and N. Elhadad, "ShARe/CLEF Shared Task 1 for boundary detection and normalization of SNOMED disorders," presented at the Proceedings of CLEF 2013, 2013, to appear.
[19] T. Joachims, "Optimizing search engines using clickthrough data," in Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA, 2002.
Extraction of Radiology Reports using Text mining
Extraction of Radiology Reports using Text mining A.V.Krishna Prasad 1 Dr.S.Ramakrishna 2 Dr.D.Sravan Kumar 3 Dr.B.Padmaja Rani 4 1 Research Scholar S.V.University, Tirupathi & Associate Professor CS MIPGS,Hyderabad,
SVM Based Learning System For Information Extraction
SVM Based Learning System For Information Extraction Yaoyong Li, Kalina Bontcheva, and Hamish Cunningham Department of Computer Science, The University of Sheffield, Sheffield, S1 4DP, UK {yaoyong,kalina,hamish}@dcs.shef.ac.uk
BANNER: AN EXECUTABLE SURVEY OF ADVANCES IN BIOMEDICAL NAMED ENTITY RECOGNITION
BANNER: AN EXECUTABLE SURVEY OF ADVANCES IN BIOMEDICAL NAMED ENTITY RECOGNITION ROBERT LEAMAN Department of Computer Science and Engineering, Arizona State University GRACIELA GONZALEZ * Department of
Automated Annotation of Events Related to Central Venous Catheterization in Norwegian Clinical Notes
Automated Annotation of Events Related to Central Venous Catheterization in Norwegian Clinical Notes Ingrid Andås Berg Healthcare Informatics Submission date: March 2014 Supervisor: Øystein Nytrø, IDI
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery
Transformation of Free-text Electronic Health Records for Efficient Information Retrieval and Support of Knowledge Discovery Jan Paralic, Peter Smatana Technical University of Kosice, Slovakia Center for
PoS-tagging Italian texts with CORISTagger
PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy [email protected] Abstract. This paper presents an evolution of CORISTagger [1], an high-performance
Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines
, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing
Towards Inferring Web Page Relevance An Eye-Tracking Study
Towards Inferring Web Page Relevance An Eye-Tracking Study 1, [email protected] Yinglong Zhang 1, [email protected] 1 The University of Texas at Austin Abstract We present initial results from a project,
TREC 2003 Question Answering Track at CAS-ICT
TREC 2003 Question Answering Track at CAS-ICT Yi Chang, Hongbo Xu, Shuo Bai Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China [email protected] http://www.ict.ac.cn/
Integrating Public and Private Medical Texts for Patient De-Identification with Apache ctakes
Integrating Public and Private Medical Texts for Patient De-Identification with Apache ctakes Presented By: Andrew McMurry & Britt Fitch (Apache ctakes committers) Co-authors: Guergana Savova, Ben Reis,
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Open Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
Practical Implementation of a Bridge between Legacy EHR System and a Clinical Research Environment
Cross-Border Challenges in Informatics with a Focus on Disease Surveillance and Utilising Big-Data L. Stoicu-Tivadar et al. (Eds.) 2014 The authors. This article is published online with Open Access by
Sentiment analysis on tweets in a financial domain
Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International
DYNAMIC QUERY FORMS WITH NoSQL
IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 7, Jul 2014, 157-162 Impact Journals DYNAMIC QUERY FORMS WITH
AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS
AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS Alok Ranjan Pal 1, 3, Anirban Kundu 2, 3, Abhay Singh 1, Raj Shekhar 1, Kunal Sinha 1 1 College of Engineering and Management,
Search Result Optimization using Annotators
Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,
3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work
Unsupervised Paraphrase Acquisition via Relation Discovery Takaaki Hasegawa Cyberspace Laboratories Nippon Telegraph and Telephone Corporation 1-1 Hikarinooka, Yokosuka, Kanagawa 239-0847, Japan [email protected]
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
