Sanda Harabagiu. The University of Texas at Dallas Human Language Technology Research Institute
- Eustace Simon Higgins
- 8 years ago
1 Linking Information Extracted from Electronic Medical Records to Structured Knowledge Sanda Harabagiu The University of Texas at Dallas
2 Outline of the talk 1. The Problem 2. Extracting medical concepts 3. Identifying assertions in clinical texts 4. Relation Extraction 5. Lessons Learned
3 Ontological Resources: UMLS. The Unified Medical Language System (UMLS) consists of: 1. a semantic network of biomedical semantic concepts and the semantic relations that span them; 2. a Metathesaurus, which encodes terms and codes from many vocabularies, including CPT, ICD-10-CM, LOINC, MeSH, RxNorm, and SNOMED CT; 3. the SPECIALIST Lexicon and Lexical Tools. We used UMLS to expand topic keywords into the phrases encoded in the UMLS Metathesaurus that share the same concept ID (CUI). This primarily provides high-confidence keyword synonyms. [Figure: keyword expansion example with the terms leg, lower leg, lower extremity, leg region, hind limb, hindlimb]
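The CUI-based expansion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes MRCONSO.RRF-style pipe-delimited rows (CUI in column 0, language in column 1, string in column 14), and the CUIs in the toy rows are hypothetical placeholders rather than real UMLS identifiers.

```python
from collections import defaultdict

# Column positions in a pipe-delimited MRCONSO.RRF row.
CUI, LAT, STR = 0, 1, 14


def build_indexes(rows, language="ENG"):
    """Index Metathesaurus strings by concept ID (CUI) and vice versa."""
    str_to_cuis = defaultdict(set)
    cui_to_strs = defaultdict(set)
    for row in rows:
        fields = row.rstrip("\n").split("|")
        if fields[LAT] != language:
            continue
        term = fields[STR].lower()
        str_to_cuis[term].add(fields[CUI])
        cui_to_strs[fields[CUI]].add(term)
    return str_to_cuis, cui_to_strs


def expand_keyword(keyword, str_to_cuis, cui_to_strs):
    """Return every string sharing a CUI with the keyword (the keyword included)."""
    synonyms = set()
    for cui in str_to_cuis.get(keyword.lower(), ()):
        synonyms |= cui_to_strs[cui]
    return synonyms


def toy_row(cui, text):
    """Hypothetical MRCONSO-style row with only CUI, LAT and STR filled in."""
    fields = [""] * 18
    fields[CUI], fields[LAT], fields[STR] = cui, "ENG", text
    return "|".join(fields) + "|"


rows = [toy_row("C0000001", "Lower leg"), toy_row("C0000001", "Leg"),
        toy_row("C0000002", "Hindlimb")]
str_to_cuis, cui_to_strs = build_indexes(rows)
print(sorted(expand_keyword("leg", str_to_cuis, cui_to_strs)))  # ['leg', 'lower leg']
```

Indexing in both directions keeps each expansion to two dictionary lookups per CUI, which matters when every topic keyword is expanded.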
4 Resources: Clinical Ontologies. The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is the most comprehensive multilingual clinical healthcare terminology in the world. SNOMED CT is owned, maintained and distributed by the International Health Terminology Standards Development Organization (IHTSDO). SNOMED CT consists of four primary core components: 1. Concept Codes: numerical codes that identify clinical terms, primitive or defined, organized in hierarchies; 2. Descriptions: textual descriptions of Concept Codes; 3. Relationships: relationships between Concept Codes that have a related meaning; 4. Reference Sets: used to group Concepts or Descriptions into sets, including reference sets and cross-maps to other classifications and standards. We use this relationship knowledge to expand a keyword so that it captures any phrase on the child side of an IS-A, PART-OF or COMPONENT relationship. This allows hypernyms to be expanded into their hyponyms and wholes into their parts (meronyms). [Figure: expansion of atypical antipsychotic into clozapine (Clozaril), aripiprazole (Abilify) and asenapine]
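Expanding a keyword along the child side of IS-A, PART-OF and COMPONENT relationships amounts to a graph traversal. The sketch below assumes the relationships are available as (child, relation, parent) triples; the relation labels and the toy hierarchy are hypothetical simplifications, not the SNOMED CT release-file format.

```python
from collections import deque

# Relationship types whose child side is followed during expansion.
EXPANDABLE = {"IS-A", "PART-OF", "COMPONENT"}


def expand_descendants(keyword, triples, allowed=EXPANDABLE):
    """Breadth-first collection of every concept reachable on the child side.

    `triples` is an iterable of (child, relation, parent) tuples; relations not
    in `allowed` are ignored.
    """
    children = {}
    for child, relation, parent in triples:
        if relation in allowed:
            children.setdefault(parent, []).append(child)
    seen, queue = {keyword}, deque([keyword])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guards against cycles and repeated paths
                seen.add(child)
                queue.append(child)
    return seen - {keyword}


# Hypothetical toy hierarchy echoing the slide's figure.
toy = [
    ("clozapine", "IS-A", "atypical antipsychotic"),
    ("aripiprazole", "IS-A", "atypical antipsychotic"),
    ("asenapine", "IS-A", "atypical antipsychotic"),
    ("clozapine", "MAY-TREAT", "schizophrenia"),  # not expandable
]
print(sorted(expand_descendants("atypical antipsychotic", toy)))
# ['aripiprazole', 'asenapine', 'clozapine']
```

Restricting the traversal to an explicit relation set is what keeps, for example, treatment relations from leaking unrelated terms into the expansion.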
5 The Problem Ontologies provide machine-readable descriptions of biomedical concepts and their relations. Linking domain-specific terms expressed in clinical texts to their ontological encodings provides a platform for semantic interpretation of the clinical narratives. Knowledge extracted from clinical documents can be curated and used to update the content of biomedical ontologies.
6 The difficulties. The principal link between clinical or biomedical texts and an ontology is a terminology, which aims to map concepts to terms. A term is a textual realization of a concept, e.g., the name of a disease, gene or protein. The problems: term variation and term ambiguity. Terms also have a context: they may have assertions associated with them. Relations between terms differ from relations between concepts.
7 Term Variation. Term variation originates from the ability of natural language to express a single concept in a number of ways. For example, in biomedicine there are many synonyms for proteins, enzymes, genes, etc. Having six or seven synonyms for a single concept is not unusual in this domain. The probability of two experts using the same term to refer to the same concept is less than 20 per cent. In addition, biomedicine includes pharmacology, where numerous trademark names refer to the same compound (e.g., Advil, Brufen, Motrin, Nuprin and Nurofen all refer to ibuprofen).
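A common first step against this kind of variation is to normalize surface forms to a canonical name before matching. The sketch assumes a hand-built lexicon; only the ibuprofen brand names come from the slide, and the grouping helper is illustrative.

```python
from collections import defaultdict

# Hypothetical normalization lexicon: variant (lowercased) -> canonical name.
VARIANTS = {
    "advil": "ibuprofen",
    "brufen": "ibuprofen",
    "motrin": "ibuprofen",
    "nuprin": "ibuprofen",
    "nurofen": "ibuprofen",
}


def normalize(term):
    """Map a mention to its canonical name; unknown terms are just lowercased."""
    key = term.strip().lower()
    return VARIANTS.get(key, key)


def group_mentions(mentions):
    """Group raw mentions by the concept they normalize to."""
    groups = defaultdict(list)
    for mention in mentions:
        groups[normalize(mention)].append(mention)
    return dict(groups)


print(group_mentions(["Advil", "Motrin", "aspirin"]))
# {'ibuprofen': ['Advil', 'Motrin'], 'aspirin': ['aspirin']}
```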
8 Term ambiguity. Bad News! Term ambiguity occurs when the same term is used to refer to multiple concepts. Ambiguity is an inherent feature of natural language: words typically have multiple dictionary entries, and the meaning of a word can be altered by its context. Some Good News: sublanguages, i.e., the languages confined to specialized domains, provide a context that generally reduces the level of ambiguity. More Bad News! Biomedicine encompasses a plethora of subdomains, which is an additional cause of the high level of ambiguity in biomedical terminology. For example, in biology the term promoter refers to a binding site in a DNA chain at which RNA polymerase binds to initiate transcription of messenger RNA by one or more nearby structural genes, while in chemistry it denotes a substance that in very small amounts is able to increase the activity of a catalyst. In addition, acronyms are used extensively in biomedicine (a new acronym is introduced in every five to ten abstracts in Medline), and they are known to be highly ambiguous (more than 80 per cent of acronyms are ambiguous, with an average of 15 possible interpretations).
9 More on ambiguity. Acronym expansion: for example, AR could be expanded to any of the following terms: 1. Androgen Receptor, 2. AmphiRegulin, 3. Acyclic Retinoid, 4. Agonist Receptor, 5. Adrenergic Receptor. Origins of ambiguity: text is not the only origin of ambiguity in biomedicine. Ambiguity is inherent to the field, because the evolution of species gave rise to many homologues and analogues. For instance, NFKB2 denotes a family of two individual proteins with separate identifiers in Swiss-Prot. These proteins are homologues belonging to different species, human and chicken.
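One simple way to choose among acronym expansions is to compare the surrounding words with a context profile for each candidate sense. This is a Lesk-style overlap heuristic swapped in for illustration, not a method described in the talk; the two profiles below are hypothetical.

```python
import re

# Hypothetical context profiles for two of the slide's expansions of "AR".
SENSE_PROFILES = {
    "androgen receptor": {"testosterone", "prostate", "ligand", "hormone", "nuclear"},
    "amphiregulin": {"egfr", "growth", "factor", "epidermal", "keratinocyte"},
}


def expand_acronym(sentence, profiles=SENSE_PROFILES):
    """Pick the expansion whose profile shares the most words with the sentence.

    Ties, including the case where no profile overlaps, return None instead of guessing.
    """
    words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    scores = {sense: len(words & profile) for sense, profile in profiles.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


print(expand_acronym("AR binds testosterone in prostate cells"))  # androgen receptor
```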
10 Pipeline of annotations. Each natural language processing layer enhances the knowledge representation with machine-readable information. Different forms of ambiguity are resolved in the process: lexical, syntactic, semantic and pragmatic. Additional benefits: joint learning and extraction of concepts and the relations among them, and learning how to represent context!
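The layered idea can be sketched as a list of annotators that each read earlier layers and add their own. The layer names and the toy annotators below are hypothetical stand-ins for real tokenizers, lemmatizers and taggers.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    layers: dict = field(default_factory=dict)  # layer name -> annotations


def tokenize(doc):
    doc.layers["tokens"] = doc.text.split()


def lemmatize(doc):
    # Toy "lemmatizer": lowercasing only, standing in for a real one.
    doc.layers["lemmas"] = [token.lower() for token in doc.layers["tokens"]]


def mark_negation_cues(doc):
    # Reads the lemma layer rather than raw text: later layers build on earlier ones.
    cues = {"no", "denies", "without"}
    doc.layers["negation_cues"] = [
        i for i, lemma in enumerate(doc.layers["lemmas"]) if lemma in cues
    ]


def run_pipeline(doc, annotators):
    """Apply each annotation layer in order; each one enriches the same document."""
    for annotate in annotators:
        annotate(doc)
    return doc


doc = run_pipeline(Document("Patient denies chest pain"),
                   [tokenize, lemmatize, mark_negation_cues])
print(doc.layers["negation_cues"])  # [1]
```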
11 Outline of the talk 1. The Problem 2. Extracting medical concepts 3. Identifying assertions in clinical texts 4. Relation Extraction 5. Lessons Learned
12 Details on Concept Extraction. Based on our experiments with the 2010 i2b2 Challenge data, extracting concepts involved two decisions. Boundary classification: identify the first and last words of each concept. Type classification: is the concept a problem, test, or treatment? Discharge summaries contain numerous fields (zones), some of which are semi-structured (dates, dosages, etc.) and others unstructured (prose). Finding: both kinds of zones contain concepts.
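Boundary classification is often encoded as per-token B/I/O tags, which are then decoded into (first word, last word) spans for the type classifier to label. The tagging scheme here is an assumption: the slide does not state how the CRF labels are encoded.

```python
def bio_to_spans(tags):
    """Decode per-token B/I/O boundary tags into (first, last) token index pairs.

    An "I" that does not follow a B or I is treated as the start of a new span.
    """
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(tags) - 1))
    return spans


tokens = ["He", "has", "chest", "pain", "and", "fever"]
tags = ["O", "O", "B", "I", "O", "B"]
for first, last in bio_to_spans(tags):
    print(" ".join(tokens[first:last + 1]))  # "chest pain", then "fever"
```

Keeping boundaries and types in separate steps mirrors the two decisions on the slide: the decoder only finds spans, and a separate classifier assigns problem, test, or treatment.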
13 The Data. The 2010 i2b2 challenge data consists of 826 discharge summaries and progress notes, split into 349 training and 477 testing documents. The documents are annotated by medical professionals familiar with their use. The data contains 72,846 medical concepts (27k train, 45k test). Each concept is classified as: 1. a problem (e.g., disease, injury), 2. a test (e.g., diagnostic procedure, lab test), or 3. a treatment (e.g., drug, preventative procedure, medical device). Medical problems are assigned an assertion type (belief status) among: present, absent, possible, hypothetical, conditional, or associated with someone else. The distribution of assertion types is far from uniform: 69% of all problems are considered present, 20% absent, less than 5% for possible and hypothetical, and less than 1% for conditional and associated with someone else. Additionally, the data contains a third set of annotations: relations between concepts.
14 Concept Extraction Architecture. New resources: Wikipedia and WordNet. Advanced semantic processing: lexical, syntactic and semantic disambiguation. Terms exhibit a high degree of variation, which is not always explicitly reflected in biomedical ontologies. For this reason, the UMLS ontology is distributed together with computational support for neutralising variation in the biomedical domain. MetaMap is a highly configurable program developed by Dr. Alan Aronson at the National Library of Medicine (NLM) to map biomedical text to the UMLS Metathesaurus or, equivalently, to discover the Metathesaurus concepts referred to in text.
15 Example Quantitative Extractions
Type: Example
Age: patient is 79 years old.
Date: diagnosed on april with
DiseaseID: CHRONIC RENAL FAILURE (ICD-9-CM 585)
Dosage: 5. Colace 100 milligrams po bid.
List Element: 5. Colace 100 milligrams po bid.
Measurement: Weight is 82 kilograms.
Name: Electronically Signed by **NAME[YYY ZZZ]
Percent: Birth weight was 3.29 kilograms in the 75th percentile
Time: FRI :04 PM
16 Concept Extraction. Preprocessing: rule-based detection of measurements, dosages and other entities. Boundary extraction: a heuristic separates prose from non-prose text; then two Conditional Random Field (CRF) classifiers extract concepts (one for prose, one for non-prose). Concept type (problem, test, or treatment): a Support Vector Machine (SVM) classifier performs 3-way classification.
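The rule-based preprocessing step can be approximated with regular expressions. The patterns below are hypothetical, tuned only to the example sentences on the Example Quantitative Extractions slide, and far from the coverage a real system needs.

```python
import re

# Hypothetical patterns; each label maps to one compiled, case-insensitive regex.
PATTERNS = {
    "AGE": re.compile(r"\b\d{1,3}[- ]years?[- ]old\b", re.I),
    "DOSAGE": re.compile(
        r"\b\d+(?:\.\d+)?\s*(?:milligrams|mg|micrograms|mcg)\b"
        r"(?:\s+(?:po|iv|im|sc)\b)?(?:\s+(?:qd|bid|tid|qid|prn)\b)?",
        re.I,
    ),
    "MEASUREMENT": re.compile(r"\b\d+(?:\.\d+)?\s*(?:kilograms|kg|centimeters|cm)\b", re.I),
    "PERCENTILE": re.compile(r"\b\d{1,2}(?:st|nd|rd|th)\s+percentile\b", re.I),
}


def detect_quantities(text):
    """Return (label, matched text) pairs in order of appearance."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    return [(label, span) for _, label, span in sorted(hits)]


print(detect_quantities("Colace 100 milligrams po bid."))
# [('DOSAGE', '100 milligrams po bid')]
```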
17 Concept Extraction. Resources used: MetaMap/UMLS; GENIA (chunk, POS); WordNet lemmas; quantitative types; semantic parsing; Wikipedia; various word features; affix features. Results: [Table: P, R and F1 for Exact Boundary, Exact Boundary + Type, Inexact Boundary, and Inexact Boundary + Type]
18 Feature Set 2 (1/2) CONTENT WORD (cw): lexicalized feature that selects an informative word from the constituent, other than the head. Selection heuristics are available in the paper. E.g., June for the phrase "in last June". PART OF SPEECH OF CONTENT WORD (cpos): part-of-speech tag of the content word. E.g., NNP for the phrase "in last June". PART OF SPEECH OF HEAD WORD (hpos): part-of-speech tag of the head word. E.g., NN for the phrase "the futures halt". NAMED ENTITY CLASS OF CONTENT WORD (cne): the class of the named entity that includes the content word. The 7 named entity classes from the MUC-7 specification are covered. E.g., DATE for "in last June's treatment".
19 Feature Set 2 (2/2) BOOLEAN NAMED ENTITY FLAGS: set of features that indicate whether a named entity is included at any position in the phrase: nediseaseid: set to true if a disease name is recognized in the phrase. nedosage: set to true if a dosage is recognized in the phrase. neperson: set to true if a person name is recognized in the phrase. nelist: set to true if a list expression is recognized in the phrase. nepercent: set to true if a percentage expression is recognized in the phrase. neage: set to true if an age expression is recognized in the phrase. nedate: set to true if a date temporal expression is recognized in the phrase. PHRASAL VERB COLLOCATIONS: set of two features that capture information about phrasal verbs: pvcsum: the frequency with which a verb is immediately followed by any preposition or particle. pvcmax: the frequency with which a verb is followed by its predominant preposition or particle.
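The two phrasal-verb statistics can be estimated from a POS-tagged corpus. This sketch assumes Penn Treebank tags, treats IN and RP as preposition/particle tags, does not lemmatize verbs, and reads "frequency" as a relative frequency over all occurrences of the verb; the slide does not pin down any of these choices.

```python
from collections import Counter, defaultdict


def phrasal_verb_stats(tagged_sentences, particle_tags=("IN", "RP")):
    """Return {verb: (pvcsum, pvcmax)} from POS-tagged sentences.

    Assumptions: verbs are tokens whose tag starts with "VB", verbs are not
    lemmatized, and frequencies are relative to the verb's total occurrences.
    """
    occurrences = Counter()
    followers = defaultdict(Counter)
    for sentence in tagged_sentences:
        for i, (word, tag) in enumerate(sentence):
            if not tag.startswith("VB"):
                continue
            verb = word.lower()
            occurrences[verb] += 1
            if i + 1 < len(sentence) and sentence[i + 1][1] in particle_tags:
                followers[verb][sentence[i + 1][0].lower()] += 1
    stats = {}
    for verb, total in occurrences.items():
        counts = followers.get(verb, Counter())
        pvcsum = sum(counts.values()) / total  # followed by any preposition/particle
        pvcmax = max(counts.values(), default=0) / total  # by its predominant one
        stats[verb] = (pvcsum, pvcmax)
    return stats


corpus = [
    [("ruled", "VBD"), ("out", "RP"), ("pneumonia", "NN")],
    [("ruled", "VBD"), ("out", "RP"), ("sepsis", "NN")],
    [("ruled", "VBD"), ("on", "IN"), ("it", "PRP")],
    [("ruled", "VBD"), ("yesterday", "NN")],
]
print(phrasal_verb_stats(corpus)["ruled"])  # (0.75, 0.5)
```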
20 Outline of the talk 1. The Problem 2. Extracting medical concepts 3. Identifying assertions in clinical texts 4. Relation Extraction 5. Lessons Learned
21 Assertion Classification. Determining the belief status of a medical problem combines the prior probability for the problem with the detection of context clues (words, predicates, section names). An SVM classifier performs 6-way classification: Present, Absent, Hypothetical, Conditional, Possible, Associated with someone else.
22 Architecture of assertion classification system. We use a NegEx feature to indicate the negation word associated with the medical problem. This allows the classifier to decide whether the negation word is useful and what assertion type it reflects. Additional medical features indicate whether the problem was found in UMLS or by MetaMap, since the distribution of assertion types for problems found in these resources differs from that of the documents as a whole. We use the General Inquirer's categorical information to better understand the context of a medical problem. We only use the If category, which contains uncertainty words such as unexpected, hesitant, or suspicious.
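A NegEx-style feature can be sketched as a search for a negation trigger in a small window before the problem mention. The trigger list is an illustrative subset and the five-token window is an assumption; the actual NegEx algorithm uses much larger curated trigger lists and additional rules.

```python
# Illustrative subset of NegEx-style pre-negation triggers, longest first so
# that "no evidence of" wins over "no".
NEGATION_TRIGGERS = ["no evidence of", "negative for", "denies", "without", "no", "not"]


def negation_feature(tokens, problem_start, window=5):
    """Return the trigger found among the `window` tokens before the problem, or None.

    The returned string is meant to be used as a categorical feature value, so
    the classifier can learn which triggers matter for which assertion types.
    """
    left = [t.lower() for t in tokens[max(0, problem_start - window):problem_start]]
    padded = " " + " ".join(left) + " "
    for trigger in NEGATION_TRIGGERS:
        if f" {trigger} " in padded:
            return trigger
    return None


print(negation_feature("There is no evidence of pneumonia".split(), 5))  # no evidence of
```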
23 Assertion Classification. Resources used: semantic parsing; NegEx; General Inquirer; stemmed previous words; section name. Results: [Table: #, P, R and F1 for Present, Absent, Possible, Hypothetical, Conditional, Assoc. w. someone else, and Overall (92.7)]
24 Outline of the talk 1. The Problem 2. Extracting medical concepts 3. Identifying assertions in clinical texts 4. Relation Extraction 5. Lessons Learned
25 Relation Identification. Relations can be present between any two concepts in a sentence. We disallow relations between concepts with more than 9 intervening concepts. Our approach: form pairs of concepts from the sentence, then classify each pair as having one of the relation types, or no relation.
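Pair formation with the 9-intervening-concept limit can be sketched as below. The concepts are assumed to be listed in sentence order; each returned pair would then go to the relation classifier, which may still answer "no relation".

```python
from itertools import combinations


def candidate_pairs(concepts, max_intervening=9):
    """Pair concepts from one sentence, skipping pairs with too many concepts between them.

    `concepts` must be in sentence order.
    """
    return [
        (concepts[i], concepts[j])
        for i, j in combinations(range(len(concepts)), 2)
        if j - i - 1 <= max_intervening
    ]


concepts = ["Bradycardia", "beta blockers", "calcium channel blockers", "Norvasc"]
for pair in candidate_pairs(concepts):
    print(pair)  # six pairs, in the same order as the Bradycardia example
```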
26 Relation Types 1. TrIP: a treatment has improved or cured a medical problem (e.g., "infection resolved with antibiotic course"); 2. TrWP: a patient's medical problem has deteriorated or worsened because of, or in spite of, a treatment being administered (e.g., "the tumor was growing despite the drain"); 3. TrCP: a treatment caused a medical problem (e.g., "penicillin causes a rash"); 4. TrAP: a treatment was administered for a medical problem (e.g., "Dexamphetamine for narcolepsy"); 5. TrNAP: the administration of a treatment was avoided because of a medical problem (e.g., "Ralafen which is contra-indicated because of ulcers"); 6. TeRP: a test has revealed some medical problem (e.g., "an echocardiogram revealed a pericardial effusion"); 7. TeCP: a test was performed to investigate a medical problem (e.g., "chest x-ray done to rule out pneumonia"); and 8. PIP: two problems are related to each other (e.g., "Azotemia presumed secondary to sepsis").
27 Strategy for extracting relations from electronic medical records. The problem of relation discovery was cast as a multiclass classification problem. The classifier not only decides whether there is a relation between a pair of medical concepts, but also decides the relation's type. To be able to make such decisions, the classification system is trained on 349 documents comprising 5,264 relations.
28 Extraction of Medical Relations. The multi-class classifier was implemented using a Support Vector Machine (SVM) implementation called LibLINEAR [5]. This software is an extension of LibSVM [6] restricted to a linear kernel to achieve significant speed gains. LibLINEAR allows users to specify the importance of each class through a weighting mechanism. In this way, we could specify that the no relation class should be given less weight. A frequent class tends to bias SVM decisions toward that class, improving accuracy but possibly hurting the F1 measure. Several weight values for the no relation class were tested by cross-validation on the training set. The value that led to the best score was 0.025, a heavy discounting factor compared to the default of 1.0. Similarly, cross-validation on the training set achieved the best results when the regularization parameter, C, was set to 0.5 and the termination parameter, epsilon, was set to 0.5.
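The tuned settings map onto LIBLINEAR's `train` options: `-c` sets C, `-e` sets the termination tolerance epsilon, and `-wi` scales C for class i. The helper below only builds the argument list; the label-to-integer mapping and the file name are hypothetical, and the solver type (`-s`) is left at LIBLINEAR's default because the slide does not state which one was used.

```python
# Hypothetical label ids: LIBLINEAR data files use numeric class labels.
LABELS = ["no_relation", "TrIP", "TrWP", "TrCP", "TrAP", "TrNAP", "PIP", "TeRP", "TeCP"]
DEFAULT_WEIGHTS = {"no_relation": 0.025}


def liblinear_train_command(train_file, c=0.5, epsilon=0.5, weights=None):
    """Build a LIBLINEAR `train` command line from the tuned hyperparameters."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    cmd = ["train", "-c", str(c), "-e", str(epsilon)]
    for label, weight in weights.items():
        # -wi multiplies the penalty C for class i; unlisted classes keep weight 1.
        cmd += [f"-w{LABELS.index(label)}", str(weight)]
    return cmd + [train_file]


def effective_penalty(label, c=0.5, weights=None):
    """Per-class misclassification penalty C * w_i implied by the weights."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    return c * weights.get(label, 1.0)


print(" ".join(liblinear_train_command("relations.train")))
# train -c 0.5 -e 0.5 -w0 0.025 relations.train
```

With these values, errors on "no relation" pairs cost 0.0125 while errors on any relation class cost 0.5, which is the 40:1 discount that counteracts the dominance of the "no relation" class.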
29 Example [Bradycardia] prob is resolved after [beta blockers] treat and [calcium channel blockers] treat were stopped and [Norvasc] treat was started. The following pairs are formed from this sentence: (Bradycardia, beta blockers) [TrCP] (Bradycardia, calcium channel blockers) [TrCP] (Bradycardia, Norvasc) [TrIP] (beta blockers, calcium channel blockers) (beta blockers, Norvasc) (calcium channel blockers, Norvasc)
30 Classification. Classification of concept pairs was performed using a single SVM classifier. Even pairs that could not form a valid relation were used for training. [Figure: features extracted from a pair such as (Bradycardia, Norvasc) are passed to the SVM, which outputs one of TrIP, TrWP, TrCP, TrAP, TrNAP, PIP, TeRP, TeCP, or No Relation (weight 0.025)]
31 Features Five categories of information used for features: Features using words between the concepts Single-concept features Concept types of nearby concepts Wikipedia-based features Contextual similarity to training concept pairs
32 Contextual Features. From the words between two concepts: the string of each word; its part of speech; its concept type (if applicable); the phrase of all the words; whether the phrase represents a conjunction; the sequence of phrase chunk types (GENIA). If there are intervening concepts: the relations that exist between those concepts.
33 Example: Patient developed [intermittent low-grade temperatures] prob with no [obvious etiology] prob ; [Tm] test 1/20 of 38 [Tm] test 1/
Features for (intermittent low-grade temperatures, Tm):
Words: with, no, obvious, etiology, ;
Concepts between: problem
POS: IN, DT, JJ, NN, :
POS sequence: IN_DT_problem_:
Relations between: PIP
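The between-concept features in this example can be reproduced with a short function. It assumes tokens come with Penn Treebank POS tags and that concepts are given as inclusive (start, end, type) token offsets in sentence order; these input conventions are assumptions for illustration.

```python
def between_features(tagged, concepts, first, second):
    """Features from the tokens strictly between two concepts.

    tagged: list of (word, POS); concepts: list of (start, end, type) with
    inclusive token offsets, in sentence order; first/second index into concepts.
    """
    start, end = concepts[first][1] + 1, concepts[second][0]
    inner = {c[0]: c for c in concepts[first + 1:second]}
    sequence, i = [], start
    while i < end:
        if i in inner:
            sequence.append(inner[i][2])  # collapse an intervening concept to its type
            i = inner[i][1] + 1
        else:
            sequence.append(tagged[i][1])
            i += 1
    return {
        "words": [w for w, _ in tagged[start:end]],
        "pos": [p for _, p in tagged[start:end]],
        "concepts_between": [c[2] for c in concepts[first + 1:second]],
        "pos_sequence": "_".join(sequence),
    }


tagged = [("Patient", "NN"), ("developed", "VBD"), ("intermittent", "JJ"),
          ("low-grade", "JJ"), ("temperatures", "NNS"), ("with", "IN"), ("no", "DT"),
          ("obvious", "JJ"), ("etiology", "NN"), (";", ":"), ("Tm", "NN")]
concepts = [(2, 4, "problem"), (7, 8, "problem"), (10, 10, "test")]
print(between_features(tagged, concepts, 0, 2)["pos_sequence"])  # IN_DT_problem_:
```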
34 Single-Concept Features String of the concept WordNet lemma General Inquirer positive/negative polarity Token before concept 3 tokens after concept Associated predicates extracted through a PropBank parse Alternative concept type pairs for both arguments
35 Example: She was given [Zofran] treat for [some nausea] prob as well as [metoclopramide] treat p.r.n.
Lemma1: zofran
Lemma2: some nausea
before1: given
after1: for, some, nausea
before2: for
after2: as, well, as
AssociatedPredicates1: given
AlternativePair1: NONE_treatment
AlternativePair2: problem_treatment
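The window features in this example (one token before, three after) can be sketched as follows. Lowercasing stands in for WordNet lemmatization here, which is a simplification; the predicate and alternative-pair features are not covered.

```python
def concept_window_features(tokens, start, end, before=1, after=3):
    """String and neighbouring tokens for a concept spanning tokens[start:end + 1].

    Lowercasing is a stand-in for WordNet lemmatization.
    """
    return {
        "lemma": " ".join(tokens[start:end + 1]).lower(),
        "before": tokens[max(0, start - before):start],
        "after": tokens[end + 1:end + 1 + after],
    }


tokens = "She was given Zofran for some nausea as well as metoclopramide p.r.n.".split()
print(concept_window_features(tokens, 5, 6))
# {'lemma': 'some nausea', 'before': ['for'], 'after': ['as', 'well', 'as']}
```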
36 Wikipedia Features. The idea: map concepts to Wikipedia pages through exact page-name match. Features: whether the pages link to each other; the depth of the LCS (lowest common subsumer) of the two pages within the category hierarchy; top-level categories, mapped to concept types: Medical tests -> Test, Diseases and disorders -> Problem, Medical treatments -> Treatment.
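The LCS-depth and top-level-category features can be sketched over a page-to-parent-category graph. The toy graph, the root name and the tie-breaking rule are hypothetical; only the top-level category mapping comes from the slide. Breadth-first search is used so that cycles in the category graph do not cause infinite loops.

```python
from collections import deque

TOP_LEVEL = {"Medical tests": "test", "Diseases and disorders": "problem",
             "Medical treatments": "treatment"}


def upward_distances(node, parents):
    """Shortest number of parent links from `node` to each ancestor (itself at 0)."""
    dist, queue = {node: 0}, deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def lcs_depth(page_a, page_b, parents, root):
    """Depth below `root` of the lowest common subsumer of two pages, or None."""
    up_a, up_b = upward_distances(page_a, parents), upward_distances(page_b, parents)
    common = set(up_a) & set(up_b)
    if not common:
        return None
    lcs = min(common, key=lambda c: (up_a[c] + up_b[c], c))
    return upward_distances(lcs, parents).get(root)


def top_level_types(page, parents):
    """Concept types implied by the top-level categories above a page."""
    return sorted({TOP_LEVEL[c] for c in upward_distances(page, parents) if c in TOP_LEVEL})


# Hypothetical category graph loosely following the thrombocytopenia example.
parents = {
    "Platelet": ["Transfusion medicine"],
    "Transfusion medicine": ["Clinical pathology"],
    "Thrombocytopenia": ["Clinical pathology"],
    "Clinical pathology": ["Medical tests"],
    "Medical tests": ["Medicine"],
}
print(lcs_depth("Platelet", "Thrombocytopenia", parents, "Medicine"))  # 2
```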
37 Example: He was started on [heparin] treat and he subsequently had [significant thrombocytopenia] prob with [platelets] test of 70,000. Wikipedia excerpts: "Thrombocytopenia (or thrombocytopaenia, or thrombopenia in short) is the presence of relatively few [platelets] in [blood]..." and "... platelets is called [thrombocytopathy], which could be either a low number of platelets ([thrombocytopenia]), ..." [Figure: category paths Platelet, Transfusion medicine, Clinical pathology (Test) and Thrombocytopenia, Clinical pathology]
38 Inexact Matching Features. Based on edit distance (Levenshtein), used as the distance measure for k-nearest neighbors (KNN). During training, a KNN classifier is trained on all but one document and applied to that held-out document. During testing, a KNN classifier trained on all training documents is used.
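The leave-one-document-out KNN scheme can be sketched as follows. What string is compared (here, a short context string per concept pair, compared character by character) and the value of k are assumptions, since the slide only names the distance measure and the training protocol.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def knn_label(query, examples, k=3):
    """Majority label among the k training contexts closest to `query`."""
    nearest = sorted(examples, key=lambda ex: levenshtein(query, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_document_out(docs, k=3):
    """Label each document's pairs with a KNN built from all other documents."""
    predictions = {}
    for doc_id, pairs in docs.items():
        others = [ex for other, exs in docs.items() if other != doc_id for ex in exs]
        predictions[doc_id] = [knn_label(text, others, k) for text, _ in pairs]
    return predictions


docs = {"d1": [("resolved with", "TrIP")], "d2": [("resolved after", "TrIP")],
        "d3": [("caused by", "TrCP")]}
print(leave_one_document_out(docs, k=1))
```

Holding out the current document during training keeps the KNN prediction from simply retrieving the pair's own gold label, so the feature behaves on training data as it will on unseen test data.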
39 Results [Table: #, P, R and F1 for Exact Span, Span and relation type, and per relation type: TrIP, TrWP, TrCP, TrAP, TrNAP, PIP, TeRP, TeCP]
40 Conclusions. NLP techniques worked well on this data. They could perform better if trained on medical text. The large training data set may have reduced the contribution of medical ontologies. Future work shall take more knowledge mining into account. Crowd-sourced resources such as Wikipedia still provide some valuable information.
More informationWeb-Based Genomic Information Integration with Gene Ontology
Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, kai.xu@nicta.com.au Abstract. Despite the dramatic growth of online genomic
More informationDiscover more, discover faster. High performance, flexible NLP-based text mining for life sciences
Discover more, discover faster. High performance, flexible NLP-based text mining for life sciences It s not information overload, it s filter failure. Clay Shirky Life Sciences organizations face the challenge
More informationNatural Language Processing Supporting Clinical Decision Support
Natural Language Processing Supporting Clinical Decision Support Applications for Enhancing Clinical Decision Making NIH Worksop; Bethesda, MD, April 24, 2012 Stephane M. Meystre, MD, PhD Department of
More informationAhmed AlBarrak PhD Medical Informatics Associate Professor, Family & Community Med. Chairman, Medical Informatics Department College of Medicine King
Ahmed AlBarrak PhD Medical Informatics Associate Professor, Family & Community Med. Chairman, Medical Informatics Department College of Medicine King Saud University albarrak@ksu.edu.sa What are Medical
More informationData Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent
More informationParadigm Changes Affecting the Practice of Scientific Communication in the Life Sciences
Paradigm Changes Affecting the Practice of Scientific Communication in the Life Sciences Prof. Dr. Martin Hofmann-Apitius Head of the Department of Bioinformatics Fraunhofer Institute for Algorithms and
More informationTaxonomy learning factoring the structure of a taxonomy into a semantic classification decision
Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Viktor PEKAR Bashkir State University Ufa, Russia, 450000 vpekar@ufanet.ru Steffen STAAB Institute AIFB,
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationThe American Academy of Ophthalmology Adopts SNOMED CT as its Official Clinical Terminology
The American Academy of Ophthalmology Adopts SNOMED CT as its Official Clinical Terminology H. Dunbar Hoskins, Jr., M.D., P. Lloyd Hildebrand, M.D., Flora Lum, M.D. The road towards broad adoption of electronic
More informationExtraction and Visualization of Protein-Protein Interactions from PubMed
Extraction and Visualization of Protein-Protein Interactions from PubMed Ulf Leser Knowledge Management in Bioinformatics Humboldt-Universität Berlin Finding Relevant Knowledge Find information about Much
More informationBuild Vs. Buy For Text Mining
Build Vs. Buy For Text Mining Why use hand tools when you can get some rockin power tools? Whitepaper April 2015 INTRODUCTION We, at Lexalytics, see a significant number of people who have the same question
More informationTITLE Dori Whittaker, Director of Solutions Management, M*Modal
TITLE Dori Whittaker, Director of Solutions Management, M*Modal Challenges Impacting Clinical Documentation HITECH Act, Meaningful Use EHR mandate and adoption Need for cost savings Migration to ICD 10
More informationCustomizing an English-Korean Machine Translation System for Patent Translation *
Customizing an English-Korean Machine Translation System for Patent Translation * Sung-Kwon Choi, Young-Gil Kim Natural Language Processing Team, Electronics and Telecommunications Research Institute,
More informationFlattening Enterprise Knowledge
Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it
More informationNatural Language Processing for Bioinformatics: The Time is Ripe
Natural Language Processing for Bioinformatics: The Time is Ripe Jeffrey T. Chang Soumya Raychaudhuri is a Ph.D. candidate in the Russ Altman lab in the Biomedical Informatics program at Stanford University.
More information11-792 Software Engineering EMR Project Report
11-792 Software Engineering EMR Project Report Team Members Phani Gadde Anika Gupta Ting-Hao (Kenneth) Huang Chetan Thayur Suyoun Kim Vision Our aim is to build an intelligent system which is capable of
More informationezdi: A Hybrid CRF and SVM based Model for Detecting and Encoding Disorder Mentions in Clinical Notes
ezdi: A Hybrid CRF and SVM based Model for Detecting and Encoding Disorder Mentions in Clinical Notes Parth Pathak, Pinal Patel, Vishal Panchal, Narayan Choudhary, Amrish Patel, Gautam Joshi ezdi, LLC.
More informationProteinQuest user guide
ProteinQuest user guide 1. Introduction... 3 1.1 With ProteinQuest you can... 3 1.2 ProteinQuest basic version 4 1.3 ProteinQuest extended version... 5 2. ProteinQuest dictionaries... 6 3. Directions for
More informationLABERINTO at ImageCLEF 2011 Medical Image Retrieval Task
LABERINTO at ImageCLEF 2011 Medical Image Retrieval Task Jacinto Mata, Mariano Crespo, Manuel J. Maña Dpto. de Tecnologías de la Información. Universidad de Huelva Ctra. Huelva - Palos de la Frontera s/n.
More informationMIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
More informationAHIMA Curriculum Map Health Information Management Baccalaureate Degree Approved by AHIMA Education Strategy Committee February 2011
HIM Baccalaureate Degree Entry Level Competencies (Student Learning Outcomes) I. Domain: Health Data Management I. A. Subdomain: Health Data Structure, Content and Standards 1. Manage health data (such
More informationHow to stop looking in the wrong place? Use PubMed!
How to stop looking in the wrong place? Use PubMed! 1 Why not just use? Plus s Fast! Easy to remember web address Its huge - you always find something It includes PubMed citations Downside Is simply finding
More informationAnnotating Medical Forms using UMLS
Annotating Medical Forms using UMLS Victor Christen 1, Anika Groß 1, Julian Varghese 2, Martin Dugas 2, Erhard Rahm 1 1 Department of Computer Science, Universität Leipzig, Germany 2 Institute of Medical
More informationAtigeo at TREC 2012 Medical Records Track: ICD-9 Code Description Injection to Enhance Electronic Medical Record Search Accuracy
Atigeo at TREC 2012 Medical Records Track: ICD-9 Code Description Injection to Enhance Electronic Medical Record Search Accuracy Bryan Tinsley, Alex Thomas, Joseph F. McCarthy, Mike Lazarus Atigeo, LLC
More informationComputer-assisted coding and natural language processing
Computer-assisted coding and natural language processing Without changes to current coding technology and processes, ICD-10 adoption will be very difficult for providers to absorb, due to the added complexity
More informationKnowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization
Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization Atika Mustafa, Ali Akbar, and Ahmer Sultan National University of Computer and Emerging
More informationAutomatic Knowledge Base Construction Systems. Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014
Automatic Knowledge Base Construction Systems Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014 1 Text Contains Knowledge 2 Text Contains Automatically Extractable Knowledge 3
More informationAsk your Database: Natural Language Processing using In-Memory Technology
Enterprise Platform and Integration Concepts Master Project Summer Term 2015 Ask your Database: Natural Language Processing using In-Memory Technology Dr. Mariana Neves April 10th, 2015 Question Answering
More informationMeaningful use. Meaningful data. Meaningful care. The 3M Healthcare Data Dictionary: Enabling effective health information exchange
Meaningful use. Meaningful data. Meaningful care. The 3M Healthcare Data Dictionary: Enabling effective health information exchange Understanding health information exchanges (HIEs) What is the goal of
More informationMetrics for assessing the quality of value sets in clinical quality measures
AMIA Annu Symp Proc 2013:1497-1505. Metrics for assessing the quality of value sets in clinical quality measures Abstract Rainer Winnenburg, PhD, Olivier Bodenreider, MD, PhD National Library of Medicine,
More informationOverview of the TACITUS Project
Overview of the TACITUS Project Jerry R. Hobbs Artificial Intelligence Center SRI International 1 Aims of the Project The specific aim of the TACITUS project is to develop interpretation processes for
More informationSurvey Results: Requirements and Use Cases for Linguistic Linked Data
Survey Results: Requirements and Use Cases for Linguistic Linked Data 1 Introduction This survey was conducted by the FP7 Project LIDER (http://www.lider-project.eu/) as input into the W3C Community Group
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationChallenges in Understanding Clinical Notes: Why NLP Engines Fall Short and Where Background Knowledge Can Help
Challenges in Understanding Clinical Notes: Why NLP Engines Fall Short and Where Background Knowledge Can Help Sujan Perera Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) Wright State
More informationInformation Model Requirements of Post-Coordinated SNOMED CT Expressions for Structured Pathology Reports
Information Model Requirements of Post-Coordinated SNOMED CT Expressions for Structured Pathology Reports W. Scott Campbell, Ph.D., MBA James R. Campbell, MD Acknowledgements Steven H. Hinrichs, MD Chairman
More informationAnalyzing survey text: a brief overview
IBM SPSS Text Analytics for Surveys Analyzing survey text: a brief overview Learn how gives you greater insight Contents 1 Introduction 2 The role of text in survey research 2 Approaches to text mining
More informationCore research questions for natural language processing of clinical text. Noémie Elhadad noemie@dbmi.columbia.edu
Core research questions for natural language processing of clinical text Noémie Elhadad noemie@dbmi.columbia.edu NLP s promise for medicine and health } Increasingly large amounts of texts } Clinical literature
More informationJust the Facts: A Basic Introduction to the Science Underlying NCBI Resources
1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools
More informationMining a Corpus of Job Ads
Mining a Corpus of Job Ads Workshop Strings and Structures Computational Biology & Linguistics Jürgen Jürgen Hermes Hermes Sprachliche Linguistic Data Informationsverarbeitung Processing Institut Department
More informationEfficient Data Integration in Finding Ailment-Treatment Relation
IJCST Vo l. 3, Is s u e 3, Ju l y - Se p t 2012 ISSN : 0976-8491 (Online) ISSN : 2229-4333 (Print) Efficient Data Integration in Finding Ailment-Treatment Relation 1 A. Nageswara Rao, 2 G. Venu Gopal,
More informationRecognizing Medication related Entities in Hospital Discharge Summaries using Support Vector Machine
Recognizing Medication related Entities in Hospital Discharge Summaries using Support Vector Machine Son Doan and Hua Xu Department of Biomedical Informatics School of Medicine, Vanderbilt University Son.Doan@Vanderbilt.edu,
More informationGet the most value from your surveys with text analysis
PASW Text Analytics for Surveys 3.0 Specifications Get the most value from your surveys with text analysis The words people use to answer a question tell you a lot about what they think and feel. That
More informationOverview of MT techniques. Malek Boualem (FT)
Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,
More informationlife science data mining
life science data mining - '.)'-. < } ti» (>.:>,u» c ~'editors Stephen Wong Harvard Medical School, USA Chung-Sheng Li /BM Thomas J Watson Research Center World Scientific NEW JERSEY LONDON SINGAPORE.
More informationProjektgruppe. Categorization of text documents via classification
Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction
More information