Sanda Harabagiu. The University of Texas at Dallas, Human Language Technology Research Institute (http://www.hlt.utdallas.edu)

1 Linking Information Extracted from Electronic Medical Records to Structured Knowledge Sanda Harabagiu The University of Texas at Dallas

2 Outline of the talk 1. The Problem 2. Extracting medical concepts 3. Identifying assertions in clinical texts 4. Relation Extraction 5. Lessons Learned

3 Ontological Resources: UMLS
The Unified Medical Language System (UMLS) consists of:
1. a semantic network of biomedical semantic concepts and the semantic relations that span them;
2. a Metathesaurus, which encodes terms and codes from many vocabularies, including CPT, ICD-10-CM, LOINC, MeSH, RxNorm, and SNOMED CT;
3. the SPECIALIST Lexicon and Lexical Tools.
We used UMLS to expand topic keywords into phrases encoded in the UMLS Metathesaurus that share the same concept ID. This primarily provides high-confidence keyword synonyms.
[Figure: example synonym terms: hindlimb, hind limb, leg, lower leg, leg region, lower extremity]
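The synonym expansion described on this slide can be sketched as a lookup over (concept ID, term) rows. This is a minimal illustration, not the authors' code: the rows, the concept IDs, and the function names are invented for the example, and which of the figure's terms really share a concept ID in UMLS is an assumption.

```python
from collections import defaultdict

# Toy stand-in for Metathesaurus rows: (concept_id, term).
# The IDs and groupings are made up for illustration only.
TOY_ROWS = [
    ("C_LEG", "leg"),
    ("C_LEG", "lower extremity"),
    ("C_LEG", "hind limb"),
    ("C_LEG", "hindlimb"),
    ("C_LOWER_LEG", "lower leg"),
    ("C_LOWER_LEG", "leg region"),
]


def build_index(rows):
    """Index terms by concept ID and concept IDs by term."""
    by_concept = defaultdict(set)
    by_term = defaultdict(set)
    for concept_id, term in rows:
        by_concept[concept_id].add(term.lower())
        by_term[term.lower()].add(concept_id)
    return by_concept, by_term


def expand_keyword(keyword, rows):
    """Return every other term that shares a concept ID with the keyword."""
    by_concept, by_term = build_index(rows)
    key = keyword.lower()
    synonyms = set()
    for concept_id in by_term.get(key, ()):
        synonyms |= by_concept[concept_id]
    synonyms.discard(key)
    return sorted(synonyms)


print(expand_keyword("leg", TOY_ROWS))
```

A keyword that is absent from the rows simply expands to an empty list, so the original keyword can always be kept as a fallback.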

4 Resources: Clinical Ontologies
The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is the most comprehensive, multilingual clinical healthcare terminology in the world. SNOMED CT is owned, maintained and distributed by the International Health Terminology Standards Development Organization (IHTSDO).
SNOMED CT consists of four primary core components:
1. Concept Codes - numerical codes that identify clinical terms, primitive or defined, organized in hierarchies
2. Descriptions - textual descriptions of Concept Codes
3. Relationships - relationships between Concept Codes that have a related meaning
4. Reference Sets - used to group Concepts or Descriptions into sets, including reference sets and cross-maps to other classifications and standards.
We use this relationship knowledge to expand a keyword so that it captures any phrase on the child side of an IS-A, PART-OF or COMPONENT relationship. This allows us to expand hypernym and holonym keywords with their hyponyms and meronyms.
[Figure: example terms: atypical antipsychotic, clozapine, clozaril, aripiprazole, abilify, asenapine]
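The child-side expansion described on this slide amounts to a downward traversal of the relationship graph. The sketch below assumes a toy edge list built from the terms in the figure; the specific edges (for example, treating a brand name as an IS-A child of its generic) are assumptions made for illustration, not SNOMED CT content.

```python
from collections import deque

# Toy (child, relation, parent) triples. The edges are illustrative guesses.
TOY_EDGES = [
    ("clozapine", "IS-A", "atypical antipsychotic"),
    ("aripiprazole", "IS-A", "atypical antipsychotic"),
    ("asenapine", "IS-A", "atypical antipsychotic"),
    ("clozaril", "IS-A", "clozapine"),
    ("abilify", "IS-A", "aripiprazole"),
]

# Relation types whose child side is used for expansion, as named on the slide.
EXPANDING_RELATIONS = {"IS-A", "PART-OF", "COMPONENT"}


def expand_down(keyword, edges):
    """Collect every term reachable on the child side of the keyword."""
    children = {}
    for child, relation, parent in edges:
        if relation in EXPANDING_RELATIONS:
            children.setdefault(parent, []).append(child)

    seen = set()
    queue = deque([keyword])
    while queue:  # breadth-first walk down the hierarchy
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(expand_down("atypical antipsychotic", TOY_EDGES))
```

Tracking visited terms keeps the walk finite even if the edge list contains a cycle.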

5 The Problem Ontologies provide machine-readable descriptions of biomedical concepts and their relations. Linking domain-specific terms expressed in clinical texts to their ontological encodings provides a platform for semantic interpretation of the clinical narratives. Knowledge extracted from clinical documents can be curated and used to update the content of biomedical ontologies.

6 The difficulties
- The principal link between clinical or biomedical texts and an ontology is a terminology, which aims to map concepts to terms.
- A term is a textual realization of a concept, e.g. a disease, gene, or protein.
- The problems: term variation and term ambiguity.
- Terms have a context, and they may have assertions associated with them.
- Relations between terms are expressed differently than relations between concepts.

7 Term Variation
- Term variation originates from the ability of a natural language to express a single concept in a number of ways.
- For example, in biomedicine there are many synonyms for proteins, enzymes, genes, etc. Having six or seven synonyms for a single concept is not unusual in this domain.
- The probability of two experts using the same term to refer to the same concept is less than 20 per cent.
- In addition, biomedicine includes pharmacology, where numerous trademark names refer to the same compound (e.g. Advil, Brufen, Motrin, Nuprin and Nurofen all refer to ibuprofen).

8 Term ambiguity
- Bad news: term ambiguity occurs when the same term is used to refer to multiple concepts. Ambiguity is an inherent feature of natural language. Words typically have multiple dictionary entries, and the meaning of a word can be altered by its context.
- Some good news: sublanguages, as the languages confined to specialized domains, provide a context which generally reduces the level of ambiguity.
- More bad news: biomedicine encompasses a plethora of subdomains, which is an additional cause of the high level of ambiguity in biomedical terminology. For example, in biology the term promoter refers to a binding site in a DNA chain at which RNA polymerase binds to initiate transcription of messenger RNA by one or more nearby structural genes, while in chemistry it denotes a substance that in very small amounts is able to increase the activity of a catalyst.
- In addition, acronyms are extensively used in biomedicine (a new acronym is introduced in every five to ten abstracts in Medline), and they are known to be highly ambiguous (80 per cent of acronyms are ambiguous, the average number of possible interpretations being 15).

9 More on ambiguity
Acronym expansion: for example, AR could be expanded to any of the following terms:
1. Androgen Receptor
2. AmphiRegulin
3. Acyclic Retinoid
4. Agonist Receptor
5. Adrenergic Receptor
Origins of ambiguity: text is not the only origin of ambiguity in biomedicine. Ambiguity is inherent to the field, because the evolution of species gave rise to many homologues and analogues. For instance, NFKB2 denotes a family of two individual proteins with separate identifiers in Swiss-Prot. These proteins are homologues belonging to different species, human and chicken.

10 Pipeline of annotations
- Each natural language processing layer enhances the knowledge representation with machine-readable information.
- Different forms of ambiguity are resolved in the process: lexical, syntactic, semantic, and pragmatic.
- Additional benefits: joint learning and extraction of concepts and the relations among them; learning how to represent context.

11 Outline of the talk 1. The Problem 2. Extracting medical concepts 3. Identifying assertions in clinical texts 4. Relation Extraction 5. Lessons Learned

12 Details on Concept Extraction
- Based on our experiments with the 2010 i2b2 Challenge data.
- Extracting concepts involved two decisions:
  - Boundary classification: identify the first and last words of each concept.
  - Type classification: is the concept a problem, test, or treatment?
- Discharge summaries contain numerous fields (zones), some of which are semi-structured (dates, dosages, etc.) and others unstructured (prose).
- Finding: both contain concepts.

13 The Data
- The 2010 i2b2 challenge data consists of 826 discharge summaries and progress notes, split into 349 training and 477 testing documents. The documents are annotated by medical professionals familiar with their use.
- The data contains 72,846 medical concepts (27k train, 45k test). Each concept is classified as:
  1. a problem (e.g., disease, injury),
  2. a test (e.g., diagnostic procedure, lab test), or
  3. a treatment (e.g., drug, preventative procedure, medical device).
- Medical problems are assigned an assertion type (belief status) among: present, absent, possible, hypothetical, conditional, or associated with someone else.
- The distribution of assertion types is far from uniform: 69% of all problems are considered present, 20% absent, less than 5% each possible and hypothetical, and less than 1% each conditional and associated with someone else.
- Additionally, the data contains a third set of annotations: relations between concepts.

14 Concept Extraction Architecture
[Diagram labels: New Resources: Wikipedia and WordNet; Advanced Semantic Processing; Lexical, syntactic and semantic disambiguation]
- Terms exhibit a high degree of variation, which is not always explicitly reflected in biomedical ontologies. For this reason, the UMLS ontology is distributed together with computational support for neutralisation of variation in the biomedical domain.
- MetaMap is a highly configurable program developed by Dr. Alan Aronson at the National Library of Medicine (NLM) to map biomedical text to the UMLS Metathesaurus or, equivalently, to discover Metathesaurus concepts referred to in text.

15 Example Quantitative Extractions
Type: Example
Age: patient is 79 years old.
Date: diagnosed on april with
DiseaseID: CHRONIC RENAL FAILURE ( ICD-9-CM 585 )
Dosage: 5. Colace 100 milligrams po bid.
List Element: 5. Colace 100 milligrams po bid.
Measurement: Weight is 82 kilograms.
Name: Electronically Signed by **NAME[YYY ZZZ]
Percent: Birth weight was 3.29 kilograms in the 75th percentile
Time: FRI :04 PM

16 Concept Extraction
- Preprocessing: rule-based detection of measurements, dosages, and other entities.
- Boundary extraction: a heuristic separates prose from non-prose text. Then two Conditional Random Field (CRF) classifiers are used to extract concepts (one for prose, one for non-prose).
- Concept type (problem, test, or treatment): a Support Vector Machine (SVM) classifier performs 3-way classification.
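The rule-based preprocessing step can be pictured as a small set of regular expressions that tag quantitative expressions before the CRF stage. The patterns below are a minimal sketch written for this example; the talk does not give the actual rules, so the pattern set, the labels, and the `tag_quantities` name are all assumptions.

```python
import re

# Illustrative patterns only; a real system would need far broader coverage.
PATTERNS = {
    "Age": re.compile(r"\b\d{1,3}\s+years?\s+old\b", re.IGNORECASE),
    "Dosage": re.compile(
        r"\b\d+(?:\.\d+)?\s+(?:milligrams?|mg)\s+(?:po|iv)\b"
        r"(?:\s+(?:bid|tid|qid|qd)\b)?",
        re.IGNORECASE,
    ),
    "Measurement": re.compile(r"\b\d+(?:\.\d+)?\s+(?:kilograms?|kg)\b", re.IGNORECASE),
    "ListElement": re.compile(r"^\s*\d+\.\s"),
}


def tag_quantities(line):
    """Return (label, matched text) pairs for every pattern that fires."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(line):
            found.append((label, match.group(0).strip()))
    return found


for text in [
    "patient is 79 years old.",
    "5. Colace 100 milligrams po bid.",
    "Weight is 82 kilograms.",
]:
    print(text, "->", tag_quantities(text))
```

Tags like these could then be exposed to the boundary and type classifiers as features, which is one plausible reading of the "Quantitative types" resource.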

17 Concept Extraction
Resources used:
- MetaMap/UMLS
- GENIA (chunk, POS)
- WordNet lemmas
- Quantitative types
- Semantic parsing
- Wikipedia
- Various word features
- Affix features
Results (columns P, R, F1; the numeric values did not survive transcription), reported for:
- Exact Boundary
- Exact Boundary + Type
- Inexact Boundary
- Inexact Boundary + Type

18 Feature Set 2 (1/2)
- CONTENT WORD (cw): lexicalized feature that selects an informative word from the constituent, other than the head. Selection heuristics are available in the paper. E.g. June for the phrase in last June.
- PART OF SPEECH OF CONTENT WORD (cpos): part-of-speech tag of the content word. E.g. NNP for the phrase in last June.
- PART OF SPEECH OF HEAD WORD (hpos): part-of-speech tag of the head word. E.g. NN for the phrase the futures halt.
- NAMED ENTITY CLASS OF CONTENT WORD (cne): the class of the named entity that includes the content word. Seven named entity classes (from the MUC-7 specification) are covered. E.g. DATE for in last June's treatment.

19 Feature Set 2 (2/2)
BOOLEAN NAMED ENTITY FLAGS: set of features that indicate whether a named entity is included at any position in the phrase:
- nediseaseid: set to true if a disease name is recognized in the phrase.
- nedosage: set to true if a dosage is recognized in the phrase.
- neperson: set to true if a person name is recognized in the phrase.
- nelist: set to true if a list expression is recognized in the phrase.
- nepercent: set to true if a percentage expression is recognized in the phrase.
- neage: set to true if an age expression is recognized in the phrase.
- nedate: set to true if a date temporal expression is recognized in the phrase.
PHRASAL VERB COLLOCATIONS: set of two features that capture information about phrasal verbs:
- pvcsum: the frequency with which a verb is immediately followed by any preposition or particle.
- pvcmax: the frequency with which a verb is followed by its predominant preposition or particle.

20 Outline of the talk 1. The Problem 2. Extracting medical concepts 3. Identifying assertions in clinical texts 4. Relation Extraction 5. Lessons Learned

21 Assertion Classification
Determining the belief status of a medical problem is a combination of:
- the prior probability for the problem
- detection of context clues (words, predicates, section names)
An SVM classifier performed 6-way classification:
- Present
- Absent
- Hypothetical
- Conditional
- Possible
- Associated with someone else

22 Architecture of assertion classification system
- We use a NegEx feature to indicate the negation word associated with the medical problem. This allows the classifier to decide whether or not the negation word is useful and what assertion type it reflects.
- Additional medical features indicate whether the problem was found in UMLS or MetaMap, as the distribution of assertion types for problems found within these resources differs from that of the documents.
- We use the General Inquirer's categorical information to better understand the context of a medical problem. We only use the If category, which indicates uncertainty words such as unexpected, hesitant, or suspicious.
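A feature of the kind described, one that reports the negation word tied to a problem and leaves the decision to the classifier, might look like the sketch below. It is a deliberately simplified stand-in: the real NegEx algorithm uses much larger trigger lists and scope-termination rules, and the trigger tuple, the window size, and the `negex_feature` name here are assumptions for illustration.

```python
# A tiny illustrative trigger list; NegEx itself ships a far larger one.
NEGATION_TRIGGERS = ("no", "not", "denies", "without", "negative for")


def negex_feature(tokens, problem_start, window=5):
    """Return the nearest negation trigger before the problem, or 'NONE'.

    tokens: the sentence as a list of words.
    problem_start: index of the first token of the medical problem.
    window: how many preceding tokens to inspect (an assumed setting).
    """
    low = max(0, problem_start - window)
    context = [token.lower() for token in tokens[low:problem_start]]
    # Scan from the token closest to the problem backwards,
    # preferring two-word triggers over single words.
    for i in range(len(context) - 1, -1, -1):
        if i >= 1:
            bigram = context[i - 1] + " " + context[i]
            if bigram in NEGATION_TRIGGERS:
                return bigram
        if context[i] in NEGATION_TRIGGERS:
            return context[i]
    return "NONE"


sentence = "Patient developed intermittent low - grade temperatures with no obvious etiology".split()
print(negex_feature(sentence, sentence.index("obvious")))
```

Returning the trigger string itself, rather than a boolean, matches the slide's point that the classifier should learn which negation words matter for which assertion type.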

23 Assertion Classification
Resources used:
- Semantic parsing
- NegEx
- General Inquirer
- Stemmed previous words
- Section name
Results (columns #, P, R, F1; most numeric values did not survive transcription), reported for:
- Present
- Absent
- Possible
- Hypothetical
- Conditional
- Assoc. w. someone else
- Overall 92.7

24 Outline of the talk 1. The Problem 2. Extracting medical concepts 3. Identifying assertions in clinical texts 4. Relation Extraction 5. Lessons Learned

25 Relation Identification
- Relations can be present between any two concepts in a sentence.
- We disallow relations between concepts with more than 9 intervening concepts.
Our approach:
- Form pairs of concepts from the sentence.
- Classify each pair as having one of the relation types, or no relation.

26 Relation Types
1. TrIP: A certain treatment has improved or cured a medical problem (e.g., "infection resolved with antibiotic course");
2. TrWP: A patient's medical problem has deteriorated or worsened because of or in spite of a treatment being administered (e.g., "the tumor was growing despite the drain");
3. TrCP: A treatment caused a medical problem (e.g., "penicillin causes a rash");
4. TrAP: A treatment administered for a medical problem (e.g., "Dexamphetamine for narcolepsy");
5. TrNAP: The administration of a treatment was avoided because of a medical problem (e.g., "Relafen which is contra-indicated because of ulcers");
6. TeRP: A test has revealed some medical problem (e.g., "an echocardiogram revealed a pericardial effusion");
7. TeCP: A test was performed to investigate a medical problem (e.g., "chest x-ray done to rule out pneumonia"); and
8. PIP: Two problems are related to each other (e.g., "Azotemia presumed secondary to sepsis").

27 Strategy for extracting relations from electronic medical records
- The problem of relation discovery was cast as a multiclass classification problem.
- The classifier not only decides whether there is a relation between a pair of medical concepts, but also decides the relation's type.
- To be able to make such decisions, the classification system is trained on 349 documents comprising 5,264 relations.

28 Extraction of Medical Relations
- The multi-class classifier was implemented using a Support Vector Machine (SVM) implementation called LibLINEAR [5]. This software is an extension of LibSVM [6] restricted to a linear kernel to achieve significant speed gains.
- LibLINEAR allows users to specify the importance of each class through a weighting mechanism. In this way, we could specify that the "no relation" class should be given less weight. A frequent class tends to bias SVM decisions toward that class, improving accuracy but possibly hurting the F1 measure.
- Several weight values for the "no relation" class were tested by cross validation on the training set. The value that led to the best score was 0.025, a heavy discounting factor compared to the default of 1.0.
- Similarly, cross validation on the training set achieved the best results when the regularization parameter, C, was set to 0.5 and the termination parameter, epsilon, was set to 0.5.
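To see why a small class weight discounts the frequent class, it may help to write out a generic class-weighted linear SVM objective. This is a textbook-style sketch, not a statement of LibLINEAR's exact multi-class formulation, which may apply the weights differently inside its decomposition into binary problems. The symbols $\omega_{y_i}$ (per-class weight) and $t_i \in \{-1, +1\}$ (binary target) are notation introduced here.

```latex
\min_{\mathbf{w}} \;\; \frac{1}{2}\,\mathbf{w}^{\top}\mathbf{w}
  \;+\; \sum_{i} C\,\omega_{y_i}\,
  \max\!\bigl(0,\; 1 - t_i\,\mathbf{w}^{\top}\mathbf{x}_i\bigr),
\qquad
\omega_{y_i} =
\begin{cases}
0.025 & \text{if } y_i = \text{no relation},\\
1 & \text{otherwise.}
\end{cases}
```

Under this reading, with $C = 0.5$ a misclassified "no relation" pair incurs an effective cost of $0.5 \times 0.025 = 0.0125$ per unit of hinge loss, against $0.5$ for any other class, so errors on true relations dominate the objective.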

29 Example
[Bradycardia] prob is resolved after [beta blockers] treat and [calcium channel blockers] treat were stopped and [Norvasc] treat was started.
The following pairs are formed from this sentence:
- (Bradycardia, beta blockers) [TrCP]
- (Bradycardia, calcium channel blockers) [TrCP]
- (Bradycardia, Norvasc) [TrIP]
- (beta blockers, calcium channel blockers)
- (beta blockers, Norvasc)
- (calcium channel blockers, Norvasc)
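The pair formation for this sentence can be sketched with `itertools.combinations`, applying the limit of nine intervening concepts stated earlier in the talk. The data layout (a list of `(text, type)` tuples in sentence order) and the function name are assumptions made for the example.

```python
from itertools import combinations

MAX_INTERVENING = 9  # pairs with more intervening concepts are not considered


def candidate_pairs(concepts, max_intervening=MAX_INTERVENING):
    """Form ordered concept pairs from one sentence.

    concepts: list of (text, concept_type) tuples in sentence order.
    """
    pairs = []
    for (i, first), (j, second) in combinations(enumerate(concepts), 2):
        if j - i - 1 <= max_intervening:  # concepts strictly between the two
            pairs.append((first[0], second[0]))
    return pairs


sentence_concepts = [
    ("Bradycardia", "problem"),
    ("beta blockers", "treatment"),
    ("calcium channel blockers", "treatment"),
    ("Norvasc", "treatment"),
]

for pair in candidate_pairs(sentence_concepts):
    print(pair)
```

Every pair produced this way, including treatment-treatment pairs that cannot hold a valid relation, would then be handed to the classifier with either a relation label or "no relation".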

30 Classification
- Classification of concept pairs was performed using a single SVM classifier.
- Even pairs that could not form a valid relation were used for training.
[Diagram: (Bradycardia, Norvasc) -> Features -> SVM -> one of TrIP, TrWP, TrCP, TrAP, TrNAP, PIP, TeRP, TeCP, or No Relation (0.025 weight)]

31 Features
Five categories of information used for features:
- Features using words between the concepts
- Single-concept features
- Concept types of nearby concepts
- Wikipedia-based features
- Contextual similarity to training concept pairs

32 Contextual Features
From the words between two concepts:
- String of the word
- Part of speech
- Concept type (if applicable)
- Phrase of all the words
- Does the phrase represent a conjunction?
- Sequence of phrase chunk types (GENIA)
If there are intervening concepts:
- The relations that exist between those concepts

33 Example
Patient developed [intermittent low - grade temperatures] prob with no [obvious etiology] prob ; [Tm] test 1/20 of 38 [Tm] test 1/
Features for (intermittent low-, Tm):
- Words: with, no, obvious, etiology, ;
- Concepts between: problem
- POS: IN, DT, JJ, NN, ;
- POS sequence: IN_DT_problem_:
- Relations between: PIP
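The between-concept features in this example can be reproduced with a short function that walks the tokens separating the two concepts and collapses each intervening concept into its type. The token layout (word, POS tag, optional concept marker) and the use of the Penn Treebank tag ":" for the semicolon are assumptions; the talk does not show the feature extraction code.

```python
def between_features(tokens, start, end):
    """Features from the tokens strictly between two concepts.

    tokens: list of (word, pos, concept) where concept is None or an
            (id, type) tuple shared by all tokens of the same concept.
    start, end: slice bounds covering the tokens between the two concepts.
    """
    span = tokens[start:end]
    sequence = []
    concept_types = []
    last_concept = None
    for _word, pos, concept in span:
        if concept is None:
            sequence.append(pos)
        elif concept != last_concept:
            # First token of an intervening concept: emit its type once.
            sequence.append(concept[1])
            concept_types.append(concept[1])
        last_concept = concept
    return {
        "words": [word for word, _, _ in span],
        "pos": [pos for _, pos, _ in span],
        "concepts_between": concept_types,
        "pos_sequence": "_".join(sequence),
    }


# Tokens between [intermittent low - grade temperatures] and [Tm].
between = [
    ("with", "IN", None),
    ("no", "DT", None),
    ("obvious", "JJ", (1, "problem")),
    ("etiology", "NN", (1, "problem")),
    (";", ":", None),
]
print(between_features(between, 0, len(between)))
```

The collapsed sequence comes out as `IN_DT_problem_:`, the same string shown on the slide.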

34 Single-Concept Features
- String of the concept
- WordNet lemma
- General Inquirer positive/negative polarity
- Token before concept
- 3 tokens after concept
- Associated predicates extracted through a PropBank parse
- Alternative concept type pairs for both arguments

35 Example
She was given [Zofran] treat for [some nausea] prob as well as [metoclopramide] treat p.r.n.
- Lemma1: zofran
- Lemma2: some nausea
- before1: given
- after1: for, some, nausea
- before2: for
- after2: as, well, as
- AssociatedPredicates1: given
- AlternativePair1: NONE_treatment
- AlternativePair2: problem_treatment

36 Wikipedia Features
The idea: map concepts to Wikipedia through an exact page name match.
Features:
- Determine whether the pages link to each other
- Determine the depth of the LCS (lowest common subsumer) of the two pages within the category hierarchy
Top-level categories:
- Medical tests -> Test
- Diseases and disorders -> Problem
- Medical treatments -> Treatment

37 Example
He was started on [heparin] treat and he subsequently had [significant thrombocytopenia] prob with [platelets] test of 70,000.
Wikipedia excerpts:
- Thrombocytopenia (or -paenia, or thrombopenia in short) is the presence of relatively few [platelets] in [blood]...
- ... platelets is called [thrombocytopathy], which could be either a low number of platelets ([thrombocytopenia]), ...
Categories:
- Platelet: Transfusion medicine, Clinical pathology (Test)
- Thrombocytopenia: Clinical pathology
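The category-depth feature can be illustrated on a toy hierarchy. In the sketch below, the parent links above "Clinical pathology" and "Transfusion medicine" and the artificial `ROOT` node are invented for the example, and LCS is read as the deepest category shared by both pages; none of this is taken from Wikipedia's actual category graph.

```python
# Toy child -> parents map. Links above the page categories are made up.
PARENTS = {
    "Platelet": ["Transfusion medicine", "Clinical pathology"],
    "Thrombocytopenia": ["Clinical pathology"],
    "Clinical pathology": ["Medical tests"],
    "Transfusion medicine": ["Medical treatments"],
    "Medical tests": ["ROOT"],
    "Medical treatments": ["ROOT"],
}


def ancestors(node, parents):
    """All categories reachable by following parent links upward."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def depth(node, parents, root="ROOT"):
    """Shortest number of upward steps from node to the root, or None."""
    frontier, seen, steps = {node}, {node}, 0
    while frontier:
        if root in frontier:
            return steps
        next_frontier = set()
        for current in frontier:
            for parent in parents.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    next_frontier.add(parent)
        frontier = next_frontier
        steps += 1
    return None


def lcs_depth(page_a, page_b, parents, root="ROOT"):
    """Depth of the deepest category that subsumes both pages."""
    common = ancestors(page_a, parents) & ancestors(page_b, parents)
    depths = [d for d in (depth(c, parents, root) for c in common) if d is not None]
    return max(depths) if depths else None


print(lcs_depth("Platelet", "Thrombocytopenia", PARENTS))
```

A deeper shared category suggests the two concepts are more closely related, which is presumably why the depth is useful as a relation feature.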

38 Inexact Matching Features
- Based on edit distance (Levenshtein)
- Used as a distance measure for k-nearest neighbors (KNN)
- During training, a KNN classifier is trained on all but one document and used for that document
- During testing, a KNN classifier is used which was trained on all training documents
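A minimal sketch of this idea follows: Levenshtein distance over token sequences, used to pick the labels of the nearest training examples. What exactly is compared (here, the token sequence of a concept pair's context), the value of `k`, and the function names are assumptions; the talk only states that edit distance drives a KNN classifier.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two sequences (strings or token lists)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (x != y),  # substitution or match
            ))
        previous = current
    return previous[-1]


def knn_label(query, training, k=3):
    """Majority label among the k training examples nearest to the query.

    training: list of (token_sequence, label) pairs.
    """
    ranked = sorted(training, key=lambda example: levenshtein(query, example[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


training = [
    (["resolved", "with"], "TrIP"),
    (["improved", "with"], "TrIP"),
    (["caused", "by"], "TrCP"),
]
print(knn_label(["resolved", "with", "a"], training, k=1))
```

To mirror the leave-one-document-out scheme on the slide, the `training` list passed in for a training example would exclude all examples from that example's own document, so the feature never sees its own label.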

39 Results
Columns: #, P, R, F1 (the numeric values did not survive transcription). Rows reported:
- Exact Span
- Span and relation type
- TrIP
- TrWP
- TrCP
- TrAP
- TrNAP
- PIP
- TeRP
- TeCP

40 Conclusions
- NLP techniques worked well on this data
  - Could perform better if trained on medical text
- Large training data set may have reduced the contribution of medical ontologies
  - Future work shall take into account more knowledge mining
- Crowd-sourced resources such as Wikipedia still provide some valuable information

Technical Report. The KNIME Text Processing Feature: Technical Report The KNIME Text Processing Feature: An Introduction Dr. Killian Thiel Dr. Michael Berthold Killian.Thiel@uni-konstanz.de Michael.Berthold@uni-konstanz.de Copyright 2012 by KNIME.com AG
