Workshop. Neil Barrett PhD, Jens Weber PhD, Vincent Thai MD. Engineering & Health Information Science


Engineering & Health Information Science
Engineering NLP Solutions for Structured Information from Clinical Text: Extracting Sentinel Events from Palliative Care Consult Letters
Neil Barrett PhD, Jens Weber PhD, Vincent Thai MD
Canada-China Clean Energy Initiative & Annual Workshop

Extracting Sentinel Events from Consult Letters, Medinfo, Aug. 22, 2013

Outline
1. Quick intro to NLP and its challenges in health care
2. Application problem: sentinel event extraction
3. Solution method: selected engineering bits
4. Evaluation method and results
5. Conclusions and future work

Bridging Natural Language and Structure
NLP bridges free-form, human-readable natural language and structured, coded, computable data.

Challenges of NLP for Health Info
- Clinical narrative is often ungrammatical
- Training examples (corpora) are scarce
- General NLP solutions often perform poorly
- Engineering NLP solutions for particular problems in health care is still more art than science (expensive)
- Lack of systematic guidance (blueprint, recipe)

Our overall objective
Research and document a systematic method ("blueprint") for engineering NLP solutions for extracting codified information from clinical narrative.

Case Study: Extracting Sentinel Events from Palliative Consult Letters
SENTINEL EVENT: an unexpected occurrence involving death or serious physical or psychological injury, or the risk thereof (source: www.jointcommission.org/SentinelEvents)

Kinds of Sentinel Events Considered
- Dyspnea [yes/no]
- Dyspnea at rest [yes/no]
- Delirium [yes/no]
- Brain metastases (leptomeningeal) [yes/no]
- Sepsis [yes/no]
- Infection [yes/no]
- Infection site [urinary tract/intra-abdominal/skin]
- Chest infection, aspiration related [yes/no]
- IV antibiotic use [yes/no]
- IV antibiotic use response [no/partial/complete]
- Oral antibiotic use [yes/no]
- Oral antibiotic use response [no/partial/complete]
- Serum creatinine [integer, date]
- Dysphagia [yes/no]
- Previous VTE [yes/no]
- VTE [yes/no]
- ICU stay [yes/no]
- ICU length of stay in days [integer]
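The extraction target listed above could be held in a typed record along the following lines. This is only a sketch: the class and field names (SentinelEventRecord, InfectionSite, AntibioticResponse and so on) are assumptions made for illustration, and only the fields and their value ranges come from the slide.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class InfectionSite(Enum):
    URINARY_TRACT = "urinary tract"
    INTRA_ABDOMINAL = "intra-abdominal"
    SKIN = "skin"


class AntibioticResponse(Enum):
    NO = "no"
    PARTIAL = "partial"
    COMPLETE = "complete"


@dataclass
class SentinelEventRecord:
    """One record per consult letter; None means 'not extracted'."""
    dyspnea: Optional[bool] = None
    dyspnea_at_rest: Optional[bool] = None
    delirium: Optional[bool] = None
    brain_metastases_leptomeningeal: Optional[bool] = None
    sepsis: Optional[bool] = None
    infection: Optional[bool] = None
    infection_site: Optional[InfectionSite] = None
    chest_infection_aspiration_related: Optional[bool] = None
    iv_antibiotic_use: Optional[bool] = None
    iv_antibiotic_response: Optional[AntibioticResponse] = None
    oral_antibiotic_use: Optional[bool] = None
    oral_antibiotic_response: Optional[AntibioticResponse] = None
    serum_creatinine_value: Optional[int] = None
    serum_creatinine_date: Optional[date] = None
    dysphagia: Optional[bool] = None
    previous_vte: Optional[bool] = None
    vte: Optional[bool] = None
    icu_stay: Optional[bool] = None
    icu_length_of_stay_days: Optional[int] = None


# Hypothetical example values, not taken from any real letter.
record = SentinelEventRecord(
    infection=True,
    infection_site=InfectionSite.URINARY_TRACT,
    iv_antibiotic_use=True,
    iv_antibiotic_response=AntibioticResponse.PARTIAL,
)
```

Keeping every field optional separates "the letter says no" (False) from "nothing was extracted" (None), which matters when extraction results are later compared against a manual review.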

General NLP System Blueprint

Accuracy of segmentation, tokenization, and POS tagging is important for the overall accuracy of extraction.

New approach: feed tagger output back into the tokenization process
- Token-lattice representation of a phrase's segmentation
- The resulting POS tagger can be trained with general-language corpora, yet performs on par with highly trained biomedical taggers
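To make the token-lattice idea concrete, here is a minimal sketch, not the authors' implementation. Character positions are lattice nodes, every candidate token is an edge, and a Viterbi-style dynamic program picks the highest-scoring path. The score table TOKEN_SCORES and the penalty UNKNOWN_PER_CHAR are invented stand-ins for the feedback a POS tagger would provide.

```python
import math

# Hypothetical log-scores standing in for tagger feedback on candidate tokens.
TOKEN_SCORES = {
    "2": -1.0,
    "mg": -1.0,
    "/": -1.5,
    "ml": -1.0,
    "mg/ml": -2.0,
    "2mg": -6.0,
}
UNKNOWN_PER_CHAR = -10.0  # assumed penalty for tokens without a score


def build_lattice(text, max_len=6):
    """Return edges (start, end, token) for every candidate token span."""
    edges = []
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            edges.append((start, end, text[start:end]))
    return edges


def score(token):
    return TOKEN_SCORES.get(token, UNKNOWN_PER_CHAR * len(token))


def best_segmentation(text):
    """Pick the best-scoring path through the lattice (Viterbi-style)."""
    n = len(text)
    best = [-math.inf] * (n + 1)   # best score of any path ending at each node
    back = [None] * (n + 1)        # (previous node, token) on that best path
    best[0] = 0.0
    # Edges are generated by increasing start, so best[start] is final
    # by the time an edge leaving that node is processed.
    for start, end, token in build_lattice(text):
        candidate = best[start] + score(token)
        if candidate > best[end]:
            best[end] = candidate
            back[end] = (start, token)
    tokens = []
    pos = n
    while pos > 0:
        pos, token = back[pos]
        tokens.append(token)
    return tokens[::-1]


print(best_segmentation("2mg/ml"))  # ['2', 'mg/ml']
```

With these toy scores the path "2" + "mg/ml" (total -3.0) beats "2" + "mg" + "/" + "ml" (total -4.5). In a real system the scores would come from the tagger rather than a fixed table.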

Automated Clinical Coding using SNOMED CT
Dependency parsing: Malt [Nivre06] or MST [McDonald05]

Automated Clinical Coding Method
Step 1: Encode the sentinel events of interest into SNOMED CT (SCT).

Automated Clinical Coding
Tokens from the textual descriptions of the SCT sentinel event concepts are matched against tokens from the input text.

Automated Clinical Coding Method
Step 2: Tokenize and normalize the NL description(s) of each encoded concept. Normalization covers inflected forms (e.g., "fractures" is normalized to "fracture"), written numbers (e.g., "two" and "II" become "2"), and abbreviations (e.g., "HIV").
Step 3: Pinpoint semantic atoms to the concepts where they are first introduced in the SCT poly-hierarchy.
Step 4: Perform token-level coding. Map each token in the input stream of clinical narrative to the set of SCT concepts where that token appears in the associated semantic atoms set.
Step 5: Combine multiple tokens into valid SCT pre-coordinated and post-coordinated expressions, using the syntactic structure and the POS tags of the input text. This step is done by implementing SCT's rules on constructing valid expressions.
Step 6: Select the most general SCT concept, since multiple concepts may have been mapped to a given linguistic structure.
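Steps 2 and 4 above can be illustrated with a small sketch. It is not the authors' code. The lookup tables are toy data, the concept identifiers C1 to C3 are placeholders rather than real SNOMED CT codes, and abbreviation handling is left out because the slides do not say how it is done.

```python
import re

# Toy normalization tables (assumptions for illustration only).
NUMBER_FORMS = {"one": "1", "two": "2", "three": "3", "ii": "2", "iii": "3"}
BASE_FORMS = {"fractures": "fracture", "infections": "infection"}

# Placeholder concept IDs mapped to their semantic atoms (normalized tokens).
SEMANTIC_ATOMS = {
    "C1": {"fracture"},
    "C2": {"fracture", "femur"},
    "C3": {"infection"},
}


def normalize(text):
    """Step 2: tokenize, lowercase, and map tokens to a normal form."""
    tokens = []
    for raw in re.findall(r"[A-Za-z0-9]+", text):
        token = raw.lower()
        token = NUMBER_FORMS.get(token, token)
        token = BASE_FORMS.get(token, token)
        tokens.append(token)
    return tokens


def build_index(atoms):
    """Invert concept -> atoms into token -> set of concept IDs."""
    index = {}
    for concept_id, concept_atoms in atoms.items():
        for atom in concept_atoms:
            index.setdefault(atom, set()).add(concept_id)
    return index


def code_tokens(tokens, index):
    """Step 4: map each input token to the concepts whose atoms contain it."""
    return [(token, index.get(token, set())) for token in tokens]


index = build_index(SEMANTIC_ATOMS)
tokens = normalize("Two fractures of the femur")
coded = code_tokens(tokens, index)
for token, concepts in coded:
    print(token, sorted(concepts))
```

Tokens such as "of" and "the" map to the empty set. Combining the remaining candidates into valid expressions (Step 5) and choosing the most general concept (Step 6) would need the SCT hierarchy and the parse of the sentence, which this sketch does not model.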

Classification (yes/no) using a trained SVM classifier, supplemented with direct (pattern-based) extraction, e.g., for dates & values
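As an illustration of the pattern-based part only, the sketch below pulls a serum creatinine value and a date out of a sentence with regular expressions. The patterns, the function name extract_creatinine, and the sample sentence are all assumptions. The slides do not show the actual patterns, and real consult letters will phrase values and dates in many more ways.

```python
import re
from datetime import date

# Assumed phrasing: "creatinine was 140", "creatinine of 140", "creatinine 140".
VALUE = re.compile(r"creatinine\s+(?:(?:was|of|is)\s+)?(\d+)", re.IGNORECASE)
# Assumed date format: ISO style, e.g. 2013-03-05.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_creatinine(text):
    """Return (value, date or None), or None if no value is found."""
    value_match = VALUE.search(text)
    if value_match is None:
        return None
    value = int(value_match.group(1))
    # Look for a date only after the value, to avoid picking up earlier dates.
    date_match = ISO_DATE.search(text, value_match.end())
    when = None
    if date_match is not None:
        year, month, day = (int(part) for part in date_match.groups())
        when = date(year, month, day)
    return value, when


# Invented sample sentence, not taken from a real letter.
print(extract_creatinine("Serum creatinine was 140 on 2013-03-05."))
```

Direct extraction like this suits fields with a narrow surface form (integers, dates), while yes/no fields such as delirium are left to the trained classifier.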

Evaluation Method

Results

Discussion: information gap
Variation in software performance correlates with gap size.

Conclusion
- Blueprint validated (albeit with limited data)
- It should be possible to construct NLP information extractors for similar problems rapidly and cheaply by following the blueprint / method
- Formal study of engineering effort (cost / benefit) planned as future work

Acknowledgements
- Thanks to Francis Lau and Dennis Lee for their help with the SNOMED CT encoding
- Funding from the Natural Sciences and Engineering Research Council of Canada
- Thanks to the University of Alberta Hospital for supporting this research

References
S. Buchholz and E. Marsi. CoNLL-X shared task on multilingual dependency parsing. 10th Conf. on Computational Natural Language Learning, p. 149-164. ACL, 2006.
J. Nivre, J. Hall, J. Nilsson, G. Eryigit, and S. Marinov. Labeled pseudo-projective dependency parsing with support vector machines. Proc. 10th Conf. on Computational Natural Language Learning, p. 221-225. ACL, 2006.
N. Barrett, J. H. Weber-Jahnke, and V. Thai. Automated Clinical Coding using Semantic Atoms and Topology. 25th IEEE CBMS, June 20-22, Rome, Italy, 2012.
N. Barrett and J. H. Weber-Jahnke. Building a Biomedical Tokenizer Using the Token Lattice Design Pattern and the Adapted Viterbi Algorithm. BMC Bioinformatics 2011, 12(Suppl 3):S1. doi:10.1186/1471-2105-12-S3-S1
R. McDonald, F. Pereira, K. Ribarov, and J. Hajič. Non-projective dependency parsing using spanning tree algorithms. Proc. of Conf. on Human Language Technology and Empirical Methods in NLP, p. 523-530. ACL, 2005.