Workshop. Neil Barrett PhD, Jens Weber PhD, Vincent Thai MD. Engineering & Health Information Science

1 Engineering & Health Information Science. Engineering NLP Solutions for Structured Information from Clinical Text: Extracting Sentinel Events from Palliative Care Consult Letters. Neil Barrett PhD, Jens Weber PhD, Vincent Thai MD. Canada-China Clean Energy Initiative & Annual Workshop

2 Extracting Sentinel Events from Consult Letters, Medinfo, Aug. 22
Outline
1. Quick intro to NLP, and its challenges in health care
2. Application problem: sentinel event extraction
3. Solution method: selected engineering bits
4. Evaluation method and results
5. Conclusions and future work

3 Bridging Natural Language and Structure: NLP maps free-form, human-readable natural language to structured, coded, computable data.

4 Challenges of NLP for Health Info
Clinical narrative is often ungrammatical
Training examples (corpora) are scarce
General NLP solutions often perform poorly
Engineering NLP solutions for particular problems in health care is still more art than science (expensive)
Systematic guidance (a blueprint or recipe) is lacking

5 Our overall objective: research and document a systematic method (a "blueprint") for engineering NLP solutions that extract codified information from clinical narrative.

6 Case Study: Extracting Sentinel Events from Palliative Consult Letters. Sentinel event: an unexpected occurrence involving death or serious physical or psychological injury, or the risk thereof (www.jointcommission.org/SentinelEvents).

14 Kinds of Sentinel Events Considered
Dyspnea [yes/no]
Dyspnea at rest [yes/no]
Delirium [yes/no]
Brain metastases (leptomeningeal) [yes/no]
Sepsis [yes/no]
Infection [yes/no]
Infection site [urinary tract/intra-abdominal/skin]
Chest infection, aspiration related [yes/no]
IV antibiotic use [yes/no]
IV antibiotic use response [no/partial/complete]
Oral antibiotic use [yes/no]
Oral antibiotic use response [no/partial/complete]
Serum creatinine [integer, date]
Dysphagia [yes/no]
Previous VTE [yes/no]
VTE [yes/no]
ICU stay [yes/no]
ICU length of stay in days [integer]
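The event list above is effectively the target schema of the extractor. A minimal sketch of that schema as a Python record follows; the class and field names are illustrative assumptions, not names from the talk, and None is assumed to mean "not extracted from the letter".

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class InfectionSite(Enum):
    URINARY_TRACT = "urinary tract"
    INTRA_ABDOMINAL = "intra-abdominal"
    SKIN = "skin"


class AntibioticResponse(Enum):
    NONE = "no"
    PARTIAL = "partial"
    COMPLETE = "complete"


@dataclass
class SentinelEvents:
    """One record per consult letter (hypothetical layout); None = not extracted."""
    dyspnea: Optional[bool] = None
    dyspnea_at_rest: Optional[bool] = None
    delirium: Optional[bool] = None
    brain_metastases_leptomeningeal: Optional[bool] = None
    sepsis: Optional[bool] = None
    infection: Optional[bool] = None
    infection_site: Optional[InfectionSite] = None
    chest_infection_aspiration_related: Optional[bool] = None
    iv_antibiotic_use: Optional[bool] = None
    iv_antibiotic_response: Optional[AntibioticResponse] = None
    oral_antibiotic_use: Optional[bool] = None
    oral_antibiotic_response: Optional[AntibioticResponse] = None
    serum_creatinine: Optional[int] = None
    serum_creatinine_date: Optional[date] = None
    dysphagia: Optional[bool] = None
    previous_vte: Optional[bool] = None
    vte: Optional[bool] = None
    icu_stay: Optional[bool] = None
    icu_length_of_stay_days: Optional[int] = None


# Example: a letter that mentions a skin infection and rules out an ICU stay.
record = SentinelEvents(infection=True, infection_site=InfectionSite.SKIN, icu_stay=False)
```

Keeping "not extracted" (None) distinct from an explicit "no" (False) makes it possible to tell a silent letter apart from one that rules an event out.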

15 General NLP System Blueprint

16 The accuracy of segmentation, tokenization, and POS tagging is important for the overall accuracy of extraction.

17 New approach: feed tagger output back into the tokenization process. A token lattice represents the alternative segmentations of a phrase. The resulting POS tagger can be trained with general-language corpora but performs on par with highly trained biomedical taggers.
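To make the token-lattice idea concrete, here is a minimal sketch under stated assumptions: the lattice is a list of (start, end, token) edges over character offsets, and a toy per-token score stands in for the tagger feedback. The function names, the candidate list, and the scores are hypothetical; the dynamic program below is a simple best-path search, not the adapted Viterbi algorithm of [Barrett and Weber-Jahnke 2011].

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[int, int, str]  # (start offset, end offset, token text)


def build_lattice(text: str, candidates: List[str]) -> List[Edge]:
    """Collect every occurrence of every non-empty candidate token as an edge."""
    edges = []
    for start in range(len(text)):
        for tok in candidates:
            if tok and text.startswith(tok, start):
                edges.append((start, start + len(tok), tok))
    return edges


def best_path(text: str, edges: List[Edge], score: Callable[[str], float]) -> List[str]:
    """Return the highest-scoring edge sequence covering the whole text."""
    n = len(text)
    # best[offset] = (total score, tokens) of the best partial segmentation ending there
    best: Dict[int, Tuple[float, List[str]]] = {0: (0.0, [])}
    for pos in range(n):  # edges only go forward, so increasing order is safe
        if pos not in best:
            continue
        base_score, base_toks = best[pos]
        for start, end, tok in edges:
            if start != pos:
                continue
            cand = (base_score + score(tok), base_toks + [tok])
            if end not in best or cand[0] > best[end][0]:
                best[end] = cand
    if n not in best:
        raise ValueError("no segmentation covers the whole text")
    return best[n][1]


# Toy scores standing in for tagger feedback (made up for illustration).
TOY_SCORES = {"2": 1.0, "mg/kg": 2.0, "mg": 0.5, "/": 0.1, "kg": 0.5, "2mg": 0.2}
lattice = build_lattice("2mg/kg", list(TOY_SCORES))
segmentation = best_path("2mg/kg", lattice, lambda tok: TOY_SCORES[tok])
```

With these toy scores the path "2" + "mg/kg" outscores both "2" + "mg" + "/" + "kg" and "2mg" + "/" + "kg", so the unit is kept as one token.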

18 Automated Clinical Coding using SNOMED CT. Dependency parser: Malt [Nivre06] or MST [McDonald05].

19 Automated Clinical Coding Method. Step 1: Encode the sentinel events of interest in SNOMED CT (SCT).

20 Automated Clinical Coding: the tokens of the textual descriptions of the SCT sentinel event concepts are matched against the tokens of the input text.

21 Automated Clinical Coding Method
Step 2: Tokenize and normalize the NL description(s) of each encoded concept. Normalization covers word forms (e.g., "fractures" is normalized to "fracture"), written numbers (e.g., "two" and "II" become "2"), and abbreviations (e.g., "HIV").
Step 3: Pin semantic atoms to the concepts where they are first introduced in the SCT poly-hierarchy.
Step 4: Perform token-level coding. Map each token in the input stream of clinical narrative to the set of SCT concepts whose associated semantic atom sets contain that token.
Step 5: Combine multiple tokens into valid SCT pre-coordinated and post-coordinated expressions, using the syntactic structure and the POS tags of the input text. This step implements SCT's rules for constructing valid expressions.
Step 6: Select the most general SCT concept, since multiple concepts may have been mapped to a given linguistic structure.
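Steps 2 and 4 above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the concept IDs and descriptions are made up (they are not real SNOMED CT codes), the plural stripping is deliberately naive, and the sketch indexes every description token rather than pinning semantic atoms to their point of introduction in the hierarchy (Step 3).

```python
import re
from typing import Dict, List, Set, Tuple

# Tiny, hypothetical table for written and Roman numerals.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "ii": "2", "iii": "3"}


def normalize(token: str) -> str:
    """Lowercase, map written numbers to digits, strip a simple plural 's'."""
    t = token.lower()
    if t in NUMBER_WORDS:
        return NUMBER_WORDS[t]
    # Naive plural stripping; guards keep words like "sepsis" and "virus" intact.
    if len(t) > 3 and t.endswith("s") and not t.endswith(("ss", "is", "us")):
        return t[:-1]
    return t


def tokenize(text: str) -> List[str]:
    return [normalize(t) for t in re.findall(r"[A-Za-z0-9]+", text)]


# Hypothetical concept IDs and descriptions; NOT real SNOMED CT content.
CONCEPTS = {
    "C1": "Fracture of bone",
    "C2": "Fractures of femur",
    "C3": "Infection of skin",
}


def build_index(concepts: Dict[str, str]) -> Dict[str, Set[str]]:
    """Invert the descriptions: normalized token -> IDs of concepts using it."""
    index: Dict[str, Set[str]] = {}
    for cid, description in concepts.items():
        for tok in tokenize(description):
            index.setdefault(tok, set()).add(cid)
    return index


def code_tokens(text: str, index: Dict[str, Set[str]]) -> List[Tuple[str, Set[str]]]:
    """Token-level coding: pair each input token with its candidate concepts."""
    return [(tok, index.get(tok, set())) for tok in tokenize(text)]


index = build_index(CONCEPTS)
coded = code_tokens("Two fractures of the femur", index)
```

Here "fractures" normalizes to "fracture" and maps to both C1 and C2, while "femur" maps only to C2; combining tokens (Step 5) and choosing among the remaining candidates (Step 6) would operate on these per-token sets.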

22 Classification (yes/no) uses a trained SVM classifier, supplemented with direct (pattern-based) extraction, e.g., for dates and values.
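As an illustration of the direct, pattern-based part only, the sketch below pulls a serum creatinine value and an optional date out of a sentence. The regular expression, the accepted phrasings, and the ISO date format are assumptions made for this example; the patterns actually used in the study are not given in the slides.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Hypothetical pattern: "creatinine [was|is|of|at] <integer> [on YYYY-MM-DD]".
CREATININE = re.compile(
    r"creatinine\s+(?:was|is|of|at)?\s*(\d+)"   # the integer value
    r"(?:\s+on\s+(\d{4})-(\d{2})-(\d{2}))?",     # optional ISO-style date
    re.IGNORECASE,
)


def extract_creatinine(text: str) -> Optional[Tuple[int, Optional[date]]]:
    """Return (value, date or None) for the first match, or None if no match."""
    m = CREATININE.search(text)
    if m is None:
        return None
    value = int(m.group(1))
    when = None
    if m.group(2):
        # date() raises ValueError for impossible dates such as month 13.
        when = date(int(m.group(2)), int(m.group(3)), int(m.group(4)))
    return value, when


with_date = extract_creatinine("Serum creatinine was 142 on 2012-03-05.")
without_date = extract_creatinine("Creatinine 98, otherwise unremarkable.")
no_match = extract_creatinine("No labs were drawn today.")
```

A pattern like this fits fields with a fixed surface form (integers, dates); the yes/no fields vary too much in wording for patterns alone, which is where the trained classifier comes in.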

23 Evaluation Method

24 Results

25 Discussion: information gap. Variation in software performance correlates with gap size.

26 Conclusion
Blueprint validated (albeit with limited data)
It should be possible to construct NLP information extractors for similar problems rapidly and cheaply by following the blueprint / method
A formal study of engineering effort (cost / benefit) is planned as future work

27 Acknowledgements
Thanks to Francis Lau and Dennis Lee for their help with the SNOMED CT encoding
Funding from the Natural Sciences and Engineering Research Council of Canada
Thanks to the University of Alberta Hospital for supporting this research

28 References
S. Buchholz and E. Marsi. CoNLL-X shared task on multilingual dependency parsing. Proc. 10th Conf. on Computational Natural Language Learning. ACL, 2006.
[Nivre06] J. Nivre, J. Hall, J. Nilsson, G. Eryigit, and S. Marinov. Labeled pseudo-projective dependency parsing with support vector machines. Proc. 10th Conf. on Computational Natural Language Learning. ACL.
N. Barrett, J. H. Weber-Jahnke, and V. Thai. Automated Clinical Coding using Semantic Atoms and Topology. 25th IEEE CBMS, June 20-22, Rome, Italy, 2012.
N. Barrett and J. H. Weber-Jahnke. Building a Biomedical Tokenizer Using the Token Lattice Design Pattern and the Adapted Viterbi Algorithm. BMC Bioinformatics 2011, 12(Suppl 3):S1. doi: / S3-S1
[McDonald05] R. McDonald, F. Pereira, K. Ribarov, and J. Hajič. Non-projective dependency parsing using spanning tree algorithms. Proc. of Conf. on Human Language Technology and Empirical Methods in NLP. ACL, 2005.
