Engineering & Health Information Science
Engineering NLP Solutions for Structured Information from Clinical Text: Extracting Sentinel Events from Palliative Care Consult Letters
Neil Barrett PhD, PhD, Vincent Thai MD
Extracting Sentinel Events from Consult Letters, Medinfo, Aug. 22, 2013
Outline
1. Quick intro to NLP - and challenges in health care
2. Application problem: Sentinel event extraction
3. Solution method: selected Engineering bits
4. Evaluation method - and results
5. Conclusions and future work
Bridging Natural Language and Structure
Free-form natural language (human-readable) -> NLP -> structured, coded (computable)
Challenges of NLP for Health Info
Clinical narrative is often ungrammatical
Training examples (corpora) are scarce
General NLP solutions often perform poorly
Engineering NLP solutions for particular problems in health care is still more art than science (expensive)
Lack of systematic guidance (blueprint, recipe)
Our overall objective
Research and document a systematic method ("blueprint") for engineering NLP solutions for extracting codified information from clinical narrative.
Case Study: Extracting Sentinel Events from Palliative Consult Letters
SENTINEL EVENT: an unexpected occurrence involving death or serious physical or psychological injury, or the risk thereof
Source: www.jointcommission.org/SentinelEvents
Kinds of Sentinel Events Considered
Dyspnea [yes/no]
Dyspnea at rest [yes/no]
Delirium [yes/no]
Brain metastases (leptomeningeal) [yes/no]
Sepsis [yes/no]
Infection [yes/no]
Infection site [urinary tract/intra-abdominal/skin]
Chest infection, aspiration related [yes/no]
IV antibiotic use [yes/no]
IV antibiotic use response [no/partial/complete]
Oral antibiotic use [yes/no]
Oral antibiotic use response [no/partial/complete]
Serum creatinine [integer, date]
Dysphagia [yes/no]
Previous VTE [yes/no]
VTE [yes/no]
ICU stay [yes/no]
ICU length of stay in days [integer]
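The target of the extraction can be pictured as one structured record per consult letter. The following is a minimal sketch of such a record, not the authors' actual data model: the class and field names are assumptions chosen to mirror the list above, and `None` is used here to mean "not determined from the letter".

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class InfectionSite(Enum):
    URINARY_TRACT = "urinary tract"
    INTRA_ABDOMINAL = "intra-abdominal"
    SKIN = "skin"


class AntibioticResponse(Enum):
    NO = "no"
    PARTIAL = "partial"
    COMPLETE = "complete"


@dataclass
class SentinelEventRecord:
    """Hypothetical per-letter record; None means 'not determined'."""
    dyspnea: Optional[bool] = None
    dyspnea_at_rest: Optional[bool] = None
    delirium: Optional[bool] = None
    brain_metastases_leptomeningeal: Optional[bool] = None
    sepsis: Optional[bool] = None
    infection: Optional[bool] = None
    infection_site: Optional[InfectionSite] = None
    chest_infection_aspiration_related: Optional[bool] = None
    iv_antibiotic_use: Optional[bool] = None
    iv_antibiotic_response: Optional[AntibioticResponse] = None
    oral_antibiotic_use: Optional[bool] = None
    oral_antibiotic_response: Optional[AntibioticResponse] = None
    serum_creatinine: Optional[int] = None
    serum_creatinine_date: Optional[date] = None
    dysphagia: Optional[bool] = None
    previous_vte: Optional[bool] = None
    vte: Optional[bool] = None
    icu_stay: Optional[bool] = None
    icu_length_of_stay_days: Optional[int] = None


# Example: a letter reporting a skin infection and a 3-day ICU stay.
record = SentinelEventRecord(
    infection=True,
    infection_site=InfectionSite.SKIN,
    icu_stay=True,
    icu_length_of_stay_days=3,
)
```

Separating the yes/no fields from the enumerated and numeric ones makes it visible which items suit a binary classifier and which need direct value extraction.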
General NLP System Blueprint
Accuracy of segmentation, tokenization, and POS tagging is important for the overall accuracy of extraction
New approach: feed tagger output back into the tokenization process
Token-lattice representation of a phrase's segmentation
The resulting POS tagger can be trained with general-language corpora, but performs on par with highly trained biomedical taggers.
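The idea of a token lattice can be illustrated with a small sketch. It is not the authors' implementation: the candidate token list, the toy scores, and the function names are all assumptions, and the fixed score table merely stands in for whatever score a tagger would feed back. The lattice holds every candidate segmentation of a phrase, and a Viterbi-style pass picks the highest-scoring path through it.

```python
from typing import Callable, Dict, List, Optional, Tuple

Lattice = Dict[int, List[Tuple[int, str]]]


def build_lattice(text: str, candidates: List[str]) -> Lattice:
    """Map each start offset to the (end offset, token) edges leaving it."""
    edges: Lattice = {}
    for start in range(len(text)):
        for tok in candidates:
            if tok and text.startswith(tok, start):
                edges.setdefault(start, []).append((start + len(tok), tok))
    return edges


def best_segmentation(
    text: str, edges: Lattice, score: Callable[[str], float]
) -> Optional[List[str]]:
    """Viterbi-style search: best-scoring path from offset 0 to len(text)."""
    best: Dict[int, Tuple[float, List[str]]] = {0: (0.0, [])}
    # Edges only point forward, so visiting offsets in order is sufficient.
    for start in range(len(text)):
        if start not in best:
            continue
        base, path = best[start]
        for end, tok in edges.get(start, []):
            total = base + score(tok)
            if end not in best or total > best[end][0]:
                best[end] = (total, path + [tok])
    return best[len(text)][1] if len(text) in best else None


# Toy log-scores (assumed values); higher means more plausible as a token.
TOY_SCORES = {"2": -1.0, "mg/kg": -1.5, "mg": -2.0, "/": -2.0,
              "kg": -2.0, "2mg": -5.0}

phrase = "2mg/kg"
lattice = build_lattice(phrase, list(TOY_SCORES))
segmentation = best_segmentation(phrase, lattice, lambda t: TOY_SCORES[t])
```

With these toy scores the path `2` + `mg/kg` wins over the finer-grained `2` + `mg` + `/` + `kg`, which shows how downstream scoring can decide between competing segmentations instead of a fixed tokenizer rule.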
Automated Clinical Coding using Snomed CT
Dependency parser: Malt [Nivre06] or MST [McDonald05]
Automated Clinical Coding Method
Step 1: Encode the sentinel events of interest into Snomed CT.
Automated Clinical Coding
Textual descriptions of SCT sentinel event concepts (tokens) are matched against the input text (tokens).
Automated Clinical Coding Method
Step 2: Tokenize and normalize the NL description(s) of each encoded concept. Normalization covers inflected forms (e.g., "fractures" is normalized to "fracture"), written numbers (e.g., "two" and "II" become "2"), and abbreviations (e.g., "HIV").
Step 3: Pinpoint semantic atoms to the concepts where they are first introduced in the SCT poly-hierarchy.
Step 4: Perform token-level coding. Map each token in the input stream of clinical narrative to the set of SCT concepts where that token appears in the associated semantic atom set.
Step 5: Combine multiple tokens into valid SCT pre-coordinated and post-coordinated expressions, using the syntactic structure and the POS tags of the input text. This step is done by implementing SCT's rules for constructing valid expressions.
Step 6: Select the most general SCT concept, since multiple concepts may have been mapped to a given linguistic structure.
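Steps 2 and 4 above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published method: the concept identifiers are made-up placeholders rather than real SCT codes, the normalization tables are toy stand-ins for a real lemmatizer, and the index is built from all description tokens instead of the semantic atoms pinned in Step 3.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy normalization tables (assumed); entries mirror the slide's examples.
LEMMAS = {"fractures": "fracture", "infections": "infection"}
NUMBERS = {"two": "2", "ii": "2"}


def normalize(token: str) -> str:
    """Lowercase, then map written numbers and inflected forms."""
    t = token.lower()
    t = NUMBERS.get(t, t)
    return LEMMAS.get(t, t)


def tokenize(text: str) -> List[str]:
    """Split on non-alphanumerics and normalize each token."""
    return [normalize(t) for t in re.findall(r"[A-Za-z0-9]+", text)]


# Placeholder identifiers, NOT real SCT concept codes.
CONCEPT_DESCRIPTIONS = {
    "concept:dyspnea": ["dyspnea"],
    "concept:dyspnea-at-rest": ["dyspnea at rest"],
    "concept:infection": ["infection"],
    "concept:urinary-tract-infection": ["urinary tract infection"],
}


def build_index(descriptions: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Step 2: map each normalized description token to its concepts."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for concept_id, texts in descriptions.items():
        for text in texts:
            for tok in tokenize(text):
                index[tok].add(concept_id)
    return index


def code_tokens(text: str, index: Dict[str, Set[str]]) -> List[Tuple[str, Set[str]]]:
    """Step 4: attach to every input token the set of candidate concepts."""
    return [(tok, set(index.get(tok, set()))) for tok in tokenize(text)]


index = build_index(CONCEPT_DESCRIPTIONS)
coded = code_tokens("Infections of the urinary tract", index)
```

Each input token ends up with a set of candidate concepts; tokens such as "of" get an empty set. Combining these per-token sets into valid expressions (Step 5) and choosing the most general concept (Step 6) would require the syntactic structure and the SCT hierarchy, which this sketch leaves out.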
Classification (yes/no) using a trained SVM classifier, supplemented with direct (pattern-based) extraction, e.g., for dates and values
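Direct, pattern-based extraction of a value and a date could look roughly like the sketch below, using serum creatinine as the example. The regular expressions, the ISO date format, and the function name are assumptions made for illustration; the patterns actually used in the study are not given on the slide, and the SVM classification part is not shown here.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Assumed surface patterns; real consult letters will vary far more.
CREATININE = re.compile(r"creatinine\s*(?:was|of|is|:)?\s*(\d+)", re.IGNORECASE)
ISO_DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def extract_creatinine(sentence: str) -> Optional[Tuple[int, Optional[date]]]:
    """Return (value, date or None) if a creatinine value is mentioned."""
    value_match = CREATININE.search(sentence)
    if value_match is None:
        return None
    value = int(value_match.group(1))
    date_match = ISO_DATE.search(sentence)
    when = None
    if date_match is not None:
        # strptime raises ValueError for impossible dates such as 2013-13-45.
        when = datetime.strptime(date_match.group(1), "%Y-%m-%d").date()
    return value, when


found = extract_creatinine("Serum creatinine was 142 on 2013-05-02.")
missing = extract_creatinine("No creatinine available.")
```

A pattern like this is brittle by design: it only fires on phrasings it anticipates, which is why such rules supplement a trained classifier and are reserved for items like dates and numeric values.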
Evaluation Method
Results
Discussion: information gap
Variation in software performance correlates with gap size
Conclusion
Blueprint validated (albeit with limited data)
It should be possible to construct NLP information extractors for similar problems rapidly and cheaply by following the blueprint / method
Formal study of engineering effort (cost / benefit) planned as future work
Acknowledgements
Thanks to Francis Lau and Dennis Lee for their help with the Snomed CT encoding
Funding from the Natural Sciences and Engineering Research Council of Canada
Thanks to the University of Alberta Hospital for supporting this research
References
S. Buchholz and E. Marsi. CoNLL-X shared task on multilingual dependency parsing. Proc. 10th Conf. on Computational Natural Language Learning, pp. 149-164. ACL, 2006.
J. Nivre, J. Hall, J. Nilsson, G. Eryigit, and S. Marinov. Labeled pseudo-projective dependency parsing with support vector machines. Proc. 10th Conf. on Computational Natural Language Learning, pp. 221-225. ACL, 2006.
N. Barrett, J. H. Weber-Jahnke, and V. Thai. Automated Clinical Coding using Semantic Atoms and Topology. 25th IEEE CBMS, June 20-22, Rome, Italy, 2012.
N. Barrett and J. H. Weber-Jahnke. Building a Biomedical Tokenizer Using the Token Lattice Design Pattern and the Adapted Viterbi Algorithm. BMC Bioinformatics 2011, 12(Suppl 3):S1. doi:10.1186/1471-2105-12-S3-S1
R. McDonald, F. Pereira, K. Ribarov, and J. Hajič. Non-projective dependency parsing using spanning tree algorithms. Proc. of Conf. on Human Language Technology and Empirical Methods in NLP, pp. 523-530. ACL, 2005.