Text Mining in Electronic Medical Records by Mark Greenwood
Medical Records
  care-plans
  diagnoses
  medical history
  prescriptions
  symptoms
  chronic/acute
Uses?
  Primary
    Record information
    Communication between health teams
  Secondary (collectively)
    Medical research
    Guide public health policy
    Guide optimization of health resources
Electronic Medical Records (EMR)
  database storage
  improves accessibility, legibility, quality
  enables automatic analysis
Unstructured vs. Structured
  Unstructured free-text:
    82F w with known h/o ischemic CMP( EF 25-30% ) CAD s/p stent; diastolic dysfunction who presents with worsening SOB X 6-8 weeks. She reports increased DOE, PND, orthopnea, denies LE swelling. She had increased SOB on the evening prior to admission...
  Structured & coded:
    Age: 82
    Gender: F
    History:
      G343.00: Ischemic cardiomyopathy
      G340.12: Coronary artery disease
      7929400: Insertion of coronary artery stent
      14A6.00: H/O: Heart failure
    Chief complaint:
      1739.00: Shortness of breath
    Symptoms:
      173C.11: Dyspnoea on exertion
      1736.00: Paroxysmal nocturnal dyspnoea
      1735.11: Orthopnoea symptom
      1831.00: No oedema present
    Diagnosis...
Controlled vocabularies (CVs)
  controlled vocabulary
    taxonomical
    agreed/standard terms
    definitions
  Read codes:
    7-character code
    represents concept
    encodes position
  1....00 - History / symptoms
    17...00 - Respiratory symptoms
      173..00 - Breathlessness
        1731.00 - No breathlessness
        ...
        1739.00 - Shortness of breath
    18...00 - Cardiovascular symptoms
  2....00 - Examination / Signs
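The way a Read code "encodes position" can be illustrated with a small sketch. It assumes the 7-character layout suggested by the examples on this slide: a 5-character concept part padded with dots, followed by a 2-character suffix. The function name and the choice of "00" as the suffix for every ancestor are assumptions made for illustration, not part of any official Read code tooling.

```python
from typing import List


def read_code_ancestors(code: str) -> List[str]:
    """Return the ancestors of a Read code, nearest parent first.

    Assumption: the first 5 characters are the concept part (padded
    with dots) and the last 2 are a suffix, as in "1739.00". Each
    non-dot character of the concept part is one hierarchy level.
    """
    concept = code[:5]
    stem = concept.rstrip(".")
    ancestors = []
    # Drop one trailing character at a time and re-pad with dots.
    for depth in range(len(stem) - 1, 0, -1):
        # Assumption: ancestors are written with the "00" suffix.
        ancestors.append(stem[:depth].ljust(5, ".") + "00")
    return ancestors


if __name__ == "__main__":
    # Shortness of breath -> Breathlessness -> Respiratory symptoms -> History / symptoms
    print(read_code_ancestors("1739.00"))
```

Because the hierarchy is carried by the code itself, selecting every code under "Respiratory symptoms" reduces to a prefix test on "17".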
Ambiguity problems
  Unstructured free-text:
    82F w with known h/o ischemic CMP( EF 25-30% ) CAD s/p stent; diastolic dysfunction who presents with worsening SOB X 6-8 weeks. She reports increased DOE, PND, orthopnea, denies LE swelling. She had increased SOB on the evening prior to admission...
  Ambiguity:
    CMP = competitive medical plan
    CMP = cartilage matrix protein
    CMP = comprehensive medical plan
    CMP = comprehensive metabolic panel
Variability problems
  Unstructured free-text:
    82F w with known h/o ischemic CMP( EF 25-30% ) CAD s/p stent; diastolic dysfunction who presents with worsening SOB X 6-8 weeks. She reports increased DOE, PND, orthopnea, denies LE swelling. She had increased SOB on the evening prior to admission...
  Variability:
    SOB = shortness of breath
    SOB = breathlessness
    SOB = dyspnea
    SOB = dyspnoea
    SOB = breathing difficulty
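One simple way to handle this kind of variability is a lookup from surface forms to a single concept. The sketch below uses only the variants listed on the slide; the concept label `shortness_of_breath` and the function name are hypothetical, and a plain dictionary lookup like this does nothing to resolve the ambiguity problem (an abbreviation with several possible meanings).

```python
import re
from typing import List, Tuple

# Surface forms from the slide; the concept label is a hypothetical key.
VARIANTS = {
    "sob": "shortness_of_breath",
    "shortness of breath": "shortness_of_breath",
    "breathlessness": "shortness_of_breath",
    "dyspnea": "shortness_of_breath",
    "dyspnoea": "shortness_of_breath",
    "breathing difficulty": "shortness_of_breath",
}

# Longest variants first so multi-word forms win over shorter ones.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in sorted(VARIANTS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def normalise_mentions(text: str) -> List[Tuple[str, str]]:
    """Return (surface form, concept) pairs in order of appearance."""
    return [(m.group(0), VARIANTS[m.group(0).lower()]) for m in _PATTERN.finditer(text)]


if __name__ == "__main__":
    print(normalise_mentions("presents with worsening SOB X 6-8 weeks"))
```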
Expressiveness vs. structure
  Unstructured free-text:
    82F w with known h/o ischemic CMP( EF 25-30% ) CAD s/p stent; diastolic dysfunction who presents with worsening SOB X 6-8 weeks. She reports increased DOE, PND, orthopnea, denies LE swelling. She had increased SOB on the evening prior to admission...
  Loss of information?
    Cannot encode measurement (Ejection Fraction)
  Loss of information?
    Duration not covered in Read codes
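The two items lost in coding here, the ejection fraction and the symptom duration, are the kind of detail that pattern-based extraction can recover from the free text. The following is a minimal sketch using regular expressions tuned to the phrasing in the example note only ("EF 25-30%", "X 6-8 weeks"); the pattern and function names are hypothetical and real notes would need far more variants.

```python
import re

# "EF 25-30%" or "EF 30%": low value, optional high value.
EF_PATTERN = re.compile(r"\bEF\s*(\d+)(?:\s*-\s*(\d+))?\s*%")

# "X 6-8 weeks" or "x 3 days": low value, optional high value, unit.
DURATION_PATTERN = re.compile(
    r"\b[Xx]\s*(\d+)(?:\s*-\s*(\d+))?\s*(day|week|month|year)s?\b"
)


def extract_ranges(pattern, text):
    """Return one tuple per match: (low, high, *extra groups)."""
    results = []
    for m in pattern.finditer(text):
        low = int(m.group(1))
        # A single value is treated as a range with equal bounds.
        high = int(m.group(2)) if m.group(2) else low
        results.append((low, high) + tuple(m.groups()[2:]))
    return results


if __name__ == "__main__":
    note = ("82F w with known h/o ischemic CMP( EF 25-30% ) CAD s/p stent; "
            "diastolic dysfunction who presents with worsening SOB X 6-8 weeks.")
    print(extract_ranges(EF_PATTERN, note))
    print(extract_ranges(DURATION_PATTERN, note))
```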
Are both text & structured information required for full picture?
UK GP records
  the middle ground
  structured & coded + free-text
  + for data analysis
  + for recording & communication

  Date      Code                                          Text
  11/11/90  G343.00: Ischemic cardiomyopathy
  10/10/92  G340.12: Coronary artery disease
  10/10/92  7929400: Insertion of coronary artery stent   EF 25-30%
  09/09/94  1739.00: Shortness of breath                  She reports increased DOE, PND, orthopnea, denies LE swelling. She had increased SOB on the evening prior to admission...
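The date / code / text layout of a GP record maps naturally onto a small record type with an optional free-text field. The sketch below is one possible representation, not the format of any actual GP system; the class and field names are hypothetical, and the two-digit years in the example are assumed to mean 19xx.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class GPEntry:
    """One row of a GP record: a coded event plus optional free text."""
    when: date
    read_code: str
    term: str
    text: Optional[str] = None  # many entries carry a code only


# The example rows, with years assumed to be 19xx.
RECORD = [
    GPEntry(date(1990, 11, 11), "G343.00", "Ischemic cardiomyopathy"),
    GPEntry(date(1992, 10, 10), "G340.12", "Coronary artery disease"),
    GPEntry(date(1992, 10, 10), "7929400", "Insertion of coronary artery stent",
            "EF 25-30%"),
    GPEntry(date(1994, 9, 9), "1739.00", "Shortness of breath",
            "She reports increased DOE, PND, orthopnea, denies LE swelling."),
]


def entries_with_text(entries: List[GPEntry]) -> List[GPEntry]:
    """Keep only the entries that carry free text for text mining."""
    return [e for e in entries if e.text]


if __name__ == "__main__":
    for entry in entries_with_text(RECORD):
        print(entry.when.isoformat(), entry.read_code, "->", entry.text)
```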
UK GP records
  the middle ground
  structured & coded + free-text
  + for data analysis?
Can text mining of EMRs aid medical research?
Text Mining
  automatic knowledge discovery from unstructured text
  high throughput
  additional information
  impact on medical research?
Progress
  systematic literature review
  methodology
  data acquisition
    GP records
  medical question design
    topic: respiratory tract infections in children
  data exploration
Systematic Literature Review
  literature on text mining in medical records
  standardised queries
  strict inclusion/exclusion criteria
  500 abstracts found
  ~200 included
  full paper review
  structured decomposition
  little work on UK GP records
Medical Sub-domain
  children with respiratory tract infections
  factors affecting complications
  antibiotic resistance
  antibiotic prescriptions down > chance of complications
  early indication?
Data Exploration
  identified cohort
    children w/ respiratory symptoms
    identified by relevant Read codes
    ~330k patients
    ~1.3M consultations
  what kind of information is recorded?
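Identifying a cohort "by relevant Read codes" can be sketched as a prefix filter over coded consultations. The example assumes that codes under "Respiratory symptoms" share the prefix "17", as in the controlled-vocabulary slide; the function name, the patient identifiers and the toy data are hypothetical, and the age restriction to children is left out.

```python
from typing import Dict, Iterable, List, Tuple


def build_cohort(
    consultations: Iterable[Tuple[str, str]],
    code_prefixes: Iterable[str],
) -> Dict[str, List[str]]:
    """Group matching Read codes by patient.

    consultations: (patient_id, read_code) pairs.
    code_prefixes: Read code prefixes that define the cohort.
    """
    prefixes = tuple(code_prefixes)
    cohort: Dict[str, List[str]] = {}
    for patient_id, code in consultations:
        if code.startswith(prefixes):
            cohort.setdefault(patient_id, []).append(code)
    return cohort


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the GP records.
    toy = [
        ("p1", "1739.00"),
        ("p2", "G343.00"),
        ("p1", "173C.11"),
        ("p3", "1831.00"),
    ]
    print(build_cohort(toy, ["17"]))
```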
Data Exploration term cloud weighted by significance to respiratory symptoms (tf-idf like significance to Read code set)
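A tf-idf-like weighting of this kind can be sketched as follows: count how often a term occurs in the free text attached to the target Read code set, then scale it down by how many consultations overall contain the term. This is a generic smoothed tf-idf under stated assumptions (whitespace tokenisation, lower-casing), not necessarily the weighting used to produce the term cloud; the function name and toy texts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def term_significance(target_docs: List[str], all_docs: List[str]) -> Dict[str, float]:
    """Score terms by frequency in target_docs, discounted by how
    widespread they are across all_docs (smoothed idf)."""
    n_docs = len(all_docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq: Counter = Counter()
    for doc in all_docs:
        doc_freq.update(set(doc.lower().split()))

    # Term frequency within the target set only.
    term_freq: Counter = Counter()
    for doc in target_docs:
        term_freq.update(doc.lower().split())

    return {
        term: count * math.log((1 + n_docs) / (1 + doc_freq[term]))
        for term, count in term_freq.items()
    }


if __name__ == "__main__":
    # Hypothetical toy texts, not taken from the GP records.
    respiratory = ["cough wheeze", "cough fever"]
    everything = respiratory + ["knee pain", "fever rash", "back pain"]
    scores = term_significance(respiratory, everything)
    for term in sorted(scores, key=scores.get, reverse=True):
        print(term, round(scores[term], 3))
```

Terms that appear in every consultation score zero, so common filler words drop out of the cloud without a stop-word list.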
Next Steps...
  develop/use software for:
    medical term recognition
    information extraction
    extracting patterns of URT complications
  evaluation design
    compare coded & free-text content
    compare data & text mining results