Workflow framework for mining diagnostic rules from EHRs
Roxana Danger
Department of Computing, Imperial College London
KNIME User Day, University of Westminster, June 25th, 2013
Outline
  EU-FP7 TRANSFoRm project
  Universal repository of diagnostic rules
  Data type heterogeneity
  Definition of analysis goals
  Selection of algorithms and quality measures
  Selection of working environment
  Results presentation
  Summary
TRANSFoRm: Translational Research & Patient Safety in Europe
Knowledge in healthcare and the role of TRANSFoRm
  Specific research knowledge
    Produced from clinical trials
    From controlled populations
    With well-defined questions
  Routinely collected knowledge
    A vast quantity of data
    Captured in EHR systems
    With large population coverage
    May lack detail and quality
  Actionable knowledge
    Distilled scientific findings
    Usable in clinical practice
    To support decision making
TRANSFoRm aims and objectives
  TRANSFoRm will develop a digital infrastructure that facilitates the reuse of real-world primary care electronic health record (EHR) data to improve both patient safety and the conduct and volume of clinical research in Europe.
  The project will drive the advanced integration of clinical practice and research data to:
    Support clinical research with participant identification and evaluation of outcomes
    Support epidemiological research with large-scale phenotype-genotype association studies and follow-up on trials
    Support clinical care in the diagnosis and monitoring of patients
TRANSFoRm data mining
  [Diagram: data mining pipeline. Labels, in the order they appear: EHR DBs; Data mining; Preprocessing; Dataset; Mining; MCERs; Filtering; Filtered MCERs; DSS Repository; Updating; Manual review, interpretation and creation of CPRs; Initial set of CPRs; Validation; CPRs; DSS Repository; CPRs from bibliography]
MCERs: measured clinical evidence rules (Derek, 2012)
  DemographicFeatures, RFEs, Symptoms, Signs, Riskfactors, Test performed => Diagnosis(QM)
Data from GPRD, TRANSHIS, NIVEL
EHR data mining: data type heterogeneity
  CDIM
  Terminologies and their mappings
  [Diagram labels: EHR DB 1, EHR DB 2, EHR DB 3; DSM 1, DSM 2, DSM 3; CDIM; Mapping: CDIM - DSM 1, CDIM - DSM 2, CDIM - DSM 3; Query; CDIM - Dataset 1, CDIM - Dataset 2, CDIM - Dataset 3]
Definition of analysis goals
  First consultation of an EoC
    DemographicFeatures, RFEs, Symptoms, Signs, Riskfactors => Diagnosis(QM)
  Subsequent consultations of an EoC
    DemographicFeatures, RFEs, Symptoms, Signs, Riskfactors, Test performed, Time from previous consultation => Diagnosis(QM)
    (DemographicFeatures, RFEs, Symptoms, Signs, Riskfactors, Test performed, Time from previous consultation) (DemographicFeatures, RFEs, Symptoms, Signs, Riskfactors, Test performed, Time from previous consultation) => Diagnosis(QM)
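The difference between the two analysis goals can be sketched in code. This is a minimal illustration, not the TRANSFoRm implementation: the `Consultation` class, the `episode_to_instances` function, the `test:` and `days_since_previous:` item prefixes, and the example values are all hypothetical. It assumes an episode of care (EoC) is a list of dated consultations. The first consultation contributes only its findings, while each later one also contributes the tests performed and the time since the previous consultation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Consultation:
    """One consultation within an episode of care (hypothetical structure)."""
    day: date
    findings: frozenset              # demographics, RFEs, symptoms, signs, risk factors
    tests: frozenset = frozenset()   # tests performed at this consultation
    diagnosis: str = ""


def episode_to_instances(consultations):
    """Return one (kind, antecedent_items, diagnosis) tuple per consultation."""
    ordered = sorted(consultations, key=lambda c: c.day)
    instances = []
    previous = None
    for c in ordered:
        if previous is None:
            # First consultation: findings only.
            instances.append(("first", set(c.findings), c.diagnosis))
        else:
            # Later consultation: add tests and the gap in days.
            gap = (c.day - previous.day).days
            items = set(c.findings)
            items |= {f"test:{t}" for t in c.tests}
            items.add(f"days_since_previous:{gap}")
            instances.append(("subsequent", items, c.diagnosis))
        previous = c
    return instances


# Toy episode with made-up values, given out of date order on purpose.
eoc = [
    Consultation(date(2013, 1, 10), frozenset({"rfe:cough"}),
                 frozenset({"chest_xray"}), "pneumonia"),
    Consultation(date(2013, 1, 3), frozenset({"rfe:cough", "age:60+"}),
                 diagnosis="cough"),
]
instances = episode_to_instances(eoc)
```

Each resulting item set could then be passed to a rule miner with the diagnosis as the consequent. Mining the sequential form would instead keep the ordered list of item sets per episode.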
Algorithms and quality measures (1)
  Selection criteria
    Fast implementations, easily parallelizable
    Easy-to-understand outputs, from which MCERs can be easily extracted
  Selected algorithms
    Association rules (Apriori, Kingfisher)
    Decision trees (C4.5)
    Sequential patterns (Sequential Kingfisher)
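To show why association rule output is easy to turn into rules with a diagnosis as the consequent, here is a minimal sketch of plain Apriori in Python. It is not the implementation used in the project and does not reproduce Kingfisher. The function names, the `dx:` prefix for diagnosis items, the thresholds, and the toy transactions are assumptions made for the example, not clinical data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while level:
        current = {s: support(s) for s in level}
        current = {s: v for s, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


def diagnosis_rules(frequent, diagnosis, min_confidence):
    """Extract rules 'antecedent => diagnosis' from the frequent itemsets."""
    rules = []
    for itemset, supp in frequent.items():
        if diagnosis in itemset and len(itemset) > 1:
            antecedent = itemset - {diagnosis}
            confidence = supp / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, diagnosis, supp, confidence))
    return rules


# Toy transactions, one per consultation (made-up values).
transactions = [
    {"cough", "fever", "dx:pneumonia"},
    {"cough", "fever", "dx:pneumonia"},
    {"cough", "dx:bronchitis"},
    {"fever", "dx:pneumonia"},
    {"cough", "fever"},
]
frequent = apriori(transactions, min_support=0.4)
rules = diagnosis_rules(frequent, "dx:pneumonia", min_confidence=0.7)
```

On this toy data only `{fever} => dx:pneumonia` passes the confidence threshold. The support scan is recomputed per candidate for brevity, so the sketch is not tuned for speed.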
Algorithms and quality measures (2)
  Part I: Consequent (disease) interest
    Prior probability
  Part II: Variables characterization
    Posterior probability (+/-)
    Likelihood ratio
    Odds ratio
  Part III: Itemsets characterization
    Support
    Error rate
  Part IV: Rules characterization
    Lift
    Confidence
    Sensitivity
    Specificity
    Likelihood ratio
    Odds ratio
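Most of the rule-level measures listed above can be computed from a 2x2 contingency table of antecedent against diagnosis. The sketch below uses the standard textbook definitions. The `Contingency` class, the `rule_measures` function, and the counts are hypothetical, and the exact definitions used in the project (for example of "error rate", which is left out here) may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """2x2 counts for a rule X => D (X: antecedent, D: diagnosis)."""
    tp: int  # X present, D present
    fp: int  # X present, D absent
    fn: int  # X absent, D present
    tn: int  # X absent, D absent


def rule_measures(c: Contingency) -> dict:
    """Compute common quality measures; assumes no denominator is zero."""
    n = c.tp + c.fp + c.fn + c.tn
    prior = (c.tp + c.fn) / n            # P(D)
    confidence = c.tp / (c.tp + c.fp)    # P(D | X)
    sensitivity = c.tp / (c.tp + c.fn)   # P(X | D)
    specificity = c.tn / (c.tn + c.fp)   # P(not X | not D)
    return {
        "prior": prior,
        "support": c.tp / n,             # P(X and D)
        "confidence": confidence,
        "lift": confidence / prior,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "odds_ratio": (c.tp * c.tn) / (c.fp * c.fn),
    }


# Made-up counts for illustration only.
m = rule_measures(Contingency(tp=40, fp=10, fn=20, tn=130))
```

With these counts the prior is 0.3, the confidence is 0.8, and the odds ratio is 26. A production version would need to handle empty cells, for example with a continuity correction.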
KNIME workflow environment (1)
  Our KNIME extensions:
    Database nodes
      Filter rows
      Filter columns
      Rename
      Resort columns
      Derive
      Group by
      Join
    Data mining nodes
      Kingfisher
      Quantifiers
KNIME workflow environment (1)
KNIME workflow environment (2): Configuring
KNIME workflow environment (4): Model learners
KNIME workflow environment (4): Quality measures computing
Results presentation
  156.17.131.215/RulesAssessment/#rulesViewer
KNIME workflow for text mining
Summary
  Universal repository of diagnostic rules
  Data type heterogeneity
  Definition of analysis goals
  Selection of algorithms and quality measures
  Selection of working environment
  Results presentation
  Workflows defined (first results are promising)
  Future work
    Provenance
    CDIM integration
Thanks to the data mining task team in TRANSFoRm:
  Derek Connigan (RSCI)
  Jean K. Soler (Synapse)
  Tomasz Kajdanowicz (WROC)
  Przemyslaw Kazienko (WROC)
  Vasa Curcin (IC)