Investigating Clinical Care Pathways Correlated with Outcomes




Geetika T. Lakshmanan, Szabolcs Rozsnyai, Fei Wang
IBM T. J. Watson Research Center, NY, USA
August 2013

Outline
- Care Pathways
- Typical Challenges
- Related Work
- Overview of our Approach
- Input Data
- Pre-Processing
- Trace Clustering
- Same Day Concurrent Event Collapse
- Frequent Pattern Mining
- Process Mining
- Results
- Discussion

Background
Identifying care pathways that correlate with outcomes in patient event data is vital for understanding which specific care pathways lead to good or bad outcomes. Once identified, such care pathways could be used by medical boards to refine care plan descriptions for treating particular diseases, such as congestive heart failure.

[Figure: a spaghetti-like model of patient event data. Source: Process Mining, by Wil van der Aalst, Communications of the ACM, August 2012.]

Challenges
Raw patient event data from the real world suffers from a number of problems:
- Many events commonly happen on the same day.
- Identical events can occur consecutively for the same patient (within a short time period), causing loops and spaghetti-like patterns when the patient data is mined with known process mining techniques.
- The diversity of events can be explosive. For example, a patient may be given 10 different antihypertensive medications. The event pool then has 10 additional events, even though all of them could be classified as an antihypertensive drug.


Related Work in 3 Relevant Areas
1. Process Mining for Clinical Pathway Mining
- HeuristicsMiner, social network analysis, and dotted chart analysis are used to obtain insights into care flow data [BIOSTEC 2008].
- Sequence clustering and process mining [Inf. Sys. 2012].
- Fuzzy mining and trace alignment for investigating clinical pathway data [BPM 2010].
- The applicability of various mining techniques to healthcare data, adopting both a department-based and a treatment-based focus [BPM Workshops 2011].
- Control-flow discovery with the Fuzzy Miner and networked graph visualizations [PAKDD 2012].
- The effectiveness of several process mining algorithms on Magnetic Resonance Imaging (MRI), ultrasound, and X-ray appointments within a radiology workflow [MIE 2008].
- Hidden Markov Models in combination with process mining techniques [ICDM 2010].
- Use of process mining and clustering to discover patient journeys [HIKM 2012].
- A new process mining approach to discover temporal orders of medical [AI. Med. 2012].
- Data mining techniques inspired by process mining, applied to detect time dependency patterns in clinical pathways [JMI 2001].
2. Clustering process instances
3. Frequent Pattern Mining

Related Work: Clustering process instances, which could be applied to pathway mining
- Group process instances based on similar behavior [SDM 2009, BPM Workshops 2009].
- Organize, group, and cluster process models based on similarities such as structure and contained data (organizations, activity names, etc.) [BPM 2011].
- Evaluate different similarity measures for clustering process instances, using a density-based clustering algorithm (DBSCAN) for the evaluation [EUROCAST 2009].
- Processes are represented as weighted graphs, and similarity measures are based on graph vectors estimated from activity and transition frequencies [ICCSA 2006].
- Graph similarity based clustering [Journal of Computational Interdisciplinary Sciences, 2011].
- Hierarchical clustering of execution traces [PAKDD 2004].
- Optimizing mined process models using clustering [BPM Workshops 2012].

Related Work: Frequent Pattern Mining
- Mining discriminative dyadic sequential patterns [EDBT 2011]. It cannot handle sequences with same-day concurrent events.
- A tree-based algorithm for identifying discriminative patterns [ICDE 2008]. It does not use a pattern-based representation for events, and cannot handle sequences with same-day concurrent events.
- A statistical approach for summarizing and visualizing temporal associations between the prescription of a drug and the occurrence of a medical event [SIGKDD 2008].
- Temporal pattern discovery [AMIA 2009], which requires a predefined temporal grammar and logic with prior knowledge.


Overview of Our Approach
Business Process Insight: An Approach and Platform for the Discovery and Analysis of End-to-End Business Processes. Szabolcs Rozsnyai, Geetika T. Lakshmanan, Vinod Muthusamy, Rania Khalaf and Matthew J. Duftler. Service Research and Innovation Institute (SRII) Global Conference, pp. 80-89, 2012.

Data
- Patient Electronic Medical Records (EMR) data from a major US healthcare provider.
- Events for medications, labs, diagnoses, and vital signs for 4096 patients diagnosed with congestive heart failure (CHF) in ambulatory care.
- A trace is created for each patient from this data. Each trace corresponds to one patient's data and contains event names, event dates, and event attributes.

Segmenting by Outcome
Patient data can be segmented into positive outcomes and negative outcomes.
- Example of a positive outcome: patients not hospitalized for CHF-related causes for one year or more after CHF diagnosis.
- Example of a negative outcome: patients hospitalized for CHF-related causes within one year of CHF diagnosis.
Guided by Dr. Robert Sorrentino, M.D., IBM T. J. Watson Research Center.
CHF = congestive heart failure

Data Pre-Processing Steps
We first pre-process the data in three steps:
1. Filtering (replacing raw event names with hierarchical categorical names). Medication names are replaced by Pharmacy Subclass_CHFLevel names, and diagnosis names are replaced by DXGroup names. All labs that are not specific to congestive heart failure are filtered out.
2. Compact representation of consecutive repeat events. If a Vital event occurs consecutively for the same patient, we denote these events as Vital-Repeat in this patient's event stream.
3. Same-day concurrent event collapse.

Trace Processing: Representing Consecutive Identical Events
Consecutive identical events may suggest a routine check or a periodic treatment, so the events in such a run can be treated alike and consolidated. Consecutive identical events should, however, be treated differently from single events.
[Figure: consolidation of the trace A A C C D E B B C A B, in which each run of identical events is replaced by a single repeat event (for example, C C becomes RepC).]
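The consolidation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name collapse_repeats is hypothetical, and the "Rep" prefix is borrowed from the RepC label on the slide; the slides do not specify how the original toolset implements this step.

```python
from itertools import groupby

def collapse_repeats(trace, prefix="Rep"):
    """Replace each run of two or more identical consecutive events
    with a single marker event; single events are kept unchanged."""
    collapsed = []
    for event, run in groupby(trace):
        run_length = sum(1 for _ in run)
        # Assumption: a run of any length >= 2 maps to the same marker.
        collapsed.append(f"{prefix}{event}" if run_length > 1 else event)
    return collapsed

trace = ["A", "A", "C", "C", "D", "E", "B", "B", "C", "A", "B"]
print(collapse_repeats(trace))
# ['RepA', 'RepC', 'D', 'E', 'RepB', 'C', 'A', 'B']
```

Keeping a distinct marker (rather than dropping the duplicates) preserves the difference between a repeated event and a single event, which is the distinction the slide asks for.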

Trace Processing: Break Down Same-Day Concurrent Events
Detect the frequent clinical packages among same-day concurrent events, and use those packages to break down the same-day concurrent events. Packages are ranked by a two-way sort: by package cardinality (long to short) and by package appearance frequency (high to low).
[Figure: two-way sorting of example packages built from events A to E, ordered by cardinality and by appearance frequency.]
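One possible reading of this step is sketched below. It is an assumption-heavy illustration, not the authors' algorithm: packages are taken to be event combinations that co-occur on the same day at least min_support times, and a day's events are covered greedily, trying larger packages first and more frequent ones next. The names mine_packages, break_down, min_support, and max_size are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def mine_packages(days, min_support=2, max_size=3):
    """Count every event combination (size >= 2) that co-occurs on one day
    and keep those seen at least min_support times."""
    counts = Counter()
    for events in days:
        for size in range(2, min(max_size, len(events)) + 1):
            for combo in combinations(sorted(events), size):
                counts[frozenset(combo)] += 1
    return {pkg: n for pkg, n in counts.items() if n >= min_support}

def break_down(events, packages):
    """Greedily split one day's events into packages using a two-way sort:
    cardinality (long to short), then frequency (high to low)."""
    ranked = sorted(packages, key=lambda p: (-len(p), -packages[p], sorted(p)))
    remaining, parts = set(events), []
    for pkg in ranked:
        if pkg <= remaining:          # package fits entirely in what is left
            parts.append(sorted(pkg))
            remaining -= pkg
    parts.extend([e] for e in sorted(remaining))  # leftovers stay single events
    return parts

days = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "C"}, {"C", "D"}, {"A", "B", "D", "E"}]
packages = mine_packages(days)
print(break_down({"A", "B", "C", "D"}, packages))  # [['A', 'B', 'C'], ['D']]
print(break_down({"A", "B", "E"}, packages))       # [['A', 'B'], ['E']]
```

Enumerating all combinations is exponential in the number of same-day events, so max_size caps the package size here; a frequent itemset miner would be the natural replacement on real data.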


Clustering
Group process traces using DBSCAN on the basis of execution footprint. Convert each trace into a string (the temporal occurrence of events), and use the Levenshtein distance metric to compute the distance between strings (not necessarily of the same length).

EventType/Activity    Mapped Character
OrderReceived         A
ShipmentCreated       B
TransportStarted      C
TransportEnded        D
InvoiceIssued         E
...                   ...

T1: OrderReceived ShipmentCreated TransportStarted TransportEnded InvoiceIssued
T1 mapped: ABCDE
T2: OrderReceived ShipmentCreated TransportStarted ShipmentCreated TransportStarted ...
T2 mapped: ABCBC...
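The clustering step can be illustrated with a small, dependency-free sketch: a standard dynamic-programming Levenshtein distance and a plain DBSCAN over that distance. The trace strings, the eps and min_pts values, and the function names are hypothetical choices for the example; the slides do not state the parameters or the implementation used, beyond the epsilon reported in the results.

```python
from collections import Counter

def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb))) # substitution
        previous = current
    return previous[-1]

def dbscan(items, eps, min_pts, distance):
    """Plain DBSCAN; returns one label per item, with -1 marking noise."""
    n = len(items)
    neighbors = [[j for j in range(n) if distance(items[i], items[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:   # not a core point (count includes itself)
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster       # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # expand only through core points
    return labels

# Hypothetical mapped traces (one character per event type).
traces = ["ABCDE", "ABCBCDE", "ABCDE", "ABDE", "XYZW", "XYZ", "QQQQQQQQ"]
labels = dbscan(traces, eps=2, min_pts=2, distance=levenshtein)
print(labels)  # [0, 0, 0, 0, 1, 1, -1]

# The dominant cluster is the largest one, ignoring noise.
dominant, size = Counter(l for l in labels if l != -1).most_common(1)[0]
print(dominant, size)  # 0 4
```

The pairwise distance computation is quadratic in the number of traces, which is acceptable for a few thousand patient traces but would need indexing or sampling for much larger logs.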

Pattern Processing: Constructing Bag-of-Pattern Vectors
Frequent patterns are detected using SPAM. The patterns are collected into a dictionary (of size m), and we construct a bag-of-patterns (BoP) vector representation (an m-dimensional vector) for each trace. After obtaining the vector representation, we compute the correlation between frequent patterns and outcomes.
[Figure: example traces with a table of detected patterns and their frequencies.]
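A minimal sketch of the vector construction follows. It assumes the pattern dictionary has already been mined (SPAM itself is not reimplemented here) and, as a simplifying assumption, counts only contiguous occurrences of each pattern in a trace; the slides do not say how occurrences are counted. The function names and the example dictionary are hypothetical.

```python
def count_occurrences(trace, pattern):
    """Number of positions where pattern appears as a contiguous run in trace."""
    width = len(pattern)
    return sum(1 for start in range(len(trace) - width + 1)
               if tuple(trace[start:start + width]) == tuple(pattern))

def bag_of_patterns(trace, dictionary):
    """m-dimensional vector: one occurrence count per dictionary pattern."""
    return [count_occurrences(trace, pattern) for pattern in dictionary]

# Hypothetical dictionary of m = 3 patterns and one example trace.
dictionary = [("A", "B"), ("B", "C", "D"), ("B", "D")]
trace = list("ABEDABCDABBCD")
print(bag_of_patterns(trace, dictionary))  # [3, 2, 0]
```

Stacking these vectors for all patients gives the patient-by-pattern matrix that is later compared against the outcome vector.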

Pattern Processing: Hierarchical Summarization to Reduce Sparsity
Pattern explosion
- The pattern extractor generally returns a large number of patterns.
- The constructed bag-of-patterns vectors are very sparse (most patterns do not occur in most traces).
- If the vectors are too sparse, computational models built on them are not meaningful.
- To make the vectors more compact, master patterns can summarize less frequent patterns through hierarchical pattern summarization.
Hierarchical pattern summarization
- Merge the detected pattern pairs in a hierarchical (or recursive) way.
- If both AB and BA are in the detected pattern list, we can merge them into one pattern A;B, meaning that the order between A and B is ignored.
- The bag-of-patterns entry of the merged pattern equals the sum of the entries of the individual patterns it was merged from.
- The algorithm stops when there is nothing more to merge.
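The merge rule can be sketched as follows. This is a simplification under stated assumptions: instead of repeated pairwise merging, it groups in one pass all patterns that contain the same events in a different order, which is the fixed point of merging AB with BA style pairs. The function name summarize and the "A;B" naming of merged patterns (taken from the slide's notation) are illustrative.

```python
from collections import defaultdict

def summarize(dictionary, vectors):
    """Merge patterns that contain the same events in a different order.
    The merged column is the sum of the columns it was merged from."""
    groups = defaultdict(list)                 # unordered key -> column indices
    for col, pattern in enumerate(dictionary):
        groups[tuple(sorted(pattern))].append(col)
    merged_names = [";".join(key) if len(cols) > 1 else " ".join(dictionary[cols[0]])
                    for key, cols in groups.items()]
    merged_vectors = [[sum(row[c] for c in cols) for cols in groups.values()]
                      for row in vectors]
    return merged_names, merged_vectors

dictionary = [("A", "B"), ("B", "A"), ("C", "D")]
vectors = [[2, 1, 0],    # bag-of-patterns vector of patient 1
           [0, 3, 1]]    # bag-of-patterns vector of patient 2
names, compact = summarize(dictionary, vectors)
print(names)    # ['A;B', 'C D']
print(compact)  # [[3, 0], [3, 1]]
```

The dictionary shrinks from three columns to two here, and the merged column is denser than either of its sources, which is the sparsity reduction the slide is after.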

Identify Patterns with High Correlation with Patient Outcomes
[Figure: patient-by-pattern matrix with rows for patients p1 ... pn and columns for action patterns a1 ... am, alongside an n-dimensional patient outcome vector.]

Outcome Analysis
Information Gain
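Information gain measures how much knowing a pattern reduces uncertainty about the outcome: IG(Y; X) = H(Y) - H(Y | X), where H is the Shannon entropy, Y is the outcome, and X is the pattern feature. The sketch below assumes, for illustration only, that each pattern is reduced to a binary present/absent feature per patient and that outcomes are binary; the slides do not specify how the bag-of-patterns counts are discretized.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / total
        result -= p * log2(p)
    return result

def information_gain(feature, outcome):
    """H(outcome) - H(outcome | feature) for two equal-length sequences."""
    total = len(outcome)
    conditional = 0.0
    for value in set(feature):
        subset = [o for f, o in zip(feature, outcome) if f == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(outcome) - conditional

outcome = [1, 1, 0, 0]           # hypothetical outcomes for four patients
print(information_gain([1, 1, 0, 0], outcome))  # 1.0 (pattern fully determines outcome)
print(information_gain([1, 0, 1, 0], outcome))  # 0.0 (pattern carries no information)
```

Ranking the dictionary patterns by this score yields a list of the kind shown on the next slide.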

Example of Detected Frequent Patterns

Frequent Pattern                   Information Gain
Diuretics3 Vital                   0.142934165
AntianginalAgents4 Vital           0.127046503
Vital BetaBlockers2                0.108907065
Vital Cardiotonics4                0.101768006
Digoxin Repeatvital Diuretics3     0.07780386
Repeatvital ...


Example of Finding a Dominant Cluster (Step 4 of the Method) Epsilon=2.75. The dominant cluster has 204 patient traces.

Example of a Mined Model of Patient Traces
Events consist of:
1. Lab panels (combinations of labs, such as LabPanelA, LabPanelB, etc.)
2. Medications: AntianginalAgents4 (antianginal agents used to treat CHF, level 4), BetaBlockers2, Diuretics3, etc.
3. Diagnoses: Heartfailure
Each mined model has a Start and an End node to mark the start and finish of the aggregated treatment pathways.

Example of a Mined Model of Patient Traces
Pathway 1: Start -> AntianginalAgents4 -> Vital -> Cardiotonics4 -> LabPanelL -> Digoxin -> Magnesium -> NatureticPeptide -> HeartFailure -> Vital
Pathway 2: Vital -> Diuretics3 -> Vital, as well as Vital -> Diuretics3 -> Vital -> LabPanelK -> Diuretics3 -> Vital
Pathway 3: Vital -> Potassium -> LabPanelC -> FO2HBArterial -> O2SArterial -> PCO2Arterial -> PHArterial -> POArterial -> PulseOx -> Vital

Example of Overlaying a Frequent Pattern on a Mined Model
[Figure: mined model with nodes including Cardiotonics4, Diuretics4, and Vital, with a frequent pattern from the information gain table overlaid.]
All labs are collapsed into a single event: LabTest.
The overlaid pattern is correlated with positive patient outcomes.


Discussion and Future Work
The approach combines process mining and data mining techniques to extract insight from patient EMR data.
Aspects to address in future work:
- In the worst case, all frequent patterns could consist of single events.
- The frequent pattern generation algorithm is governed by parameters. Similarly, HeuristicsMiner has edge and node frequency threshold parameters. The effects of these two sets of parameters need to be coordinated.
- Replacing raw event names with publicly available hierarchical category names helped eliminate redundancies in event names, but it may have led to a loss of useful event information.
- The mined model and the overlaid frequent patterns were useful to an expert physician with experience in treating CHF patients, but the utility of the toolset needs to be validated with a larger number of users.
- The approach needs to be validated on patient event data for other diseases.