Chapter 4 Getting the Data prof.dr.ir. Wil van der Aalst www.processmining.org
Overview
Chapter 1 Introduction
Part I: Preliminaries
  Chapter 2 Process Modeling and Analysis
  Chapter 3 Data Mining
Part II: From Event Logs to Process Models
  Chapter 4 Getting the Data
  Chapter 5 Process Discovery: An Introduction
  Chapter 6 Advanced Process Discovery Techniques
Part III: Beyond Process Discovery
  Chapter 7 Conformance Checking
  Chapter 8 Mining Additional Perspectives
  Chapter 9 Operational Support
Part IV: Putting Process Mining to Work
  Chapter 10 Tool Support
  Chapter 11 Analyzing Lasagna Processes
  Chapter 12 Analyzing Spaghetti Processes
Part V: Reflection
  Chapter 13 Cartography and Navigation
  Chapter 14 Epilogue
Goal of process mining
What really happened in the past?
Why did it happen?
What is likely to happen in the future?
When and why do organizations and people deviate?
How to control a process better?
How to redesign a process to improve its performance?
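Getting the data
Figure: the "world" (business processes, people, machines, components, organizations) is supported and controlled by a software system. The software system records events, e.g., messages, transactions, etc., in event logs. A (process) model models and analyzes the world, and specifies, configures, implements, and analyzes the software system. Event logs and (process) models are related through discovery, conformance, and enhancement.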
From heterogeneous data sources to process mining results
Figure: several data sources feed, optionally via Extract, Transform, and Load (ETL), into a data warehouse. Coarse-grained scoping extracts unfiltered event logs (XES, MXML, or similar) from the data sources or the warehouse. Fine-grained scoping filters these into filtered event logs. Process mining (discovery, conformance, enhancement) turns the filtered logs into (process) models and answers.
Example log
A process consists of cases.
A case consists of events such that each event relates to precisely one case.
Events within a case are ordered.
Events can have attributes.
Examples of typical attribute names are activity, time, costs, and resource.
Another view
Figure: a tree with the levels process, cases, events, and attributes.
process
  case 1
    event 35654423: activity = register request, time = 30-12-2010:11.02, resource = Pete, costs = 50
    event 35654424: ...
    ...
    event 35654427: activity = reject request, time = 07-01-2011:14.24, resource = Pete, costs = 200
  case 2
    event 35654483: activity = register request, time = 30-12-2010:11.32, resource = Mike, costs = 50
    event 35654485: ...
    ...
    event 35654489: activity = pay compensation, time = 08-01-2011:12.05, resource = Ellen, costs = 200
  case 3
    event 35654521: activity = register request, time = 30-12-2010:11.32, resource = Pete, costs = 50
    event 35654522: ...
    ...
    event 35654533: activity = pay compensation, time = 15-01-2011:10.45, resource = Ellen, costs = 200
  ...
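The process, case, event, attribute hierarchy can be sketched in a few lines of Python. This is only an illustration: the flat record layout (case id, event id, attribute dictionary) and the names `RECORDS`, `TIME_FORMAT`, and `build_log` are assumptions, and only the first and last event of cases 1 and 2 are included because the intermediate events are elided on the slide.

```python
from collections import defaultdict
from datetime import datetime

# Flat event records: (case id, event id, attributes). The values mirror the
# example above; the record layout itself is an assumption for this sketch.
RECORDS = [
    (2, 35654489, {"activity": "pay compensation", "time": "08-01-2011:12.05",
                   "resource": "Ellen", "costs": 200}),
    (1, 35654423, {"activity": "register request", "time": "30-12-2010:11.02",
                   "resource": "Pete", "costs": 50}),
    (1, 35654427, {"activity": "reject request", "time": "07-01-2011:14.24",
                   "resource": "Pete", "costs": 200}),
    (2, 35654483, {"activity": "register request", "time": "30-12-2010:11.32",
                   "resource": "Mike", "costs": 50}),
]

TIME_FORMAT = "%d-%m-%Y:%H.%M"  # matches timestamps such as 30-12-2010:11.02


def build_log(records):
    """Group events per case and order them by timestamp within each case."""
    log = defaultdict(list)
    seen = set()
    for case_id, event_id, attrs in records:
        # Each event relates to precisely one case, so an event id may occur once.
        if event_id in seen:
            raise ValueError(f"event {event_id} occurs more than once")
        seen.add(event_id)
        log[case_id].append((event_id, attrs))
    for events in log.values():
        events.sort(key=lambda e: datetime.strptime(e[1]["time"], TIME_FORMAT))
    return dict(log)


log = build_log(RECORDS)
trace_1 = [attrs["activity"] for _, attrs in log[1]]
trace_2 = [attrs["activity"] for _, attrs in log[2]]
```

Representing attributes as a dictionary keeps the sketch open to arbitrary attribute names, in line with "events can have attributes".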
Standard transactional life-cycle model
Figure: state machine of an activity instance.
Transitions: schedule, assign, reassign, start, suspend, resume, autoskip, manualskip, complete, withdraw, abort_activity, abort_case.
Final states: successful termination, unsuccessful termination.
Five activity instances
a: schedule, assign, start, complete
b: schedule, assign, reassign, start, suspend, resume, complete
c: start, suspend, resume, suspend, abort_activity
d: start, complete
e: complete
Overlapping activity instances
Figure: two overlapping instances of activity a produce the event sequence start, start, complete, complete. The sequence alone does not show which start belongs to which complete.
Durations shown for the alternative pairings: 6 hours, 2 hours, 5 hours, 9 hours.
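The ambiguity can be made concrete with a small sketch. The timestamps below (starts at hours 0 and 4, completes at hours 6 and 9) are hypothetical; they were chosen only because they are consistent with the four durations on the slide. The function names `pair_fifo` and `pair_lifo` are likewise illustrative.

```python
def pair_fifo(starts, completes):
    """Pair the earliest start with the earliest complete, and so on."""
    return [c - s for s, c in zip(sorted(starts), sorted(completes))]


def pair_lifo(starts, completes):
    """Pair each complete with the most recent start that is still open."""
    events = sorted([(t, "start") for t in starts] +
                    [(t, "complete") for t in completes])
    open_starts, durations = [], []
    for time, kind in events:
        if kind == "start":
            open_starts.append(time)
        else:
            # Raises IndexError if a complete has no matching start.
            durations.append(time - open_starts.pop())
    return durations


# Hypothetical timestamps in hours, not taken from the slide.
starts = [0, 4]
completes = [6, 9]

fifo = pair_fifo(starts, completes)   # [6, 5]
lifo = pair_lifo(starts, completes)   # [2, 9]
```

Both pairings explain the same four events, yet they yield different service times per instance. Without an explicit activity instance identifier in the log, neither can be preferred on the basis of the data alone.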
Using attributes
Figure: a process model extended with information derived from event attributes.
Control flow: start, a (register request), b (examine thoroughly), c (examine casually), d (check ticket), e (decide), f (reinitiate request), g (pay compensation), h (reject request), end, connected via places c1 to c5.
Roles: a: A, b: E, c: A, d: A, e: M, f: M, g: A, h: A.
Social network showing how work flows from one person to another: Pete, Sara, Sue, Mike, Ellen, Sean.
Performance indicators per activity:
  Activity b: frequency 456, waiting time 15.6 +/- 2.5 hours, service time 1.2 +/- 0.5 hours, costs 412 +/- 55 euros
  Activity g: frequency 311, waiting time 12.4 +/- 2.1 hours, service time 0.5 +/- 0.2 hours, costs 198 +/- 35 euros
  Activity h: frequency 407, waiting time 7.4 +/- 1.8 hours, service time 1.1 +/- 0.3 hours, costs 209 +/- 38 euros
XES (eXtensible Event Stream)
See www.xes-standard.org.
Adopted by the IEEE Task Force on Process Mining.
Predecessors: MXML and SA-MXML.
The format is supported by tools such as ProM (as of version 6), Nitro, XESame, and OpenXES.
ProMimport supports MXML.
XES (compatible with MXML)
An event log consists of:
  traces (process instances)
  events
Standard extensions:
  concept (for naming)
  lifecycle (for transactional properties)
  org (for the organizational perspective)
  time (for timestamps)
  semantic (for ontology references)
Figure: start of an XES file, with annotations.
  extensions loaded
  every trace has a name
  every event has a name and a transition
  classifier = name + transition
  start of trace (i.e., process instance)
  name of trace
  per event: name of event (activity name), resource, timestamp, transition
Figure: continuation of the XES file, with annotations.
  end of trace (i.e., process instance)
  start of trace
  name of trace
  per event: name of event (activity name), resource, timestamp, data associated to event
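The annotated elements can be illustrated with a small XES fragment and a reader built on Python's standard `xml.etree.ElementTree`. This is a sketch under assumptions: the fragment is hand-written rather than copied from the slides, the time zone offset is invented, and real XES files usually declare an XML namespace on `<log>`, which this reader does not handle. The helper names `attribute` and `read_traces` are illustrative.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment; attribute keys follow the standard extensions
# (concept, lifecycle, org, time). The time zone offset is an assumption.
XES = """<log xes.version="1.0">
  <extension name="Concept" prefix="concept" uri="http://www.xes-standard.org/concept.xesext"/>
  <extension name="Lifecycle" prefix="lifecycle" uri="http://www.xes-standard.org/lifecycle.xesext"/>
  <extension name="Organizational" prefix="org" uri="http://www.xes-standard.org/org.xesext"/>
  <extension name="Time" prefix="time" uri="http://www.xes-standard.org/time.xesext"/>
  <classifier name="Activity classifier" keys="concept:name lifecycle:transition"/>
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="concept:name" value="register request"/>
      <string key="org:resource" value="Pete"/>
      <date key="time:timestamp" value="2010-12-30T11:02:00.000+01:00"/>
      <string key="lifecycle:transition" value="complete"/>
    </event>
    <event>
      <string key="concept:name" value="reject request"/>
      <string key="org:resource" value="Pete"/>
      <date key="time:timestamp" value="2011-01-07T14:24:00.000+01:00"/>
      <string key="lifecycle:transition" value="complete"/>
    </event>
  </trace>
</log>"""


def attribute(element, key):
    """Return the value of the child attribute with the given key, or None."""
    for child in element:
        if child.get("key") == key:
            return child.get("value")
    return None


def read_traces(xes_text):
    """Map each trace name to its list of events (dicts of key -> value)."""
    root = ET.fromstring(xes_text)
    traces = {}
    for trace in root.findall("trace"):
        name = attribute(trace, "concept:name")
        traces[name] = [
            {child.get("key"): child.get("value") for child in event}
            for event in trace.findall("event")
        ]
    return traces


traces = read_traces(XES)
classifier_keys = ET.fromstring(XES).find("classifier").get("keys").split()
# Apply the classifier (name + transition) to every event of trace "1".
labels = ["+".join(event[k] for k in classifier_keys) for event in traces["1"]]
```

Applying the classifier turns each event into a single label such as `register request+complete`, which is what discovery techniques then treat as the activity.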
Challenges when extracting event logs
Correlation: Events in an event log are grouped per case. This simple requirement can be quite challenging as it requires event correlation, i.e., events need to be related to each other.
Timestamps: Events need to be ordered per case. Typical problems: only dates, different clocks, delayed logging.
Snapshots: Cases may have a lifetime extending beyond the recorded period, e.g., a case was started before the beginning of the event log.
Scoping: How to decide which tables to incorporate?
Granularity: The events in the event log may be at a different level of granularity than the activities relevant for end users.
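One common way to deal with the snapshot problem is to keep only cases that are observed from beginning to end. The sketch below assumes that "register request" marks the start of a case and that "pay compensation" and "reject request" mark its end, as in the running example; the traces themselves and the name `complete_cases` are hypothetical.

```python
START_ACTIVITY = "register request"
END_ACTIVITIES = {"pay compensation", "reject request"}


def complete_cases(log):
    """Keep only cases whose first and last recorded activities mark a full lifetime."""
    return {
        case_id: trace
        for case_id, trace in log.items()
        if trace and trace[0] == START_ACTIVITY and trace[-1] in END_ACTIVITIES
    }


# Hypothetical traces, each already ordered by timestamp.
log = {
    "1": ["register request", "examine casually", "check ticket",
          "decide", "reject request"],
    "2": ["check ticket", "decide", "pay compensation"],   # started before the log begins
    "3": ["register request", "examine thoroughly"],       # still running when the log ends
}

kept = complete_cases(log)
```

Dropping incomplete cases trades data for cleaner models. Whether that is acceptable depends on the question being asked: for example, bottleneck analysis of still-running cases needs the partial traces.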
Flattening reality into event logs
Figure: class model of the data.
Order: OrderID : OrderID, Customer : CustID, Amount : Euro, Created : DateTime, Paid : DateTime, Completed : DateTime
Orderline: OrderLineID : OrderLineID, OrderID : OrderID, Product : ProdID, NofItems : PosInt, TotalWeight : Weight, Entered : DateTime, BackOrdered : DateTime, Secured : DateTime, DelID : DelID
Delivery: DelID : DelID, DelAddress : Address, Contact : PhoneNo
Attempt: DelID : DelID, When : DateTime, Successful : Bool
Associations: Order (1) to Orderline (1..*); Orderline (1..*) to Delivery (0..1); Attempt (0..*) to Delivery (1).
Tables
Each class of the data model corresponds to a table.
Order(OrderID, Customer, Amount, Created, Paid, Completed)
Orderline(OrderLineID, OrderID, Product, NofItems, TotalWeight, Entered, BackOrdered, Secured, DelID)
Delivery(DelID, DelAddress, Contact)
Attempt(DelID, When, Successful)
Associations: Order (1) to Orderline (1..*); Orderline (1..*) to Delivery (0..1); Attempt (0..*) to Delivery (1).
Order instance
Figure: the class model overlaid with all events related to Order:91245.
Order:91245 (Customer: John, Amount: 100)
  create order, 28-11-2011:08.12
  pay order, 02-12-2011:13.45
  complete order, 05-12-2011:11.33
OrderLine:112345 (Product: iphone 4G, NofItems: 1, TotalWeight: 0.250)
  enter order line, 28-11-2011:08.13
  secure order line, 28-11-2011:08.55
OrderLine:112346 (Product: ipod nano, NofItems: 2, TotalWeight: 0.300, DelID: 882346)
  enter order line, 28-11-2011:08.14
  create backorder, 28-11-2011:08.55
  secure order line, 30-11-2011:09.06
OrderLine:112347 (Product: ipod classic, NofItems: 1, TotalWeight: 0.200)
  enter order line, 28-11-2011:08.15
  secure order line, 29-11-2011:10.06
Delivery:882345 (DelAddress: 5513VJ-22a, Contact: 0497-2553660)
  Attempt:882345-1: delivery attempt, 05-12-2011:08.55, Successful: false
  Attempt:882345-2: delivery attempt, 06-12-2011:09.12, Successful: false
  Attempt:882345-3: delivery attempt, 07-12-2011:08.56, Successful: true
Delivery:882346 (DelAddress: 5513XG-45, Contact: 040-2298761)
  Attempt:882346-1: delivery attempt, 05-12-2011:08.43, Successful: true
Figure: the events of Order:91245 without the class model: the order events (create order, pay order, complete order), the events of order lines 112345, 112346, and 112347, and the delivery attempts of Delivery:882345 (three attempts) and Delivery:882346 (one attempt).
Orderline instance
Figure: the class model with Orderline as the instance notion. All events related to OrderLine:112345, including those of its order and its delivery, get case id 112345.
Case id: 112345
  From OrderLine:112345 (Product: iphone 4G, NofItems: 1, TotalWeight: 0.250)
    enter order line, 28-11-2011:08.13
    secure order line, 28-11-2011:08.55
  From Order:91245 (Customer: John, Amount: 100)
    create order, 28-11-2011:08.12
    pay order, 02-12-2011:13.45
    complete order, 05-12-2011:11.33
  From Delivery:882345 (DelAddress: 5513VJ-22a, Contact: 0497-2553660)
    Attempt:882345-1: delivery attempt, 05-12-2011:08.55, Successful: false
    Attempt:882345-2: delivery attempt, 06-12-2011:09.12, Successful: false
    Attempt:882345-3: delivery attempt, 07-12-2011:08.56, Successful: true
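Flattening with the order line as instance notion can be sketched as follows. The table contents come from the example; the dictionary layout, the mapping from timestamp columns to activity names, and the function name `flatten_orderline` are assumptions made for illustration. That order line 112345 refers to delivery 882345 is inferred from the events shown for case 112345.

```python
from datetime import datetime


def ts(text):
    """Parse timestamps written as 28-11-2011:08.13."""
    return datetime.strptime(text, "%d-%m-%Y:%H.%M")


# Tables as plain Python structures (layout is an assumption of this sketch).
ORDERS = {91245: {"Created": "28-11-2011:08.12", "Paid": "02-12-2011:13.45",
                  "Completed": "05-12-2011:11.33"}}
ORDERLINES = {112345: {"OrderID": 91245, "DelID": 882345,  # DelID inferred
                       "Entered": "28-11-2011:08.13", "BackOrdered": None,
                       "Secured": "28-11-2011:08.55"}}
ATTEMPTS = [  # (DelID, When, Successful)
    (882345, "05-12-2011:08.55", False),
    (882345, "06-12-2011:09.12", False),
    (882345, "07-12-2011:08.56", True),
    (882346, "05-12-2011:08.43", True),
]

# Each timestamp column corresponds to one activity.
ORDER_ACTIVITIES = {"Created": "create order", "Paid": "pay order",
                    "Completed": "complete order"}
LINE_ACTIVITIES = {"Entered": "enter order line", "BackOrdered": "create backorder",
                   "Secured": "secure order line"}


def flatten_orderline(line_id):
    """Collect all events related to one order line into a single case."""
    line = ORDERLINES[line_id]
    order = ORDERS[line["OrderID"]]
    events = []
    for column, activity in LINE_ACTIVITIES.items():
        if line[column] is not None:
            events.append((ts(line[column]), activity))
    for column, activity in ORDER_ACTIVITIES.items():
        if order[column] is not None:
            events.append((ts(order[column]), activity))
    for del_id, when, _successful in ATTEMPTS:
        if del_id == line["DelID"]:
            events.append((ts(when), "delivery attempt"))
    # The order line id becomes the case id of every collected event.
    return [(line_id, when, activity) for when, activity in sorted(events)]


case = flatten_orderline(112345)
activities = [activity for _, _, activity in case]
```

Choosing Order as the instance notion instead would yield one case per order, containing the events of all its order lines and deliveries. The same tables therefore give rise to different event logs depending on the view taken.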
Other examples
The life cycles of reviewers, authors, papers, reviews, PC chairs, etc.
The life cycles of job applications and vacancies.
X-ray machine logs: machine, machine day, patient, treatment, routine, etc.?
Therefore, selection and scoping of instances is needed. This is like deciding which elements to put on a map; there may be many maps covering partially overlapping areas.
Extracting event logs
Not just a syntactical issue.
Different views are possible.
Important:
  Selecting the right instance notion.
  Ordering of events.
  Selection of events.
Proclets: the true fabric of real-life processes.