Process Mining: Data Science in Action
Wil van der Aalst, Scientific director of the DSC/e
Dutch Data Science Summit, Eindhoven, 4-5-2014
Process Mining Data Science in Action https://www.coursera.org/course/procmin
data science: statistics, data mining, machine learning, stochastics, process mining, databases, algorithms, large scale distributed computing, industrial engineering, visualization, visual analytics, behavioral/social sciences, privacy, domain knowledge
data science: statistics, stochastics, data mining, machine learning, process mining, databases, algorithms, large scale distributed computing, industrial engineering, visualization, visual analytics, behavioral/social sciences, privacy, domain knowledge
process science: business process management, business process reengineering, model checking, formal methods, concurrency, Petri nets, BPMN
Internet of Events: 4 sources of event data
Internet of Content (Big Data)
Internet of People (social)
Internet of Things (cloud)
Internet of Places (mobility)
Starting point for process mining: Event data

student name    course name                    exam date   mark
Peter Jones     Business Information systems   16-1-2014   8
Sandy Scott     Business Information systems   16-1-2014   5
Bridget White   Business Information systems   16-1-2014   9
John Anderson   Business Information systems   16-1-2014   8
Sandy Scott     BPM Systems                    17-1-2014   7
Bridget White   BPM Systems                    17-1-2014   8
Sandy Scott     Process Mining                 20-1-2014   5
Bridget White   Process Mining                 20-1-2014   9
John Anderson   Process Mining                 20-1-2014   8

Column roles: student name = case id, course name = activity name, exam date = timestamp, mark = other data.
Every row is an event (here: an exam attempt).
Another event log: order handling

order number   activity         timestamp         user         product    quantity
9901           register order   22-1-2014@09.15   Sara Jones   iphone5s   1
9902           register order   22-1-2014@09.18   Sara Jones   iphone5s   2
9903           register order   22-1-2014@09.27   Sara Jones   iphone4s   1
9901           check stock      22-1-2014@09.49   Pete Scott   iphone5s   1
9901           ship order       22-1-2014@10.11   Sue Fox      iphone5s   1
9903           check stock      22-1-2014@10.34   Pete Scott   iphone4s   1
9901           handle payment   22-1-2014@10.41   Carol Hope   iphone5s   1
9902           check stock      22-1-2014@10.57   Pete Scott   iphone5s   2
9902           cancel order     22-1-2014@11.08   Carol Hope   iphone5s   2

Column roles: order number = case id, activity = activity name, timestamp = timestamp, user = resource, product and quantity = other data.
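The step from an event log to per-case behavior can be sketched as follows: group the events by case id and order each group by timestamp, which yields one trace (sequence of activities) per order. This is a minimal sketch; the function name build_traces and the timestamp layout (day-month-year@hour.minute) are assumptions read off the order handling table, not something the slides define.

```python
from collections import defaultdict
from datetime import datetime

# Events copied from the order handling slide: (case id, activity, timestamp).
EVENTS = [
    ("9901", "register order", "22-1-2014@09.15"),
    ("9902", "register order", "22-1-2014@09.18"),
    ("9903", "register order", "22-1-2014@09.27"),
    ("9901", "check stock", "22-1-2014@09.49"),
    ("9901", "ship order", "22-1-2014@10.11"),
    ("9903", "check stock", "22-1-2014@10.34"),
    ("9901", "handle payment", "22-1-2014@10.41"),
    ("9902", "check stock", "22-1-2014@10.57"),
    ("9902", "cancel order", "22-1-2014@11.08"),
]


def build_traces(events):
    """Group events by case id and order each case by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, stamp in events:
        # Assumed timestamp layout: day-month-year@hour.minute
        when = datetime.strptime(stamp, "%d-%m-%Y@%H.%M")
        by_case[case_id].append((when, activity))
    return {case: [act for _, act in sorted(rows)]
            for case, rows in by_case.items()}


traces = build_traces(EVENTS)
for case, trace in sorted(traces.items()):
    print(case, trace)
```

Only the case id, activity name, and timestamp columns are needed to obtain traces; resource and other data can be attached to the events later for further analysis.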
Another event log: patient treatment

patient   activity            timestamp         doctor       age   cost
5781      make X-ray          23-1-2014@10.30   Dr. Jones    45    70.00
5541      blood test          23-1-2014@10.18   Dr. Scott    61    40.00
5833      blood test          23-1-2014@10.27   Dr. Scott    24    40.00
5781      blood test          23-1-2014@10.49   Dr. Scott    45    40.00
5781      CT scan             23-1-2014@11.10   Dr. Fox      45    1200.00
5833      surgery             23-1-2014@12.34   Dr. Scott    24    2300.00
5781      handle payment      23-1-2014@12.41   Carol Hope   45    0.00
5541      radiation therapy   23-1-2014@13.57   Dr. Jones    61    140.00
5541      radiation therapy   23-1-2014@13.08   Dr. Jones    61    140.00

Column roles: patient = case id, activity = activity name, timestamp = timestamp, doctor = resource, age and cost = other data.
Let's play: Play-In, Play-Out, Replay

Case   Activity                             Timestamp        Resource
432    register travel request (a)          18-3-2014:9.15   John
432    get support from local manager (b)   18-3-2014:9.25   Mary
432    check budget by finance (d)          19-3-2014:8.55   John
432    decide (e)                           19-3-2014:9.36   Sue
432    accept request (g)                   19-3-2014:9.48   Mary

[Figure: process model with start, register travel request (a), get support from local manager (b), get detailed motivation letter (c), check budget by finance (d), decide (e), reinitiate request (f), accept request (g), reject request (h), end]
Play-Out
[Figure: the travel request process model with activities a to h, next to the events of case 432 from the "Let's play" slide]
Play Out: A possible scenario: a b d e g
[Figure: the travel request process model with its gateways marked (XOR-split, XOR-join, AND-split, AND-join), next to the events of case 432 from the "Let's play" slide]
Play Out: Another scenario: a d c e f b d e h
[Figure: the travel request process model with activities a to h]
Play Out: Process model allows for many more scenarios
adcefcdefbdefbdeg adceg adbeh adbeh abdeg acdefcdefbdeh abcefbdeh acdefcdefbdeh acbefbdeg abdeg abdeg acbefbdeh acdefcdefbdeh adbeh adceh acbefbdeg adcefcdefbdefbdeg adceh adcefcdefbdefbdeg abdeg
[Figure: the travel request process model with activities a to h]
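Play-Out can be sketched as a small simulation: start in the initial state and keep firing randomly chosen enabled activities until the end is reached. The sketch below assumes a Petri-net reading of the travel request model based on the gateway labels on the slides (b and c are alternatives, d runs in parallel with them, f loops back). The place names p1 to p5 and the function names are hypothetical, and the exact structure of the original model is an assumption.

```python
import random

# Assumed Petri-net reading of the travel request model.
# Place names are hypothetical. transition: (input places, output places)
NET = {
    "a": (["start"], ["p1", "p2"]),  # register request, then AND-split
    "b": (["p1"], ["p3"]),           # b and c are alternatives (XOR)
    "c": (["p1"], ["p3"]),
    "d": (["p2"], ["p4"]),           # d runs in parallel with b or c
    "e": (["p3", "p4"], ["p5"]),     # AND-join before decide
    "f": (["p5"], ["p1", "p2"]),     # reinitiate: loop back
    "g": (["p5"], ["end"]),          # accept
    "h": (["p5"], ["end"]),          # reject
}


def enabled(marking):
    """Transitions whose input places all hold a token."""
    return [t for t, (ins, _) in NET.items()
            if all(marking.get(p, 0) > 0 for p in ins)]


def fire(marking, transition):
    """Consume one token per input place, produce one per output place."""
    ins, outs = NET[transition]
    new = dict(marking)
    for p in ins:
        new[p] -= 1
    for p in outs:
        new[p] = new.get(p, 0) + 1
    return new


def play_out(rng):
    """Generate one trace by firing random enabled transitions until 'end'."""
    marking = {"start": 1}
    trace = []
    while marking.get("end", 0) == 0:
        t = rng.choice(enabled(marking))
        marking = fire(marking, t)
        trace.append(t)
    return "".join(trace)


rng = random.Random(2014)
for _ in range(5):
    print(play_out(rng))
```

Because of the loop through f, the model allows infinitely many traces; each run of play_out simply samples one of them.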
Play-In
[Figure: the events of case 432 from the "Let's play" slide, next to the travel request process model with activities a to h]
Loesje van der Aalst desire line
Play In: Simple process allowing for 4 traces
abdeg adbeg adbeg adbeh abdeh abdeg abdeh abdeh abdeh abdeh adbeh adbeh adbeh
[Figure: process model with start, register travel request (a), get support from local manager (b), check budget by finance (d), decide (e), accept request (g), reject request (h), end]
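A first step toward Play-In can be illustrated by counting which activity directly follows which in the log above. This directly-follows counting is only a basic building block used by many discovery techniques; it is not claimed to be the algorithm behind the models shown in this talk, and the names used here are illustrative.

```python
from collections import Counter

# The 13 traces listed on the "Simple process allowing for 4 traces" slide.
LOG = ("abdeg adbeg adbeg adbeh abdeh abdeg abdeh "
       "abdeh abdeh abdeh adbeh adbeh adbeh").split()


def directly_follows(log):
    """Count how often activity x is immediately followed by activity y."""
    counts = Counter()
    for trace in log:
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += 1
    return counts


variants = Counter(LOG)       # distinct traces and their frequencies
dfg = directly_follows(LOG)   # directly-follows relation with counts

# Pairs seen in both orders hint at concurrency (here: b and d).
both_orders = {(x, y) for (x, y) in dfg if (y, x) in dfg}

print(variants)
print(dfg)
print(both_orders)
```

The log contains only four distinct traces, and b and d occur in both orders, which is consistent with a model that runs them in parallel between a and e.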
Play In: Process allowing for more traces
adcefcdefbdefbdeg abdeg adcefcdefbdefbdeg abcefbdeh acbefbdeg acdefcdefbdeh adceg adbeh adbeh adcefcdefbdefbdeg abdeg abdeg abdeg acbefbdeh acdefcdefbdeh acbefbdeg adceh adbeh adceh acdefcdefbdeh
[Figure: the travel request process model with activities a to h]
No modeling needed!
Example Process Discovery (Dutch housing agency, 208 cases, 5987 events)
Example process discovery for hospital (627 gynecological oncology patients, 24331 events)
Replay
[Figure: the events of case 432 from the "Let's play" slide, replayed on the travel request process model with activities a to h]
process model event data
desire line very safe system
Replay: a c d e g
[Figure: the travel request process model with activities a to h]
Replay: a c e g? check budget (d) is missing!
[Figure: the travel request process model with activities a to h]
Replay: a c h d e g? reject request (h) is impossible
[Figure: the travel request process model with activities a to h]
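The three replay examples can be reproduced with a small sketch that fires each activity of a trace on the model and reports the first activity that is not enabled. It assumes a Petri-net reading of the travel request model (b and c as alternatives in parallel with d, f looping back); the place names p1 to p5 and the function name replay are hypothetical. The check stops at the first deviation, whereas practical conformance checking techniques continue after a deviation and quantify how well log and model fit.

```python
# Assumed Petri-net reading of the travel request model.
# Place names are hypothetical. transition: (input places, output places)
NET = {
    "a": (["start"], ["p1", "p2"]),
    "b": (["p1"], ["p3"]),
    "c": (["p1"], ["p3"]),
    "d": (["p2"], ["p4"]),
    "e": (["p3", "p4"], ["p5"]),
    "f": (["p5"], ["p1", "p2"]),
    "g": (["p5"], ["end"]),
    "h": (["p5"], ["end"]),
}


def replay(trace):
    """Return (True, None) if the trace fits, else (False, explanation)."""
    marking = {"start": 1}
    for pos, t in enumerate(trace):
        ins, outs = NET[t]
        missing = [p for p in ins if marking.get(p, 0) == 0]
        if missing:
            return False, f"'{t}' at position {pos} is not enabled"
        for p in ins:
            marking[p] -= 1
        for p in outs:
            marking[p] = marking.get(p, 0) + 1
    # A fitting trace leaves exactly one token, in the end place.
    remaining = {p: n for p, n in marking.items() if n > 0}
    if remaining != {"end": 1}:
        return False, "trace stops before the process is completed"
    return True, None


for trace in ["acdeg", "aceg", "achdeg"]:
    print(trace, replay(trace))
```

In this sketch "aceg" fails at e because d has not yet produced its token (check budget is missing), and "achdeg" fails at h because no decision has been taken yet.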
Conformance Checking (WOZ objections Dutch municipality, 745 objections, 9583 events, f = 0.988)
Replay with timestamps: a 9.15, c 9.20, d 9.35, e 10.15, g 11.30
Time between activities (minutes): a to c 5, a to d 20, c to e 55, d to e 40, e to g 75
[Figure: the travel request process model with activities a to h, annotated with these times]
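The numbers on this slide (5, 55, 20, 40, 75) match the minutes elapsed between each activity and its predecessors in the model, which the following sketch recomputes. The predecessor mapping is an assumption about the model structure (c and d both follow a, e waits for both, g follows e), and the helper name minutes_between is illustrative.

```python
from datetime import datetime

# Timestamps from the slide (hour.minute) for the trace a c d e g.
TIMES = {"a": "9.15", "c": "9.20", "d": "9.35", "e": "10.15", "g": "11.30"}

# Assumed causal predecessors in the model for this trace.
PREDECESSORS = {"c": ["a"], "d": ["a"], "e": ["c", "d"], "g": ["e"]}


def minutes_between(earlier, later):
    """Minutes elapsed between two same-day 'hour.minute' timestamps."""
    fmt = "%H.%M"
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return int(delta.total_seconds() // 60)


elapsed = {
    (src, dst): minutes_between(TIMES[src], TIMES[dst])
    for dst, sources in PREDECESSORS.items()
    for src in sources
}

for (src, dst), minutes in elapsed.items():
    print(f"{src} -> {dst}: {minutes} min")
```

Repeating this for many traces and aggregating per path gives the waiting times and delays that can be projected onto the model.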
Replay with timestamps for many traces
frequencies of paths
frequencies of activities
waiting times and other delays between activities
durations of activities
[Figure: the travel request process model with activities a to h]
Performance Analysis Using Replay (WOZ objections Dutch municipality, 745 objections, 9583 events, f = 0.988)
Overview
world: business processes, people, machines, components, organizations
software system: supports/controls the world; records events, e.g., messages, transactions, etc.
(process) model: models and analyzes the world; specifies, configures, implements, and analyzes the software system
event logs and (process) model are related by discovery, conformance, and enhancement
Play-Out, Play-In, Replay
Process mining toolbox
Process models can be seen as "process maps"
[Figure: process model with start, register request, examine thoroughly, examine casually, check ticket, decide, reinitiate request, pay compensation, reject request, end]
What we can learn from maps
abstraction: leaving out insignificant roads and towns
aggregation: smaller entities are amalgamated into larger ones (suburbs and cities)
layout: positioning of elements has a clear meaning
size and color: highlight more important entities (e.g. highways have a different color)
Compare process models to maps: abstraction? aggregation? layout? size and color?
[Figure: the travel request process model with activities a to h]
[Figure: Petri net with places start, c1 to c5, end and transitions a register request, b examine thoroughly, c examine casually, d check ticket, e decide, f reinitiate request, g pay compensation, h reject request]
Can we see what matters most? Metropolis or village? Highway or dirt road?
[Figure: the travel request process model with activities a to h]
"the map" does not exist
Zoom
Subway map
Bicycle map
A map is a view on reality; the same holds for process models.
Model provides a view on reality (event data), just like a map!
Multiple views depending on purpose (performance, compliance, training, etc.).
Breathing life into process models; otherwise they end up in some drawer.
Project on maps: traffic jams, real estate for sale, location of trucks/trains, crime rates
Project on process models: bottlenecks, deviations, costs
Examples
Not that new: Charles Minard's 1869 chart showing the number of men in Napoleon's 1812 Russian campaign army, their movements, as well as the temperature they encountered on the return path. Numbers shown on the chart: 422.000, 175.000, 100.000, 10.000, 24.000
Actively using process models
What can we learn from navigation devices? Detection, prediction, recommendation
Driven by maps, historic information, and current information. Flexible: Adapts to circumstances and does not force the driver to take a particular route. Can your information system do this?
Conclusion
Process models are like maps!
Connecting event data and process models: better models, live models
Positioning process mining
process model analysis (simulation, verification, optimization, gaming, etc.)
process mining: performance-oriented questions, problems and solutions; compliance-oriented questions, problems and solutions
data-oriented analysis (data mining, machine learning, business intelligence)
data science process science