Business Process Intelligence Course
Lecture 8: Summary and Outlook
prof.dr.ir. Wil van der Aalst
www.processmining.org
Overview
Chapter 1 Introduction
Part I: Preliminaries
Chapter 2 Process Modeling and Analysis
Chapter 3 Data Mining
Part II: From Event Logs to Process Models
Chapter 4 Getting the Data
Chapter 5 Process Discovery: An Introduction
Chapter 6 Advanced Process Discovery Techniques
Part III: Beyond Process Discovery
Chapter 7 Conformance Checking
Chapter 8 Mining Additional Perspectives
Chapter 9 Operational Support
Part IV: Putting Process Mining to Work
Chapter 10 Tool Support
Chapter 11 Analyzing Lasagna Processes
Chapter 12 Analyzing Spaghetti Processes
Part V: Reflection
Chapter 13 Cartography and Navigation
Chapter 14 Epilogue
Clive Humby (dunnhumby), 2006
Image: http://www.multivu.com/assets/58095/photos/data-is-the-new-oil-infographic-nigel-holmes-2012-from-the-human-face-of-big-data-original.jpg
Wil van der Aalst, TU/e (use only with permission & acknowledgements)
data HW/SW systems processes
process models as maps
Business process maps
The first geographical maps date back to the 7th millennium BC. Since then, cartographers have improved their skills and techniques to create maps, thereby addressing problems such as clearly representing desired traits, eliminating irrelevant details, reducing complexity, and improving understandability.
Example of a map: Road map of NL
The map abstracts from smaller cities and less significant roads. Only the bigger cities, highways, and other important roads are shown. Cities aggregate local roads and local districts. Also note the use of color, size, etc.
Charles Joseph Minard's map showing the size of Napoleon's army at different locations/times
Charles Minard's 1869 chart showing the number of men in Napoleon's 1812 Russian campaign army, their movements, as well as the temperature they encountered on the return path.
Illustrating the problem
[Figure: process model from start to end with activities a to l, top-level activities x, y, z, places p1 to p12, and connections annotated with the values 0.3, 0.4, 0.6, and 1.0]
Classical top level view: low level connections still exist
[Figure: top-level view with activities x, y, z connected through places p3, p4, p5, p9, p10, p11, shown above the detailed model with activities a to l, places p1 to p12, and connections annotated with the values 0.3, 0.4, 0.6, and 1.0]
Seamless zoom
Every view keeps the top-level activities x, y, z; lowering the threshold reveals more low-level activities.
Threshold 1.0: a, f, j, e, i
Threshold 0.6: a, f, j, h, k, e, i
Threshold 0.4: a, f, j, b, g, h, k, l, e, i
Threshold 0.3: a, f, j, b, c, d, g, h, k, l, e, i
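The threshold views can be read as a filter on activity significance. The sketch below is only an illustration: the per-activity significance values are an assumption, inferred by giving each activity the highest threshold at which the slide still shows it, and the name `visible_activities` is hypothetical.

```python
# Assumed significance per activity: the highest threshold at which the
# slide still shows the activity (an inference, not given explicitly).
SIGNIFICANCE = {
    "a": 1.0, "e": 1.0, "f": 1.0, "i": 1.0, "j": 1.0,
    "h": 0.6, "k": 0.6,
    "b": 0.4, "g": 0.4, "l": 0.4,
    "c": 0.3, "d": 0.3,
}


def visible_activities(significance, threshold):
    """Return the activities whose significance reaches the threshold, sorted by name."""
    return sorted(a for a, s in significance.items() if s >= threshold)


# Zooming in step by step: a lower threshold reveals more detail.
for threshold in (1.0, 0.6, 0.4, 0.3):
    print(threshold, visible_activities(SIGNIFICANCE, threshold))
```

Because the views are derived from one significance value per activity, any threshold in between yields a consistent intermediate view, which is what a fixed hierarchy cannot offer.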
Most process modeling notations assume a fixed hierarchy, so there is no seamless zoom-in and zoom-out!
Traditional hierarchy concepts don't support "Google Maps" abstraction.
Example: Reviewing papers (100 cases generating 3730 events)
WF-net discovered using the α-algorithm
Fuzzy miner: two views on the same process
[Figure: two fuzzy models, one showing all activities and one showing only two activities; the color and width of an arc indicate the significance of the connection]
Balancing between both extremes
[Figure: the fuzzy model showing all activities, the fuzzy model showing only two activities, and an intermediate model with an aggregated node containing 10 activities, together with the inner structure of that aggregated node; the color and width of an arc indicate the significance of the connection]
Projecting dynamic information on business process maps
Projecting traffic jams on maps
Business process movies
information system as a navigation device
Navigation
Whereas a TomTom device is continuously showing the expected arrival time, users of today's information systems are often left clueless about likely outcomes of the cases they are working on.
Car navigation systems provide directions and guidance without controlling the driver. The driver is still in control, but, given a goal (e.g., to get from A to B as fast as possible), the navigation system recommends the next action to be taken.
Operational support provides TomTom functionality for business processes.
Recommend: How to get home ASAP? Take a left turn!
Detect: You drive too fast!
Predict: When will I be home? At 11.26!
Relating the process mining framework to cartography and navigation
[Figure: the refined process mining framework. The "world" (people, machines, business processes, documents, organizations) is supported by information system(s) that record event logs (provenance), split into "pre mortem" current data and "post mortem" historic data. The ten activities are grouped into navigation (explore, predict, recommend), auditing (detect, check, compare, promote), and cartography (discover, enhance, diagnose). Models are "de jure" or "de facto" and cover control-flow, data/rules, and resources/organization.]
What should I have learned from this course?
Lecture 1
- Understand that process mining combines process model analysis (BPM) and data-oriented analysis (e.g., data mining).
- Understand the link to data science.
- Understand the link to data mining (supervised and unsupervised learning).
- Understand the relation between models and event data: play-out, play-in, and replay.
- Able to interpret a decision tree.
- Able to compute entropy (per node and for the whole tree).
- Understand the concept of information gain.
Information Gain Based on Entropy
Note: there is information gain even though the classification does not change.
Root node: #young=546, #old=314, E=0.946848, labeled young (860/314)
Split on attribute smoker:
  smoker = yes: #young=184, #old=11, E=0.313027, labeled young (195/11)
  smoker = no: #young=362, #old=303, E=0.994314, labeled young (665/303)
Information gain (reduction of the overall entropy): 0.107012
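The numbers on this slide can be recomputed with a few lines of code. This is a minimal sketch using the standard definitions (entropy in bits, information gain as the parent entropy minus the weighted entropy of the children); the function names are illustrative.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a node, given the number of instances per class."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# (#young, #old) per node, taken from the slide.
root = (546, 314)
smoker_yes = (184, 11)
smoker_no = (362, 303)

print(round(entropy(root), 6))
print(round(entropy(smoker_yes), 6))
print(round(entropy(smoker_no), 6))
print(round(information_gain(root, [smoker_yes, smoker_no]), 6))
```

Both leaves still predict "young", yet the smoker = yes leaf is much purer than the root, which is why the overall entropy drops.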
Lecture 1 (cont'd)
- Interpret the results of clustering.
- Understand the k-means algorithm.
- Read a dendrogram produced by agglomerative hierarchical clustering.
- Understand frequent item sets and association rules.
- Compute the support, confidence, and lift of an association rule.
- Able to create a confusion matrix (tp, fn, fp, tn) and compute the F1 score.
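As a reminder of how support, confidence, and lift of a rule X ⇒ Y are computed, here is a small sketch. The transactions are made up for illustration, and the function name `rule_metrics` is hypothetical; the formulas are the usual ones based on the counts N, N_X, N_Y, and N_{X∪Y}.

```python
def rule_metrics(transactions, X, Y):
    """Return (support, confidence, lift) of the association rule X => Y.

    Assumes at least one transaction contains X and at least one contains Y.
    """
    X, Y = set(X), set(Y)
    n = len(transactions)
    n_x = sum(1 for t in transactions if X <= t)         # transactions containing X
    n_y = sum(1 for t in transactions if Y <= t)         # transactions containing Y
    n_xy = sum(1 for t in transactions if (X | Y) <= t)  # transactions containing both
    support = n_xy / n
    confidence = n_xy / n_x
    lift = (n_xy * n) / (n_x * n_y)
    return support, confidence, lift


# Hypothetical transactions (item sets), for illustration only.
transactions = [
    {"tea", "muffin"},
    {"tea", "muffin", "latte"},
    {"latte"},
    {"tea"},
    {"latte", "muffin"},
]

support, confidence, lift = rule_metrics(transactions, {"tea"}, {"muffin"})
print(support, confidence, lift)
```

A lift above 1 suggests that X and Y occur together more often than expected if they were independent; a lift close to 1 suggests the rule carries little information.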
Association rules and confusion matrix

                  predicted +   predicted -   total
  actual +        tp            fn            p
  actual -        fp            tn            n
  total           p'            n'            N

  name        formula
  error       (fp+fn)/N
  accuracy    (tp+tn)/N
  tp-rate     tp/p
  fp-rate     fp/n
  precision   tp/p'
  recall      tp/p

Here p = tp+fn, n = fp+tn, p' = tp+fp, n' = fn+tn, and N = p+n.
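The measures derived from a confusion matrix, including the F1 score mentioned in the learning goals, can be sketched as follows. The example counts are hypothetical, the function name is illustrative, and F1 is taken as the harmonic mean of precision and recall.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the standard measures from the four confusion matrix cells."""
    p, n = tp + fn, fp + tn   # actual positives and negatives
    p_pred = tp + fp          # predicted positives
    total = p + n
    precision = tp / p_pred
    recall = tp / p
    return {
        "error": (fp + fn) / total,
        "accuracy": (tp + tn) / total,
        "tp_rate": tp / p,
        "fp_rate": fp / n,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that recall and tp-rate are the same quantity, whereas precision divides by the number of predicted positives rather than the number of actual positives.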
Lecture 2
- Understand the limitations of pure model-based analysis.
- Understand the notion of an event log and process discovery.
- Understand basic Petri net concepts (marking, liveness, boundedness, soundness).
- Able to read a simple BPMN diagram.
- Intuitive understanding of the four basic quality dimensions of process discovery: fitness, precision, generalization, and simplicity.
- Able to derive the alpha (α) relations (>, →, ||, #) for models and event logs.
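α algorithm
Let $L$ be an event log over $T$. $\alpha(L)$ is defined as follows.
1. $T_L = \{ t \in T \mid \exists_{\sigma \in L}\ t \in \sigma \}$,
2. $T_I = \{ t \in T \mid \exists_{\sigma \in L}\ t = \mathit{first}(\sigma) \}$,
3. $T_O = \{ t \in T \mid \exists_{\sigma \in L}\ t = \mathit{last}(\sigma) \}$,
4. $X_L = \{ (A,B) \mid A \subseteq T_L \wedge A \neq \emptyset \wedge B \subseteq T_L \wedge B \neq \emptyset \wedge \forall_{a \in A} \forall_{b \in B}\ a \rightarrow_L b \wedge \forall_{a_1,a_2 \in A}\ a_1 \#_L a_2 \wedge \forall_{b_1,b_2 \in B}\ b_1 \#_L b_2 \}$,
5. $Y_L = \{ (A,B) \in X_L \mid \forall_{(A',B') \in X_L}\ A \subseteq A' \wedge B \subseteq B' \Rightarrow (A,B) = (A',B') \}$,
6. $P_L = \{ p_{(A,B)} \mid (A,B) \in Y_L \} \cup \{ i_L, o_L \}$,
7. $F_L = \{ (a, p_{(A,B)}) \mid (A,B) \in Y_L \wedge a \in A \} \cup \{ (p_{(A,B)}, b) \mid (A,B) \in Y_L \wedge b \in B \} \cup \{ (i_L, t) \mid t \in T_I \} \cup \{ (t, o_L) \mid t \in T_O \}$, and
8. $\alpha(L) = (P_L, T_L, F_L)$.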
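The eight steps translate almost literally into code. The following is a naive sketch rather than an efficient implementation: it enumerates all subsets of activities, so it is only suitable for small logs. Representing a place p_(A,B) by the pair (A, B) and the source and sink places by the strings "i_L" and "o_L" is a choice made here for illustration, and the example log is likewise illustrative.

```python
from itertools import combinations


def alpha(log):
    """Apply the eight steps of the alpha algorithm to a list of traces."""
    T_L = {t for trace in log for t in trace}                    # step 1
    T_I = {trace[0] for trace in log if trace}                   # step 2
    T_O = {trace[-1] for trace in log if trace}                  # step 3

    # Directly-follows relation: x >_L y.
    direct = {(x, y) for trace in log for x, y in zip(trace, trace[1:])}

    def causal(x, y):      # x ->_L y
        return (x, y) in direct and (y, x) not in direct

    def unrelated(x, y):   # x #_L y
        return (x, y) not in direct and (y, x) not in direct

    # Non-empty sets whose members are unrelated to each other and to themselves.
    activities = sorted(T_L)
    candidates = [
        frozenset(c)
        for r in range(1, len(activities) + 1)
        for c in combinations(activities, r)
        if all(unrelated(x, y) for x in c for y in c)
    ]
    X_L = {(A, B) for A in candidates for B in candidates
           if all(causal(a, b) for a in A for b in B)}           # step 4
    Y_L = {(A, B) for (A, B) in X_L
           if not any((A, B) != (A2, B2) and A <= A2 and B <= B2
                      for (A2, B2) in X_L)}                      # step 5
    P_L = set(Y_L) | {"i_L", "o_L"}                              # step 6
    F_L = ({(a, (A, B)) for (A, B) in Y_L for a in A}
           | {((A, B), b) for (A, B) in Y_L for b in B}
           | {("i_L", t) for t in T_I}
           | {(t, "o_L") for t in T_O})                          # step 7
    return P_L, T_L, F_L                                         # step 8


# Illustrative log: three trace variants with different frequencies.
log = ["abcd"] * 3 + ["acbd"] * 2 + ["aed"]
P_L, T_L, F_L = alpha(log)
for place in sorted(P_L, key=str):
    print(place)
```

Trace frequencies play no role in the result: the algorithm only looks at which direct successions occur, which is one reason it is sensitive to noise.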
Lecture 2 (cont'd)
- Able to apply the α algorithm to any event log and interpret the result.
- Know the limitations of the α algorithm (able to construct event logs resulting in particular problems).
- Able to show overfitting and underfitting models.
[Figure: the four quality dimensions of process mining drawn as competing forces (lift, thrust, drag, gravity): fitness (ability to explain observed behavior), generalization (avoiding overfitting), simplicity (Occam's Razor), and precision (avoiding underfitting)]
Lecture 3
- Understand the challenges of process discovery (balancing the four forces and incomplete event logs).
- Able to read and construct C-nets.
- Able to convert C-nets into Petri nets (if possible) and vice versa.
- Understand the different phases of the heuristic mining approach.
- Given an event log, compute the dependency measure.
- Determine the dependency graph based on two thresholds.
Dependency graph using a higher threshold (at least 5 direct successions and a dependency of at least 0.9)
[Figure: two dependency graphs over the activities a, b, c, d, e with arcs annotated as count(dependency): 11(0.92), 13(0.93), 5(0.83), and 4(0.80). The arcs annotated 5(0.83) and 4(0.80) do not meet the higher threshold.]
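The arc annotations can be reproduced with the dependency measure of heuristic mining: (|a>b| - |b>a|) / (|a>b| + |b>a| + 1) for a ≠ b, and |a>a| / (|a>a| + 1) for loops of length one. The sketch below assumes an example log chosen so that its direct succession counts match the annotations in the figure; the log itself is not shown on the slide, and the function names are illustrative.

```python
from collections import Counter

# Assumed example log: trace variant -> frequency.
LOG = {"ae": 5, "abce": 10, "acbe": 10, "abe": 1,
       "ace": 1, "ade": 10, "adde": 2, "addde": 1}


def direct_successions(log):
    """Count how often x is directly followed by y, weighted by trace frequency."""
    counts = Counter()
    for trace, freq in log.items():
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += freq
    return counts


def dependency(counts, a, b):
    """Dependency measure between a and b (value between -1 and 1)."""
    if a == b:
        return counts[(a, a)] / (counts[(a, a)] + 1)
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


def dependency_graph(counts, min_count, min_dependency):
    """Keep the arcs that meet both thresholds, annotated with (count, dependency)."""
    return {
        (a, b): (n, round(dependency(counts, a, b), 2))
        for (a, b), n in counts.items()
        if n >= min_count and dependency(counts, a, b) >= min_dependency
    }


counts = direct_successions(LOG)
for arc, annotation in sorted(dependency_graph(counts, 5, 0.9).items()):
    print(arc, annotation)
```

With the thresholds from the slide, the arc from a to e (5 successions, dependency 5/6) and the loop on d (4 successions, dependency 4/5) are dropped, while the concurrent pair b and c never forms an arc because its dependency is 0.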
Lecture 3 (cont'd)
- Understand the different phases of the two-phase approach based on state-based regions.
- Able to construct a transition system based on an event log and a particular abstraction (past/future, set/bag/sequence, etc.).
- Able to determine and check state-based regions.
- Know the limitations of the state-based region approach (able to construct event logs resulting in particular problems).
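Example of State-Based Region
[Figure: transition system with states [ ], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], [a,b,c,d] and a highlighted region with enter: b, e; leave: d; do-not-cross: a, c. Next to it, the corresponding Petri net with places start, p1, p2, p3, p4, end and transitions a, b, c, d, e.]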
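Checking whether a set of states is a region amounts to verifying that every activity behaves uniformly with respect to that set. The sketch below builds a transition system using the set of past activities as the state abstraction and then classifies each activity. The log is an assumption (the slide only shows the resulting states), and the function names are illustrative.

```python
def transition_system(log):
    """Build transitions (source, activity, target) with the set of past activities as state."""
    transitions = set()
    for trace in log:
        state = frozenset()
        for activity in trace:
            target = state | {activity}
            transitions.add((state, activity, target))
            state = target
    return transitions


def classify(transitions, region):
    """Return enter/leave/do-not-cross per activity, or None if the set is not a region."""
    kinds = {}
    for source, activity, target in transitions:
        if source not in region and target in region:
            kind = "enter"
        elif source in region and target not in region:
            kind = "leave"
        else:
            kind = "do-not-cross"
        # All transitions with the same activity label must agree.
        if kinds.setdefault(activity, kind) != kind:
            return None
    return kinds


# Assumed log that yields the eight states listed for the figure.
log = ["abcd", "acbd", "aed"]
ts = transition_system(log)
region = {frozenset("ab"), frozenset("ae"), frozenset("abc")}
print(classify(ts, region))
```

For this set of states, b and e enter, d leaves, and a and c do not cross, matching the region annotated on the slide; such a region corresponds to a place with b and e as input transitions and d as output transition.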
Lecture 4
- Have an overview of additional process mining approaches (genetic, language-based regions, etc.).
- Comprehend the minimal requirements for event data.
- Understand the elements of the XES format (not just control-flow).
- Able to name data quality problems (e.g., imprecise timestamps).
- Understand that, given a data set, different event logs can be extracted based on different viewpoints.
- Have a good understanding of available tooling (ProM, Disco, Celonis, Perceptive process mining).
Lecture 5
- Understand the concept of conformance checking.
- Able to name the different applications of conformance checking.
- Able to compute the produced, consumed, missing, and remaining tokens given a single trace or a whole log.
- Compute fitness based on counting missing and remaining tokens.
- Able to interpret the diagnostics of such a fitness computation.
- Able to compute and compare footprints based on models and logs.
- Understand the notion of alignments.
Fitness = 0.8

  trace      frequency   p    r   c    m   p (all)   r (all)   c (all)   m (all)
  abefcd     10          9    2   9    2   90        20        90        20
  abbefccd   10          11   2   11   2   110       20        110       20
  sum                                      200       40        200       40

p = produced tokens, r = remaining tokens, c = consumed tokens, m = missing tokens; the "(all)" columns multiply the per-trace counts by the trace frequency.
[Figure: Petri net with places start, p1, p2, p3, p4, p5, end and transitions a, b, c, d, e, f, annotated with fitness 0.8]
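The fitness value follows from the token counts via the token-replay formula, fitness = ½(1 − m/c) + ½(1 − r/p), where all counts are summed over the log and weighted by trace frequency. A minimal sketch, using the rows of the table as input (the function name is illustrative):

```python
def token_replay_fitness(rows):
    """Compute fitness from (frequency, produced, remaining, consumed, missing) rows."""
    produced = sum(f * p for f, p, r, c, m in rows)
    remaining = sum(f * r for f, p, r, c, m in rows)
    consumed = sum(f * c for f, p, r, c, m in rows)
    missing = sum(f * m for f, p, r, c, m in rows)
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


# Rows from the slide: (frequency, p, r, c, m) for abefcd and abbefccd.
rows = [(10, 9, 2, 9, 2), (10, 11, 2, 11, 2)]
print(token_replay_fitness(rows))
```

A trace that can be replayed without missing or remaining tokens contributes nothing to the two penalty terms, so a perfectly fitting log yields a fitness of 1.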
Lecture 6
- Understand the concepts of model repair and model extension.
- Able to interpret the different types of dotted charts.
- Able to convert a decision point into a classification problem.
- Able to convert a decision tree for a decision point into guards.
- Able to replay a timed event log and compute waiting times, service times, and routing probabilities.
- Able to construct the resource-activity matrix given an event log.
- Able to construct the handover of work matrix.
Lecture 6 (cont'd)
- Able to create a social network based on the handover of work matrix.
- Understand how the resource-activity matrix can be used to cluster resources and construct organizational models.
- Understand the process cube notion as a means to do comparative process mining.
- Understand how the different types of process mining can be combined to create models covering all perspectives (control-flow, data, resources, time, etc.).
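The two organizational matrices can be derived from an event log with a few counters. This is a sketch under stated assumptions: the event log below is hypothetical, the resource-activity matrix is taken as the mean number of times a resource performs an activity per case, and handovers are simply counted between directly consecutive events of the same case.

```python
from collections import Counter

# Hypothetical event log: case id -> ordered list of (activity, resource) events.
LOG = {
    1: [("register", "Pete"), ("check", "Sue"), ("decide", "Sara")],
    2: [("register", "Pete"), ("check", "Mike"), ("check", "Sue"), ("decide", "Sara")],
}


def resource_activity_matrix(log):
    """Mean number of times each resource performs each activity per case."""
    counts = Counter((res, act) for events in log.values() for act, res in events)
    return {key: n / len(log) for key, n in counts.items()}


def handover_of_work(log):
    """Count direct handovers from one resource to the next within a case."""
    handovers = Counter()
    for events in log.values():
        for (_, giver), (_, receiver) in zip(events, events[1:]):
            handovers[(giver, receiver)] += 1
    return handovers


print(resource_activity_matrix(LOG))
print(handover_of_work(LOG))
```

The handover counts can be read directly as weighted arcs of a social network, and resources with similar rows in the resource-activity matrix are candidates for the same role in an organizational model.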
Lecture 7
- Able to reproduce the refined process mining framework (listing 10 activities).
- Understand the difference between "pre mortem" and "post mortem" event data and between "de jure" and "de facto" models.
- Understand the three types of operational support: detect, predict, and recommend.
- Able to explain these concepts using a timed event log, e.g., constructing an annotated transition system to compute the remaining flow time.
- Understand the difference between declarative and procedural languages.
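Predicting the remaining flow time with an annotated transition system can be sketched as follows. The timed log is hypothetical, the state abstraction (the sequence of activities seen so far) is one possible choice among several, and the prediction is simply the mean of the remaining times observed in that state.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical timed log: each trace is a list of (activity, completion time in hours).
LOG = [
    [("a", 0), ("b", 4), ("d", 10)],
    [("a", 0), ("b", 6), ("d", 14)],
    [("a", 0), ("c", 2), ("d", 5)],
]


def annotate(log):
    """Map each state (sequence of past activities) to the observed remaining times."""
    annotations = defaultdict(list)
    for trace in log:
        end = trace[-1][1]
        prefix = ()
        for activity, time in trace:
            prefix += (activity,)
            annotations[prefix].append(end - time)
    return annotations


def predict_remaining(annotations, prefix):
    """Predict the remaining flow time for a running case, or None for an unseen state."""
    times = annotations.get(tuple(prefix))
    return mean(times) if times else None


annotations = annotate(LOG)
print(predict_remaining(annotations, ["a", "b"]))
print(predict_remaining(annotations, ["a", "c"]))
```

A coarser abstraction (for example, the set of past activities) gives more observations per state but less specific predictions, so the choice of abstraction trades precision against the amount of historic data available per state.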
Lecture 7 (cont'd)
- Understand the process spectrum (from Lasagna to Spaghetti processes).
- Able to reproduce the L* life-cycle model for process mining projects.
- Have an overview of the wide range of possible applications and understand the different opportunities depending on the type of process (Lasagna versus Spaghetti).
Lecture 8
- Understand that process models can be viewed as maps.
- Multiple maps for the same reality.
- Fixed decomposition does not work.
- Projecting information on maps.
- Consolidation of the different lectures.
Difference between 2IIE0 and 2IIF0
As you know, there are two variants of the course: 2IIE0 (5 ECTS) and 2IIF0 (6 ECTS).
The final written test on Wednesday 9/4/2014, 9.00-12.00 will have two variants:
- The 2IIF0 (6 ECTS) variant includes the content of Lecture 6 and Chapter 8 of the book.
- The 2IIE0 (5 ECTS) variant does not include the content of Lecture 6 and Chapter 8.
closing
Process Mining: A bridge between data mining and business process management
Experience the magic of process mining, i.e., discovering and improving processes based on facts rather than fiction!