Summary and Outlook. Business Process Intelligence Course Lecture 8. prof.dr.ir. Wil van der Aalst. www.processmining.org





Overview
Chapter 1 Introduction
Part I: Preliminaries
  Chapter 2 Process Modeling and Analysis
  Chapter 3 Data Mining
Part II: From Event Logs to Process Models
  Chapter 4 Getting the Data
  Chapter 5 Process Discovery: An Introduction
  Chapter 6 Advanced Process Discovery Techniques
Part III: Beyond Process Discovery
  Chapter 7 Conformance Checking
  Chapter 8 Mining Additional Perspectives
  Chapter 9 Operational Support
Part IV: Putting Process Mining to Work
  Chapter 10 Tool Support
  Chapter 11 Analyzing Lasagna Processes
  Chapter 12 Analyzing Spaghetti Processes
Part V: Reflection
  Chapter 13 Cartography and Navigation
  Chapter 14 Epilogue

Clive Humby (dunnhumby), 2006. Image: http://www.multivu.com/assets/58095/photos/data-is-the-new-oil-infographic-nigel-holmes-2012-from-the-human-face-of-big-data-original.jpg
Wil van der Aalst, TU/e (use only with permission & acknowledgements)

data HW/SW systems processes

process models as maps

Business process maps. The first geographical maps date back to the 7th millennium BC. Since then cartographers have improved their skills and techniques to create maps, thereby addressing problems such as clearly representing desired traits, eliminating irrelevant details, reducing complexity, and improving understandability.

Example of a map: Road map of NL. The map abstracts from smaller cities and less significant roads. Only the bigger cities, highways, and other important roads are shown. Cities aggregate local roads and local districts. Also note the use of color, size, etc.

Charles Joseph Minard's map showing the size of Napoleon's army at different locations/times. Charles Minard's 1869 chart shows the number of men in Napoleon's 1812 Russian campaign army, their movements, and the temperature they encountered on the return path.

Illustrating the problem. [Figure: Petri net with places start, p1 to p12, and end, and transitions a to l grouped into subprocesses x, y, and z; arcs carry weights between 0.3 and 1.0.]

Classical top level view: low level connections still exist. [Figure: top-level view of subprocesses x, y, and z, still connected through low-level places such as p3 to p5 and p9 to p11, shown together with the detailed Petri net.]

Seamless zoom. [Figure: four views of subprocesses x, y, and z; lowering the threshold reveals more activities.]
Threshold 1.0: a, f, j, e, i
Threshold 0.6: a, f, j, h, k, e, i
Threshold 0.4: a, f, j, b, g, h, k, l, e, i
Threshold 0.3: a, f, j, b, c, d, g, h, k, l, e, i

Most process modeling notations assume a fixed hierarchy: no seamless zoom-in and zoom-out! Traditional hierarchy concepts don't support "Google Maps" abstraction.

Example: Reviewing papers (100 cases generating 3730 events). WF-net discovered using the α-algorithm.

Fuzzy miner: two views on the same process. [Figure: a fuzzy model showing all activities next to a fuzzy model showing only two activities; the color and width of an arc indicate the significance of the connection.]

Balancing between both extremes. [Figure: in between the fuzzy model showing all activities and the one showing only two activities, a model with an aggregated node containing 10 activities, together with the inner structure of that aggregated node.]

Projecting dynamic information on business process maps

Projecting traffic jams on maps

Business process movies

information system as a navigation device

Navigation. Whereas a TomTom device continuously shows the expected arrival time, users of today's information systems are often left clueless about the likely outcomes of the cases they are working on. Car navigation systems provide directions and guidance without controlling the driver. The driver is still in control, but, given a goal (e.g., to get from A to B as fast as possible), the navigation system recommends the next action to be taken. Operational support provides TomTom functionality for business processes.

Recommend: How to get home ASAP? Take a left turn!
Detect: You drive too fast!
Predict: When will I be home? At 11.26!

Relating the process mining framework to cartography and navigation. [Figure: the refined process mining framework. The "world" (people, machines, business processes, documents, organizations) is supported by information system(s) that record event logs (provenance), split into current "pre mortem" data and historic "post mortem" data. Ten activities are grouped into navigation (explore, predict, recommend), auditing (detect, check, compare, promote), and cartography (discover, enhance, diagnose). They relate the event data to "de jure" and "de facto" models covering control-flow, data/rules, and resources/organization.]

What should I have learned from this course?

Lecture 1. Understand that process mining combines process model analysis (BPM) and data-oriented analysis (e.g., data mining). Understand the link to data science. Understand the link to data mining (supervised and unsupervised learning). Understand the relation between models and event data: play-out, play-in, and replay. Able to interpret a decision tree. Able to compute entropy (per node and for the whole tree). Understand the concept of information gain.

Information Gain Based on Entropy. Note: there is information gain even though the classification does not change (every node still predicts young).
Root: young (860/314), #young=546, #old=314, E=0.946848.
Split on attribute smoker:
smoker = yes: young (195/11), #young=184, #old=11, E=0.313027.
smoker = no: young (665/303), #young=362, #old=303, E=0.994314.
The overall entropy after the split is the size-weighted average (195/860) x 0.313027 + (665/860) x 0.994314 = 0.839836, so the information gain is 0.946848 - 0.839836 = 0.107012.
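The numbers on this slide can be reproduced with a few lines of Python. This is a minimal sketch; the function names `entropy` and `information_gain` are chosen here for illustration and do not come from the course material.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Class counts [#young, #old] taken from the slide.
root = [546, 314]
smoker_yes = [184, 11]
smoker_no = [362, 303]

gain = information_gain(root, [smoker_yes, smoker_no])
print(f"E(root)       = {entropy(root):.6f}")
print(f"E(smoker=yes) = {entropy(smoker_yes):.6f}")
print(f"E(smoker=no)  = {entropy(smoker_no):.6f}")
print(f"gain          = {gain:.6f}")
```

Both leaves still have a majority of "young" cases, so the predicted class is unchanged, yet the weighted entropy drops because the "smoker = yes" leaf is much purer than the root.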

Lecture 1 (cont'd). Interpret the results of clustering. Understand the k-means algorithm. Read a dendrogram produced by agglomerative hierarchical clustering. Understand frequent item sets and association rules. Compute the support, confidence, and lift of an association rule. Able to create a confusion matrix (tp, fn, fp, tn) and compute the F1 score.
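As an illustration of support, confidence, and lift, the sketch below evaluates one rule X ⇒ Y on a tiny, made-up set of shopping baskets. The data and the function name `rule_metrics` are assumptions for this example. The formulas are the usual ones: support = N(X∪Y)/N, confidence = N(X∪Y)/N(X), lift = N(X∪Y)·N / (N(X)·N(Y)).

```python
def rule_metrics(transactions, x, y):
    """Return (support, confidence, lift) of the association rule x => y."""
    x, y = set(x), set(y)
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= set(t))
    n_y = sum(1 for t in transactions if y <= set(t))
    n_xy = sum(1 for t in transactions if (x | y) <= set(t))
    support = n_xy / n
    confidence = n_xy / n_x
    lift = (n_xy * n) / (n_x * n_y)
    return support, confidence, lift


# Hypothetical baskets, not taken from the lecture.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"bread", "butter"},
]

support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(support, confidence, lift)
```

A lift above 1 suggests that X and Y occur together more often than would be expected if they were independent.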

Association rules and confusion matrix

Confusion matrix (rows: actual class, columns: predicted class):

            predicted +   predicted -   total
actual +    tp            fn            p
actual -    fp            tn            n
total       p'            n'            N

name        formula
error       (fp+fn)/N
accuracy    (tp+tn)/N
tp-rate     tp/p
fp-rate     fp/n
precision   tp/p'
recall      tp/p
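The measures above, plus the F1 score mentioned in the learning objectives, can be computed directly from the four cells of the confusion matrix. The sketch below assumes p = tp + fn, n = fp + tn, and p' = tp + fp; the counts in the usage example are invented for illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    """Derive the standard quality measures from a 2x2 confusion matrix."""
    p = tp + fn        # actual positives
    n = fp + tn        # actual negatives
    p_pred = tp + fp   # predicted positives (p' in the table)
    total = p + n      # N
    precision = tp / p_pred
    recall = tp / p
    return {
        "error": (fp + fn) / total,
        "accuracy": (tp + tn) / total,
        "tp_rate": tp / p,
        "fp_rate": fp / n,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, not taken from the lecture.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name:10s} {value:.4f}")
```

The F1 score is the harmonic mean of precision and recall, so it is high only when both are high.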

Lecture 2. Understand the limitations of pure model-based analysis. Understand the notion of an event log and process discovery. Understand basic Petri net concepts (marking, liveness, boundedness, soundness). Able to read a simple BPMN diagram. Intuitive understanding of the four basic quality dimensions of process discovery: fitness, precision, generalization, and simplicity. Able to derive the alpha (α) relations (>, →, ||, #) for models and event logs.

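α algorithm. Let $L$ be an event log over $T$. $\alpha(L)$ is defined as follows.
1. $T_L = \{ t \in T \mid \exists_{\sigma \in L}\ t \in \sigma \}$,
2. $T_I = \{ t \in T \mid \exists_{\sigma \in L}\ t = \mathit{first}(\sigma) \}$,
3. $T_O = \{ t \in T \mid \exists_{\sigma \in L}\ t = \mathit{last}(\sigma) \}$,
4. $X_L = \{ (A,B) \mid A \subseteq T_L \wedge A \neq \emptyset \wedge B \subseteq T_L \wedge B \neq \emptyset \wedge \forall_{a \in A} \forall_{b \in B}\ a \rightarrow_L b \wedge \forall_{a_1,a_2 \in A}\ a_1 \#_L a_2 \wedge \forall_{b_1,b_2 \in B}\ b_1 \#_L b_2 \}$,
5. $Y_L = \{ (A,B) \in X_L \mid \forall_{(A',B') \in X_L}\ A \subseteq A' \wedge B \subseteq B' \Rightarrow (A,B) = (A',B') \}$,
6. $P_L = \{ p_{(A,B)} \mid (A,B) \in Y_L \} \cup \{ i_L, o_L \}$,
7. $F_L = \{ (a, p_{(A,B)}) \mid (A,B) \in Y_L \wedge a \in A \} \cup \{ (p_{(A,B)}, b) \mid (A,B) \in Y_L \wedge b \in B \} \cup \{ (i_L, t) \mid t \in T_I \} \cup \{ (t, o_L) \mid t \in T_O \}$, and
8. $\alpha(L) = (P_L, T_L, F_L)$.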
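The eight steps can be followed almost literally in code. The sketch below is a naive illustration, not an efficient or official implementation: it enumerates all subsets of activities, so it only scales to small logs, and it assumes every trace is non-empty. The example log (traces abcd, acbd, aed) and the place names "i_L" and "o_L" are assumptions made for this example; each place p_(A,B) is represented by the pair (A, B) itself.

```python
from itertools import combinations


def alpha(log):
    """Return (P_L, T_L, F_L, Y_L) for a log given as an iterable of traces."""
    traces = [tuple(t) for t in log]
    t_l = {a for t in traces for a in t}                       # step 1
    t_i = {t[0] for t in traces}                               # step 2
    t_o = {t[-1] for t in traces}                              # step 3
    succ = {(a, b) for t in traces for a, b in zip(t, t[1:])}  # a >_L b

    def causal(a, b):      # a ->_L b
        return (a, b) in succ and (b, a) not in succ

    def unrelated(a, b):   # a #_L b
        return (a, b) not in succ and (b, a) not in succ

    # Non-empty subsets of T_L whose members are pairwise unrelated.
    acts = sorted(t_l)
    independent = [
        frozenset(s)
        for k in range(1, len(acts) + 1)
        for s in combinations(acts, k)
        if all(unrelated(a, b) for a in s for b in s)
    ]
    x_l = {(A, B) for A in independent for B in independent
           if all(causal(a, b) for a in A for b in B)}         # step 4
    y_l = {(A, B) for (A, B) in x_l
           if not any((A, B) != (A2, B2) and A <= A2 and B <= B2
                      for (A2, B2) in x_l)}                    # step 5
    places = set(y_l) | {"i_L", "o_L"}                         # step 6
    flow = ({(a, (A, B)) for (A, B) in y_l for a in A}
            | {((A, B), b) for (A, B) in y_l for b in B}
            | {("i_L", t) for t in t_i}
            | {(t, "o_L") for t in t_o})                       # step 7
    return places, t_l, flow, y_l                              # step 8


places, transitions, flow, y_l = alpha(["abcd", "acbd", "aed"])
for A, B in sorted(y_l, key=lambda ab: (sorted(ab[0]), sorted(ab[1]))):
    print(sorted(A), "->", sorted(B))
```

For this log, b and c are parallel and e is an alternative to both, which is why the maximal pairs combine e with b and with c.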

Lecture 2 (cont'd). Able to apply the α algorithm to any event log and interpret the result. Know the limitations of the α algorithm (able to construct event logs resulting in particular problems). Able to show overfitting and underfitting models. [Figure: process mining as a balance of four forces, drawn like the lift, thrust, drag, and gravity acting on an aircraft: fitness (ability to explain observed behavior), simplicity (Occam's Razor), precision (avoiding underfitting), and generalization (avoiding overfitting).]

Lecture 3. Understand the challenges of process discovery (balancing the four forces and incomplete event logs). Able to read and construct C-nets. Able to convert C-nets into Petri nets (if possible) and vice-versa. Understand the different phases of the heuristic mining approach. Given an event log, compute the dependency measure. Determine the dependency graph based on two thresholds.

Dependency graph using a higher threshold (at least 5 direct successions and a dependency of at least 0.9). [Figure: two dependency graphs over the activities a, b, c, d, and e. Arcs are labeled with the number of direct successions and the dependency measure, e.g., 11 (0.92), 13 (0.93), 5 (0.83), and 4 (0.80); the arcs labeled 5 (0.83) and 4 (0.80) do not meet the higher threshold.]
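The arc labels in the figure are consistent with the usual heuristic-miner dependency measure, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1) for a ≠ b and |a>a| / (|a>a| + 1) for self-loops. The sketch below applies it with the two thresholds named on the slide. The event log is an assumption, chosen so that its counts match the labels in the figure; it is not given on the slide.

```python
from collections import Counter


def direct_successions(log):
    """Count how often a is directly followed by b; log holds (trace, frequency) pairs."""
    counts = Counter()
    for trace, freq in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += freq
    return counts


def dependency(counts, a, b):
    """Dependency measure between a and b (a value strictly between -1 and 1)."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    if a == b:
        return ab / (ab + 1)
    return (ab - ba) / (ab + ba + 1)


def dependency_graph(log, min_count, min_dep):
    """Keep arcs that meet both the frequency and the dependency threshold."""
    counts = direct_successions(log)
    return {
        (a, b): (n, dependency(counts, a, b))
        for (a, b), n in counts.items()
        if n >= min_count and dependency(counts, a, b) >= min_dep
    }


# Assumed log (trace, frequency), picked to match the counts in the figure.
log = [("ae", 5), ("abce", 10), ("acbe", 10), ("abe", 1),
       ("ace", 1), ("ade", 10), ("adde", 2), ("addde", 1)]

counts = direct_successions(log)
graph = dependency_graph(log, min_count=5, min_dep=0.9)
for (a, b), (n, dep) in sorted(graph.items()):
    print(f"{a} -> {b}: {n} ({dep:.2f})")
```

With these thresholds the direct arc from a to e (5 successions, dependency 0.83) and the self-loop on d (4 successions, dependency 0.80) are dropped, and b and c stay unconnected because they follow each other equally often.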

Lecture 3 (cont'd). Understand the different phases of the two-phase approach based on state-based regions. Able to construct a transition system based on an event log and particular abstraction (past/future, set/bag/sequence, etc.). Able to determine and check state-based regions. Know the limitations of the state-based region approach (able to construct event logs resulting in particular problems).

Example of State-Based Region. [Figure: transition system with states [ ], [a], [a,b], [a,c], [a,e], [a,b,c], [a,d,e], and [a,b,c,d], and the corresponding Petri net with places start, p1, p2, p3, p4, and end and transitions a, b, c, d, and e. For the highlighted region: enter: b, e; leave: d; do-not-cross: a, c. The region corresponds to place p3.]
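The region in this example can be checked mechanically. The sketch below builds a transition system using the set of past activities as the state abstraction and then classifies each activity with respect to a candidate set of states. The log (traces abcd, acbd, aed) is an assumption that happens to produce the eight states listed in the figure; the function names are illustrative.

```python
def transition_system(log):
    """Build transitions (state, activity, next_state); a state is the set of activities seen so far."""
    transitions = set()
    for trace in log:
        state = frozenset()
        for act in trace:
            nxt = state | {act}
            transitions.add((state, act, nxt))
            state = nxt
    return transitions


def classify(transitions, region):
    """Return {activity: kind} if `region` is a region, otherwise None.

    A set of states is a region if every activity behaves uniformly:
    all its transitions enter, all leave, or none cross the boundary.
    """
    kinds = {}
    for src, act, dst in transitions:
        if src not in region and dst in region:
            kind = "enter"
        elif src in region and dst not in region:
            kind = "leave"
        else:
            kind = "do-not-cross"
        if kinds.setdefault(act, kind) != kind:
            return None
    return kinds


ts = transition_system(["abcd", "acbd", "aed"])
states = {s for src, _, dst in ts for s in (src, dst)}

# States in which b or e has happened but d has not (place p3 in the figure).
region = {frozenset("ab"), frozenset("abc"), frozenset("ae")}
result = classify(ts, region)
print(len(states), result)
```

A set such as {[a,b]} alone is not a region: b enters it from [a], but b does not cross its boundary when going from [a,c] to [a,b,c].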

Lecture 4. Have an overview of additional process mining approaches (genetic, language-based regions, etc.). Comprehend the minimal requirements for event data. Understand the elements of the XES format (not just control-flow). Able to name data quality problems (e.g., imprecise timestamps). Understand that given a data set different event logs can be extracted based on different viewpoints. Have a good understanding of available tooling (ProM, Disco, Celonis, Perceptive process mining).

Lecture 5. Understand the concept of conformance checking. Able to name the different applications of conformance checking. Able to compute the produced, consumed, missing and remaining tokens given a single trace or whole log. Compute fitness based on counting missing and remaining tokens. Able to interpret the diagnostics of such a fitness computation. Able to compute and compare footprints based on models and logs. Understand the notion of alignments.

Fitness = 0.8

trace     frequency  p   r  c   m  p (all)  r (all)  c (all)  m (all)
abefcd    10         9   2  9   2  90       20       90       20
abbefccd  10         11  2  11  2  110      20       110      20
sum                                200      40       200      40

p = produced tokens, r = remaining tokens, c = consumed tokens, m = missing tokens; the "(all)" columns multiply the per-trace counts by the trace frequency. [Figure: Petri net with places start, p1 to p5, and end and transitions a to f; replaying the log on it gives fitness 0.8.]
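The fitness value follows from the token counts in the table using the token-replay formula fitness = 1/2 (1 - m/c) + 1/2 (1 - r/p), where the counts are summed over all traces. The sketch below only redoes this arithmetic; it does not replay traces on the Petri net, and the function name is illustrative.

```python
def replay_fitness(rows):
    """Token-replay fitness from (frequency, produced, remaining, consumed, missing) rows."""
    p = sum(freq * produced for freq, produced, _, _, _ in rows)
    r = sum(freq * remaining for freq, _, remaining, _, _ in rows)
    c = sum(freq * consumed for freq, _, _, consumed, _ in rows)
    m = sum(freq * missing for freq, _, _, _, missing in rows)
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)


# Per-trace counts from the table: abefcd and abbefccd, each occurring 10 times.
rows = [
    (10, 9, 2, 9, 2),
    (10, 11, 2, 11, 2),
]

fitness = replay_fitness(rows)
print(f"fitness = {fitness:.2f}")
```

With 40 missing out of 200 consumed tokens and 40 remaining out of 200 produced tokens, both halves of the formula contribute 0.4.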

Lecture 6. Understand the concepts of model repair and model extension. Able to interpret the different types of dotted charts. Able to convert a decision point into a classification problem. Able to convert a decision tree for a decision point into guards. Able to replay a timed event log and compute waiting times, service times, and routing probabilities. Able to construct the resource-activity matrix given an event log. Able to construct the handover of work matrix.

Lecture 6 (cont'd). Able to create a social network based on the handover of work matrix. Understand how the resource-activity matrix can be used to cluster resources and construct organizational models. Understand the process cube notion as a means to do comparative process mining. Understand how the different types of process mining can be combined to create models covering all perspectives (control-flow, data, resources, time, etc.).

Lecture 7. Able to reproduce the refined process mining framework (listing 10 activities). Understand the difference between "pre mortem" and "post mortem" event data and "de jure" and "de facto" models. Understand the three types of operational support: detect, predict, and recommend. Able to explain these concepts using a timed event log, e.g., constructing an annotated transition system to compute the remaining flow time. Understand the difference between declarative and procedural languages.
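The two matrices mentioned for Lecture 6 can be sketched as follows. The event data, the resource names, and the function names are made up for this example. The handover matrix here counts direct handovers within a case, and the resource-activity matrix gives the mean number of times a resource performs an activity per case; other definitions (e.g., normalized or indirect handovers) are possible.

```python
from collections import Counter, defaultdict


def handover_matrix(events):
    """Count direct handovers of work between resources within each case.

    `events` is a list of (case_id, activity, resource), ordered by time per case.
    """
    by_case = defaultdict(list)
    for case, _activity, resource in events:
        by_case[case].append(resource)
    handovers = Counter()
    for resources in by_case.values():
        handovers.update(zip(resources, resources[1:]))
    return handovers


def resource_activity_matrix(events):
    """Mean number of times each resource performs each activity per case."""
    cases = {case for case, _, _ in events}
    counts = Counter((resource, activity) for _, activity, resource in events)
    return {key: n / len(cases) for key, n in counts.items()}


# Hypothetical event log, not taken from the lecture.
events = [
    (1, "register", "Pete"), (1, "check", "Sue"), (1, "decide", "Sara"),
    (2, "register", "Pete"), (2, "check", "Mike"), (2, "decide", "Sara"),
    (3, "register", "Pete"), (3, "check", "Sue"), (3, "check", "Sue"),
    (3, "decide", "Sara"),
]

handovers = handover_matrix(events)
ram = resource_activity_matrix(events)
print(dict(handovers))
print(ram)
```

The handover counts can be read as weighted arcs of a social network, and rows of the resource-activity matrix can serve as feature vectors when clustering resources.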
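One way to picture the "predict" type of operational support is the sketch below. It annotates each state of a transition system with the remaining flow times observed in historic cases and predicts the mean for a running case. The state abstraction (set of activities seen so far), the timed log, and the function names are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def annotate(log):
    """Map each state (set of activities seen so far) to the observed remaining times.

    Each trace is a list of (activity, timestamp) pairs; timestamps are plain
    numbers here (e.g., hours since the start of the case).
    """
    remaining = defaultdict(list)
    for trace in log:
        end = trace[-1][1]
        state = frozenset()
        for activity, timestamp in trace:
            state = state | {activity}
            remaining[state].append(end - timestamp)
    return remaining


def predict_remaining(annotations, partial_trace):
    """Predict the remaining flow time as the mean over historic visits to the state."""
    state = frozenset(activity for activity, _ in partial_trace)
    times = annotations.get(state)
    return mean(times) if times else None


# Hypothetical timed log, not taken from the lecture.
log = [
    [("a", 0), ("b", 2), ("c", 5), ("d", 9)],
    [("a", 0), ("c", 1), ("b", 4), ("d", 6)],
    [("a", 0), ("e", 3), ("d", 10)],
]

annotations = annotate(log)
prediction = predict_remaining(annotations, [("a", 0), ("c", 1), ("b", 4)])
print(prediction)
```

In this log the state {a, b, c} was visited twice, with 4 and 2 time units remaining, so the prediction for a running case in that state is their mean. A state never seen before yields no prediction.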

Lecture 7 (cont'd). Understand the process spectrum (from Lasagna to Spaghetti processes). Able to reproduce the L* life-cycle model for process mining projects. Have an overview of the wide range of possible applications and understand the different opportunities depending on the type of process (Lasagna versus Spaghetti).

Lecture 8. Understand that process models can be viewed as maps. Multiple maps for the same reality. Fixed decomposition does not work. Projecting information on maps. Consolidation of the different lectures.

Difference between 2IIE0 and 2IIF0. As you know, there are two variants of the course: 2IIE0 (5 ECTS) and 2IIF0 (6 ECTS). The final written test on Wednesday 9/4/2014, 9.00-12.00, will have two variants. The 2IIF0 (6 ECTS) test includes the content of Lecture 6 and Chapter 8 of the book. The 2IIE0 (5 ECTS) test does not include the content of Lecture 6 and Chapter 8.

closing


Process Mining: A bridge between data mining and business process management

Experience the magic of process mining, i.e., discovering and improving processes based on facts rather than fiction!