Methods for the specification and verification of business processes MPB (6 cfu, 295AA)


1 Methods for the specification and verification of business processes MPB (6 cfu, 295AA) Roberto Bruni Process Mining

2 Objective We give an overview of the key principles of process mining.

3 Process Mining Process mining is a relatively young research discipline that sits between machine learning and data mining on the one hand and process modeling and analysis on the other hand. The idea of process mining is to discover, monitor and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs readily available in today's systems.

4 Processes, Cases, Events, Attributes A process consists of cases. A case consists of events such that each event relates to precisely one case. Events within a case are ordered. Events can have attributes. Examples of typical attribute names are activity, time, costs, and resource.

5 Event Logs Let us assume that it is possible to sequentially record events such that each event refers to an activity (i.e., a well-defined step in the process) and is related to a particular case (i.e., a process instance).
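A minimal sketch of how such records can be turned into one ordered trace per case. It assumes that events are held in memory with a case id, an activity name, a timestamp, and an optional resource. The `Event` type, the `traces_by_case` helper, and the dates below are hypothetical illustrations and are not taken from any process mining tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    case_id: str          # the case (process instance) the event belongs to
    activity: str         # the well-defined step that was executed
    timestamp: datetime   # used to order the events within a case
    resource: str = ""    # an optional extra attribute


def traces_by_case(events):
    """Group events by case and order the events of each case by timestamp."""
    cases = defaultdict(list)
    for ev in events:
        cases[ev.case_id].append(ev)
    return {
        case_id: [ev.activity for ev in sorted(evs, key=lambda e: e.timestamp)]
        for case_id, evs in cases.items()
    }


# Hypothetical events, deliberately recorded out of order.
log = [
    Event("2", "Register request", datetime(2011, 1, 1, 11, 32), "Mike"),
    Event("1", "Examine thoroughly", datetime(2011, 1, 2, 10, 6), "Sue"),
    Event("1", "Register request", datetime(2011, 1, 1, 11, 2), "Pete"),
    Event("2", "Check ticket", datetime(2011, 1, 2, 12, 12), "Mike"),
]
print(traces_by_case(log))
```

Ordering by timestamp matters because the events of different cases are interleaved in the recording. A tie-breaking rule, such as recording order, would be needed when two events of a case share a timestamp.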

6 Event Log Example
Table 1.1 A fragment of some event log: each line corresponds to an event
Case id  Time   Activity            Resource
1        11.02  Register request    Pete
1        10.06  Examine thoroughly  Sue
1        15.12  Check ticket        Mike
1        11.18  Decide              Sara
1        14.24  Reject request      Pete
2        11.32  Register request    Mike
2        12.12  Check ticket        Mike
2        14.16  Examine casually    Pete
2        11.22  Decide              Sara
2        12.05  Pay compensation    Ellen
3        14.32  Register request    Pete
3        15.06  Examine casually    Mike
...

7 Mining Scheme Fig. 1.4 Positioning of the three main types of process mining: discovery, conformance, and enhancement
Thursday 12 December 13

8 Discovery A discovery technique takes an event log and produces a model without using any a-priori information. If the event log contains information about resources, one can also discover resource-related models, e.g., a social network showing how people work together in an organization.

9 Conformance An existing process model is compared with an event log of the same process. Conformance checking can be used to check if reality, as recorded in the log, conforms to the model and vice versa. Conformance checking may be used to detect, locate and explain deviations, and to measure the severity of these deviations.

10 Enhancement The idea is to extend/improve an existing process model using information about the actual process recorded in some event log. Whereas conformance checking measures the alignment between model and reality, this third type of process mining aims at changing or extending the a-priori model.

11 Enhancement: Repair One type of enhancement is repair, i.e., modifying the model to better reflect reality. For example, if two activities are modeled sequentially but in reality can happen in any order, then the model may be corrected to reflect this.

12 Four Perspectives

13 Control-Flow Perspective The control-flow perspective focuses on the control-flow, i.e., the ordering of activities. The goal of mining this perspective is to find a good characterization of all possible paths, e.g., expressed in terms of a Petri net or some other notation (e.g., EPC, BPMN, and UML AD). We shall focus on this perspective.

14 Organizational Perspective The organizational perspective focuses on information about resources hidden in the log, i.e., which actors (e.g., people, systems, roles, and departments) are involved and how they are related. The goal is to either structure the organization by classifying people in terms of roles and organizational units or to show the social network.
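One common way to build such a social network is to count handovers of work, i.e., how often a case passes from one resource to a different one between consecutive events. The sketch below is a simplified variant: self-handovers are ignored and only direct succession counts. The resource sequences are those of Cases 1 and 2 in Table 1.1. `handover_of_work` is an illustrative name, not an established API.

```python
from collections import Counter


def handover_of_work(cases):
    """Count handovers between different resources in consecutive events.

    `cases` maps a case id to the resources of its events, in event order.
    """
    handovers = Counter()
    for resources in cases.values():
        for giver, taker in zip(resources, resources[1:]):
            if giver != taker:  # a resource continuing its own work is not a handover
                handovers[(giver, taker)] += 1
    return handovers


cases = {
    "1": ["Pete", "Sue", "Mike", "Sara", "Pete"],
    "2": ["Mike", "Mike", "Pete", "Sara", "Ellen"],
}
for (giver, taker), n in sorted(handover_of_work(cases).items()):
    print(f"{giver} -> {taker}: {n}")
```

The resulting counts can be read as a weighted directed graph whose nodes are resources. On a full log, frequent arcs hint at who typically passes work to whom.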

15 Case Perspective The case perspective focuses on properties of cases. Obviously, a case can be characterized by its path in the process or by the originators working on it. However, cases can also be characterized by the values of the corresponding data elements. For example, if a case represents a replenishment order, it may be interesting to know the supplier or the number of products ordered.

16 Time Perspective The time perspective is concerned with the timing and frequency of events (performance checking). When events bear timestamps it is possible to discover bottlenecks, measure service levels, monitor the utilization of resources, and predict the remaining processing time of running cases.
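As a small illustration of timestamp-based analysis, the sketch below computes the throughput time of a case and finds the pair of consecutive activities with the longest gap, which is a crude bottleneck indicator. Table 1.1 only shows the time of day, so the dates used here are hypothetical. The helper names are illustrative.

```python
from datetime import datetime


def throughput_time(timestamps):
    """Elapsed time between the first and the last event of a case."""
    return max(timestamps) - min(timestamps)


def longest_gap(activities, timestamps):
    """Return the pair of consecutive activities with the largest waiting time.

    Assumes the events are already sorted by timestamp.
    """
    steps = list(zip(activities, timestamps))
    gaps = [(t2 - t1, (a1, a2)) for (a1, t1), (a2, t2) in zip(steps, steps[1:])]
    return max(gaps)[1]


# Case 1 of Table 1.1 with hypothetical dates (the times of day come from the table).
activities = ["Register request", "Examine thoroughly", "Check ticket",
              "Decide", "Reject request"]
timestamps = [datetime(2011, 1, 3, 11, 2), datetime(2011, 1, 4, 10, 6),
              datetime(2011, 1, 7, 15, 12), datetime(2011, 1, 8, 11, 18),
              datetime(2011, 1, 9, 14, 24)]
print(throughput_time(timestamps))              # 6 days, 3:22:00
print(longest_gap(activities, timestamps))
```

On a real log, such per-case figures would be aggregated over many cases before drawing conclusions about bottlenecks.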

17 Play-in, Play-out, Replay

18 Play-in

19 Play-out

20 Replay Fig. 1.8 Three ways of relating event logs (or other sources of information containing example behavior) and process models: Play-in, Play-out, and Replay
… than 56 cigarettes tend to die young) and association rules (people that buy diapers also buy beer). Unfortunately, it is not possible to use conventional data mining techniques to Play-in process models. Only recently, process mining techniques have become readily available to discover process models based on event logs.

21 An Example


23 Event Log Example
Table 1.1 (continued)
Case id  Time   Activity            Resource
3        16.34  Check ticket        Ellen
3        09.18  Decide              Sara
3        12.18  Reinitiate request  Sara
3        13.06  Examine thoroughly  Sean
3        11.43  Check ticket        Pete
3        09.55  Decide              Sara
3        10.45  Pay compensation    Ellen
4        15.02  Register request    Pete
4        12.06  Check ticket        Mike
4        14.43  Examine thoroughly  Sean
4        12.02  Decide              Sara
4        15.44  Reject request      Ellen
5        09.02  Register request    Ellen
5        10.16  Examine casually    Mike
5        11.22  Check ticket        Pete
5        13.28  Decide              Sara
5        16.18  Reinitiate request  Sara
5        14.33  Check ticket        Ellen
5        15.50  Examine casually    Mike
5        11.18  Decide              Sara
5        12.48  Reinitiate request  Sara
5        09.06  Examine casually    Sue
5        11.34  Check ticket        Pete
5        13.12  Decide              Sara
5        14.56  Reject request      Mike
6        15.02  Register request    Mike
6        16.06  Examine casually    Ellen
6        16.22  Check ticket        Mike
6        16.52  Decide              Sara
6        11.47  Pay compensation    Mike
...
Table 1.2 A more compact representation of the log shown in Table 1.1: a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, and h = reject request
Case id  Trace
1        <a,b,d,e,h>
2        <a,d,c,e,g>
3        <a,c,d,e,f,b,d,e,g>
4        <a,d,b,e,h>
5        <a,c,d,e,f,d,c,e,f,c,d,e,h>
6        <a,c,d,e,g>
...
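The compact notation of Table 1.2 is simply a renaming of activities. Here is a minimal sketch, assuming traces are lists of activity names as in Table 1.1. `ABBREV` and `compact` are illustrative names.

```python
ABBREV = {
    "register request": "a",
    "examine thoroughly": "b",
    "examine casually": "c",
    "check ticket": "d",
    "decide": "e",
    "reinitiate request": "f",
    "pay compensation": "g",
    "reject request": "h",
}


def compact(trace):
    """Map a trace of activity names to the one-letter codes of Table 1.2."""
    return tuple(ABBREV[name.lower()] for name in trace)


case_3 = ["Register request", "Examine casually", "Check ticket", "Decide",
          "Reinitiate request", "Examine thoroughly", "Check ticket", "Decide",
          "Pay compensation"]
print(",".join(compact(case_3)))  # a,c,d,e,f,b,d,e,g
```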

24 Discovery Example
Fig. 1.5 The process model discovered by the α-algorithm [103] based on the set of traces {<a,b,d,e,h>, <a,d,c,e,g>, <a,c,d,e,f,b,d,e,g>, <a,d,b,e,h>, <a,c,d,e,f,d,c,e,f,c,d,e,h>, <a,c,d,e,g>}
Replaying the first trace, <a,b,d,e,h>, on this net: after executing h, the case ends in the desired final marking with just a token in place end. Similarly, it can be checked that the other five traces shown in Table 1.2 are also possible in the model and that all of these traces result in the marking with just a token in place end.
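The replay argument can be checked mechanically with the token game. The sketch below encodes our reading of the net in Fig. 1.5 as a map from each transition to its input and output places, then fires the transitions of every trace in Table 1.2. The internal place names `p1` to `p5` are hypothetical; only `start` and `end` are named in the text.

```python
from collections import Counter

# Assumed encoding of the WF-net of Fig. 1.5: transition -> (input places, output places).
NET = {
    "a": (["start"], ["p1", "p2"]),
    "b": (["p1"], ["p3"]),
    "c": (["p1"], ["p3"]),
    "d": (["p2"], ["p4"]),
    "e": (["p3", "p4"], ["p5"]),
    "f": (["p5"], ["p1", "p2"]),
    "g": (["p5"], ["end"]),
    "h": (["p5"], ["end"]),
}


def replay(trace, net=NET):
    """Fire the transitions of `trace`, starting from one token in `start`.

    Returns the final marking, or None if some transition is not enabled.
    """
    marking = Counter({"start": 1})
    for t in trace:
        inputs, outputs = net[t]
        if any(marking[p] < 1 for p in inputs):
            return None              # t is not enabled: the trace cannot be replayed
        marking.subtract(inputs)     # consume one token from each input place
        marking.update(outputs)      # produce one token in each output place
    return +marking                  # keep only the places that hold tokens


TABLE_1_2 = ["abdeh", "adceg", "acdefbdeg", "adbeh", "acdefdcefcdeh", "acdeg"]
for trace in TABLE_1_2:
    print(trace, dict(replay(trace)))
```

Each trace ends in the marking with a single token in `end`, which is exactly the property claimed above.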

25 Discovery Example All cases start with a and end with either g or h. Every e is preceded by d and one of the examination activities (b or c).

26 Discovery Example Moreover, e is followed by f, g, or h. The repeated execution of b or c, d, and e suggests the presence of a loop.
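These observations can be checked directly on the traces of Table 1.2. Here is a minimal sketch. Reading "one of the examination activities" as exactly one of b or c since the previous e (or since the start) is our interpretation, and the function names are illustrative.

```python
TABLE_1_2 = ["abdeh", "adceg", "acdefbdeg", "adbeh", "acdefdcefcdeh", "acdeg"]


def starts_with_a_ends_with_g_or_h(trace):
    return trace[0] == "a" and trace[-1] in ("g", "h")


def each_e_preceded_by_d_and_one_examination(trace):
    """Between consecutive e's (or from the start), require d and exactly one of b, c."""
    segment = []
    for x in trace:
        if x == "e":
            if "d" not in segment or sum(y in ("b", "c") for y in segment) != 1:
                return False
            segment = []
        else:
            segment.append(x)
    return True


def each_e_followed_by_f_g_or_h(trace):
    if trace[-1] == "e":
        return False                 # an e at the very end has no successor
    return all(nxt in ("f", "g", "h")
               for cur, nxt in zip(trace, trace[1:]) if cur == "e")


checks = [starts_with_a_ends_with_g_or_h,
          each_e_preceded_by_d_and_one_examination,
          each_e_followed_by_f_g_or_h]
print(all(check(t) for t in TABLE_1_2 for check in checks))  # True
```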

27 Discovery Example These characteristics are adequately captured by the net of Fig. 1.5.

28 Overfitting and Underfitting One of the challenges of process mining is to strike a balance between overfitting (the model is too specific and only allows for the accidental behavior observed) and underfitting (the model is too general and allows for behavior unrelated to the behavior observed).

29 Discussion The Petri net shown in Fig. 1.5 also allows for traces not in the log. For example, other possible traces are <a, d, c, e, f, b, d, e, g> and <a, c, d, e, f, c, d, e, f, c, d, e, f, c, d, e, f, b, d, e, g>. This is a desired phenomenon, as the goal is not to represent just the particular set of example traces in the event log. Process mining algorithms need to generalize the behavior contained in the log to show the most likely underlying model that is not invalidated by the next set of observations.

30 Discovery Example When comparing the event log and the model, there seems to be a good balance between overfitting and underfitting.

31 Another Discovery Example Fig. 1.6 The process model discovered by the α-algorithm based on Cases 1 and 4, i.e., the set of traces {<a,b,d,e,h>, <a,d,b,e,h>}

32 Mining Other Models We used Petri nets to represent the discovered process models, because Petri nets are a succinct way of representing processes and have unambiguous but intuitive semantics. However, some mining techniques are independent of the desired representation. Fig. 1.2 The same process modeled in terms of BPMN

33 Conformance Example
Table 1.3 Another event log: Cases 7, 8, and 10 are not possible according to Fig. 1.5
Case id  Trace
1        <a,b,d,e,h>
2        <a,d,c,e,g>
3        <a,c,d,e,f,b,d,e,g>
4        <a,d,b,e,h>
5        <a,c,d,e,f,d,c,e,f,c,d,e,h>
6        <a,c,d,e,g>
7        <a,b,e,g>
8        <a,b,d,e>
9        <a,d,c,e,f,d,c,e,f,b,d,e,h>
10       <a,c,d,e,f,b,d,g>

34 Conformance Example
Table 1.3 Another event log: Cases 7, 8, and 10 are not possible according to Fig. 1.5
Let us now consider an event log consisting of only two traces, <a,b,d,e,h> and <a,d,b,e,h>, i.e., Cases 1 and 4 of the original log. For this log, the α-algorithm constructs the Petri net shown in Fig. 1.6. This model only allows for two traces, and these are exactly the ones in the small event log. b and d are modeled as being concurrent.

35 Process Discovery: α-algorithm

36 Process Discovery Process discovery is the activity that combines Discovery with the Control-flow Perspective. The general problem: a process discovery algorithm is a function that maps an event log L onto a process model M such that the model M is representative of the behavior seen in the event log L. We focus on simple event logs and Petri net models (possibly sound workflow nets).

37 Simple Event Log
To make things more concrete, we define the target to be a Petri net model. Moreover, we use a simple event log as input (cf. Definition 4.4). Let $A$ be a set of activities. A simple trace over $A$ is a finite sequence of activities, i.e., $\sigma \in A^*$. A simple event log $L$ is a multiset of traces over $A$, i.e., $L \in \mathbb{B}(A^*)$.
Consider $L_1 = [\langle a,b,c,d\rangle^3, \langle a,c,b,d\rangle^2, \langle a,e,d\rangle]$, a simple event log describing the history of six cases. The goal is now to discover a Petri net that can replay event log $L_1$. Ideally, the Petri net is a sound WF-net. Based on these choices, we reformulate the process discovery problem and make it more concrete.
Definition 5.2 (Specific process discovery problem) A process discovery algorithm is a function $\gamma$ that maps a log $L \in \mathbb{B}(A^*)$ onto a marked Petri net $\gamma(L) = (N, M)$. Ideally, $N$ is a sound WF-net and all traces in $L$ correspond to possible firing sequences of $(N, M)$.
Function $\gamma$ defines a so-called Play-in technique as described in Chap. 1. Based on $L_1$, a process discovery algorithm $\gamma$ could discover the WF-net $N_1$ shown in Fig. 5.1, i.e., $\gamma(L_1) = (N_1, [\mathit{start}])$. Each trace in $L_1$ corresponds to a possible firing sequence of WF-net $N_1$. Therefore, it is easy to see that the WF-net can indeed replay all traces in the event log. In fact, each of the three possible firing sequences of WF-net $N_1$ appears in $L_1$.
Let us now consider another event log:
$$L_2 = [\langle a,b,c,d\rangle^3, \langle a,c,b,d\rangle^4, \langle a,b,c,e,f,b,c,d\rangle^2, \langle a,b,c,e,f,c,b,d\rangle, \langle a,c,b,e,f,b,c,d\rangle^2, \langle a,c,b,e,f,b,c,e,f,c,b,d\rangle]$$
$L_2$ is a simple event log consisting of 13 cases represented by 6 different traces.
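A simple event log is a multiset of traces, so a counter keyed by traces is a natural in-memory representation. Here is a minimal sketch using Python's `collections.Counter` that checks the figures stated for L2. The helper `simple_log` is an illustrative name.

```python
from collections import Counter


def simple_log(traces):
    """Build a simple event log (a multiset of traces) from one trace per case."""
    return Counter(tuple(t) for t in traces)


# L2 given directly as trace -> multiplicity.
L2 = Counter({
    tuple("abcd"): 3,
    tuple("acbd"): 4,
    tuple("abcefbcd"): 2,
    tuple("abcefcbd"): 1,
    tuple("acbefbcd"): 2,
    tuple("acbefbcefcbd"): 1,
})
print(sum(L2.values()), "cases,", len(L2), "different traces")  # 13 cases, 6 different traces

# The same structure arises from a list with one trace per case, here L1:
L1 = simple_log(["abcd", "abcd", "abcd", "acbd", "acbd", "aed"])
print(L1[tuple("abcd")])  # 3
```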

38 Challenges Figure: Balancing the four quality dimensions: fitness, simplicity, precision, and generalization (simplicity: simple structure; generalization: other behaviours allowed; precision: no completely unrelated behaviour). For example: what is the penalty if a step needs to be skipped, and what is the penalty if tokens remain in the WF-net after replay?

39 Appropriateness Figure: Balancing the four quality dimensions: fitness, simplicity, precision, and generalization

40 α-algorithm The α-algorithm was one of the first process discovery algorithms that could adequately deal with concurrency. It has several limitations, but it provides a good introduction to the topic: the α-algorithm is simple, and many of its ideas have been embedded in more complex and robust techniques. The α-algorithm scans the event log for particular patterns, called log-based ordering relations, to create a footprint of the log.

41 Log-based Ordering Relations
Definition 5.3 (Log-based ordering relations) Let $L$ be an event log over $A$, i.e., $L \in \mathbb{B}(A^*)$. Let $a, b \in A$:
$a >_L b$ if and only if there is a trace $\sigma = \langle t_1, t_2, t_3, \ldots, t_n \rangle$ and $i \in \{1, \ldots, n-1\}$ such that $\sigma \in L$ and $t_i = a$ and $t_{i+1} = b$;
$a \rightarrow_L b$ if and only if $a >_L b$ and $b \not>_L a$;
$a \#_L b$ if and only if $a \not>_L b$ and $b \not>_L a$;
$a \parallel_L b$ if and only if $a >_L b$ and $b >_L a$.
Consider for instance $L_1 = [\langle a,b,c,d\rangle^3, \langle a,c,b,d\rangle^2, \langle a,e,d\rangle]$ again.

42 Log-based Ordering Relations: Example
For $L_1 = [\langle a,b,c,d\rangle^3, \langle a,c,b,d\rangle^2, \langle a,e,d\rangle]$ the following log-based ordering relations can be found:
$>_{L_1} = \{(a,b), (a,c), (a,e), (b,c), (c,b), (b,d), (c,d), (e,d)\}$
$\rightarrow_{L_1} = \{(a,b), (a,c), (a,e), (b,d), (c,d), (e,d)\}$
$\#_{L_1} = \{(a,a), (a,d), (b,b), (b,e), (c,c), (c,e), (d,a), (d,d), (e,b), (e,c), (e,e)\}$
$\parallel_{L_1} = \{(b,c), (c,b)\}$
Relation $>_{L_1}$ contains all pairs of activities in a directly-follows relation. $c >_{L_1} d$ because $d$ directly follows $c$ in trace $\langle a,b,c,d\rangle$. However, $d \not>_{L_1} c$ because $c$ never directly follows $d$ in any trace in the log. $\rightarrow_{L_1}$ contains all pairs of activities in a causality relation, e.g., $c \rightarrow_{L_1} d$ because sometimes $d$ directly follows $c$ and never the other way around ($c >_{L_1} d$ and $d \not>_{L_1} c$). $b \parallel_{L_1} c$ because $b >_{L_1} c$ and $c >_{L_1} b$, i.e., sometimes $c$ follows $b$ and sometimes the other way around. $b \#_{L_1} e$ because $b \not>_{L_1} e$ and $e \not>_{L_1} b$.
For any log $L$ over $A$ and $x, y \in A$: $x \rightarrow_L y$, $y \rightarrow_L x$, $x \#_L y$, or $x \parallel_L y$, i.e., precisely one of these relations holds for any pair of activities. Therefore, the footprint of a log can be captured in a matrix as shown in Table 5.1.
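Definition 5.3 translates almost literally into code. Here is a minimal sketch, assuming the log is a mapping from traces (tuples of activities) to multiplicities. The multiplicities play no role in the relations. The function name is illustrative.

```python
from itertools import product

L1 = {tuple("abcd"): 3, tuple("acbd"): 2, tuple("aed"): 1}


def ordering_relations(log):
    """Return the relations >_L, ->_L, #_L and ||_L of Definition 5.3 as sets of pairs."""
    activities = {x for trace in log for x in trace}
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}   # a >_L b
    causal = {(x, y) for (x, y) in follows if (y, x) not in follows}      # a ->_L b
    parallel = {(x, y) for (x, y) in follows if (y, x) in follows}        # a ||_L b
    choice = {(x, y) for x, y in product(activities, repeat=2)
              if (x, y) not in follows and (y, x) not in follows}         # a #_L b
    return follows, causal, choice, parallel


follows, causal, choice, parallel = ordering_relations(L1)
print(sorted(causal))    # [('a', 'b'), ('a', 'c'), ('a', 'e'), ('b', 'd'), ('c', 'd'), ('e', 'd')]
print(sorted(parallel))  # [('b', 'c'), ('c', 'b')]
```

Every ordered pair lands in exactly one of →, ←, # or ‖. For the five activities of L1, this gives 2·6 + 11 + 2 = 25 pairs.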

43 Footprint Matrix: Example
Footprint of $L_1 = [\langle a,b,c,d\rangle^3, \langle a,c,b,d\rangle^2, \langle a,e,d\rangle]$ (Table 5.1):
$$
\begin{array}{c|ccccc}
  & a & b & c & d & e \\ \hline
a & \#_{L_1} & \rightarrow_{L_1} & \rightarrow_{L_1} & \#_{L_1} & \rightarrow_{L_1} \\
b & \leftarrow_{L_1} & \#_{L_1} & \parallel_{L_1} & \rightarrow_{L_1} & \#_{L_1} \\
c & \leftarrow_{L_1} & \parallel_{L_1} & \#_{L_1} & \rightarrow_{L_1} & \#_{L_1} \\
d & \#_{L_1} & \leftarrow_{L_1} & \leftarrow_{L_1} & \#_{L_1} & \leftarrow_{L_1} \\
e & \leftarrow_{L_1} & \#_{L_1} & \#_{L_1} & \rightarrow_{L_1} & \#_{L_1}
\end{array}
$$
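The footprint matrix can be computed from the directly-follows pairs alone. Here is a minimal sketch, assuming the same trace-to-multiplicity representation of L1. It uses the symbols →, ←, # and ‖ as in Table 5.1, and `footprint` is an illustrative name.

```python
L1 = {tuple("abcd"): 3, tuple("acbd"): 2, tuple("aed"): 1}


def footprint(log):
    """Return the sorted activities and the footprint matrix (one row per activity)."""
    activities = sorted({x for trace in log for x in trace})
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

    def relation(x, y):
        xy, yx = (x, y) in follows, (y, x) in follows
        if xy and yx:
            return "‖"
        if xy:
            return "→"
        if yx:
            return "←"
        return "#"

    return activities, [[relation(x, y) for y in activities] for x in activities]


acts, matrix = footprint(L1)
print("  " + " ".join(acts))
for name, row in zip(acts, matrix):
    print(name, " ".join(row))
```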

44 Patterns Footprints are useful to discover typical patterns of activities in the corresponding process model.

45 Patterns Footprints are useful to discover typical patterns of activities in the corresponding process model.

46 Patterns Footprints are useful to discover typical patterns of activities in the corresponding process model. Fig. 5.4 Typical process patterns and the footprints they leave in the event log

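47 The Algorithm
After showing the basic idea and some examples, we describe the α-algorithm [103].
Definition 5.4 (α-algorithm) Let $L$ be an event log over $T \subseteq \mathcal{A}$. $\alpha(L)$ is defined as follows.
(1) $T_L = \{t \in T \mid \exists_{\sigma \in L}\; t \in \sigma\}$ (transitions)
(2) $T_I = \{t \in T \mid \exists_{\sigma \in L}\; t = \mathit{first}(\sigma)\}$ (start events)
(3) $T_O = \{t \in T \mid \exists_{\sigma \in L}\; t = \mathit{last}(\sigma)\}$ (end events)
(4) $X_L = \{(A,B) \mid A \subseteq T_L \wedge A \neq \emptyset \wedge B \subseteq T_L \wedge B \neq \emptyset \wedge \forall_{a \in A}\forall_{b \in B}\; a \rightarrow_L b \wedge \forall_{a_1,a_2 \in A}\; a_1 \#_L a_2 \wedge \forall_{b_1,b_2 \in B}\; b_1 \#_L b_2\}$ (decision points)
(5) $Y_L = \{(A,B) \in X_L \mid \forall_{(A',B') \in X_L}\; A \subseteq A' \wedge B \subseteq B' \Rightarrow (A,B) = (A',B')\}$ (maximal decision points)
(6) $P_L = \{p_{(A,B)} \mid (A,B) \in Y_L\} \cup \{i_L, o_L\}$ (places)
(7) $F_L = \{(a, p_{(A,B)}) \mid (A,B) \in Y_L \wedge a \in A\} \cup \{(p_{(A,B)}, b) \mid (A,B) \in Y_L \wedge b \in B\} \cup \{(i_L, t) \mid t \in T_I\} \cup \{(t, o_L) \mid t \in T_O\}$ (arcs)
(8) $\alpha(L) = (P_L, T_L, F_L)$ (net)
$L$ is an event log over some set $T$ of activities. In Step 1, it is checked which activities do appear in the log ($T_L$). These will correspond to the transitions of the generated WF-net. $T_I$ is the set of start activities, i.e., all activities that appear first in some trace (Step 2). $T_O$ is the set of end activities, i.e., all activities that appear last in some trace (Step 3).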
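The eight steps can be sketched in a few dozen lines. This is a teaching sketch under simplifying assumptions, not an optimized implementation. Step 4 enumerates all subsets of activities, which is exponential and therefore only suitable for tiny logs like L1. A place p_(A,B) is encoded as the hypothetical tuple `("p", A, B)`.

```python
from itertools import combinations


def alpha(log):
    """Sketch of the alpha-algorithm (Definition 5.4) for a small simple event log.

    `log` is an iterable of traces; repeated traces may be listed or omitted.
    Returns (places, transitions, arcs).
    """
    log = [tuple(t) for t in log]
    T_L = {x for t in log for x in t}                                   # step 1
    T_I = {t[0] for t in log}                                           # step 2
    T_O = {t[-1] for t in log}                                          # step 3
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

    def causal(x, y):
        return (x, y) in follows and (y, x) not in follows

    def choice(x, y):
        return (x, y) not in follows and (y, x) not in follows

    # Nonempty sets whose members (including each with itself) are pairwise in #_L.
    acts = sorted(T_L)
    candidates = [frozenset(c) for r in range(1, len(acts) + 1)
                  for c in combinations(acts, r)
                  if all(choice(x, y) for x in c for y in c)]
    X_L = {(A, B) for A in candidates for B in candidates
           if all(causal(a, b) for a in A for b in B)}                  # step 4
    Y_L = {(A, B) for (A, B) in X_L
           if not any(A <= A2 and B <= B2 and (A, B) != (A2, B2)
                      for (A2, B2) in X_L)}                             # step 5
    P_L = {("p", A, B) for (A, B) in Y_L} | {"i_L", "o_L"}              # step 6
    F_L = ({(a, ("p", A, B)) for (A, B) in Y_L for a in A}
           | {(("p", A, B), b) for (A, B) in Y_L for b in B}
           | {("i_L", t) for t in T_I}
           | {(t, "o_L") for t in T_O})                                 # step 7
    return P_L, T_L, F_L                                                # step 8


places, transitions, arcs = alpha(["abcd", "acbd", "aed"])   # the traces of L1
print(len(places), "places,", len(arcs), "arcs")              # 6 places, 14 arcs
```

For L1 the four internal places correspond to ({a},{b,e}), ({a},{c,e}), ({b,e},{d}) and ({c,e},{d}).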

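48 The Core of the Algorithm: Steps 4, 5
How to identify $(A, B) \in X_L$? Rearrange the rows and columns corresponding to $A = \{a_1, a_2, \ldots, a_m\}$ and $B = \{b_1, b_2, \ldots, b_n\}$, and remove the other rows and columns from the footprint. The result must have the following pattern:
$$
\begin{array}{c|cccc|cccc}
 & a_1 & a_2 & \cdots & a_m & b_1 & b_2 & \cdots & b_n \\ \hline
a_1 & \#_L & \#_L & \cdots & \#_L & \rightarrow_L & \rightarrow_L & \cdots & \rightarrow_L \\
a_2 & \#_L & \#_L & \cdots & \#_L & \rightarrow_L & \rightarrow_L & \cdots & \rightarrow_L \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
a_m & \#_L & \#_L & \cdots & \#_L & \rightarrow_L & \rightarrow_L & \cdots & \rightarrow_L \\ \hline
b_1 & \leftarrow_L & \leftarrow_L & \cdots & \leftarrow_L & \#_L & \#_L & \cdots & \#_L \\
b_2 & \leftarrow_L & \leftarrow_L & \cdots & \leftarrow_L & \#_L & \#_L & \cdots & \#_L \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
b_n & \leftarrow_L & \leftarrow_L & \cdots & \leftarrow_L & \#_L & \#_L & \cdots & \#_L
\end{array}
$$
Consider $L_1$ again. Clearly, $A = \{a\}$ and $B = \{b, e\}$ meet the requirements.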
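The pattern test can be run directly on a footprint matrix. All cells within A and within B must be #, every cell from A to B must be →, and every cell from B to A must be ←. Here is a minimal sketch using the footprint of L1 from Table 5.1, encoded row by row. `FOOTPRINT_L1` and `matches_pattern` are illustrative names.

```python
ACTS = "abcde"
ROWS = ["#→→#→", "←#‖→#", "←‖#→#", "#←←#←", "←##→#"]   # Table 5.1, one string per row
FOOTPRINT_L1 = {(x, y): ROWS[i][j]
                for i, x in enumerate(ACTS) for j, y in enumerate(ACTS)}


def matches_pattern(A, B, fp=FOOTPRINT_L1):
    """Check whether the submatrix for A and B has the pattern required in Step 4."""
    if not A or not B:
        return False
    return (all(fp[(x, y)] == "#" for x in A for y in A)
            and all(fp[(x, y)] == "#" for x in B for y in B)
            and all(fp[(a, b)] == "→" for a in A for b in B)
            # The next check is implied by the previous one in any footprint,
            # but it mirrors the lower-left block of the pattern.
            and all(fp[(b, a)] == "←" for a in A for b in B))


print(matches_pattern({"a"}, {"b", "e"}))  # True
print(matches_pattern({"a"}, {"b", "c"}))  # False, since b ‖ c
```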

49 The Core of the Algorithm: Steps 4, 5 Figure: place $p_{(A,B)}$ connecting the transitions in set $A = \{a_1, a_2, \ldots, a_m\}$ to the transitions in set $B = \{b_1, b_2, \ldots, b_n\}$

50 The Algorithm: Example

Let us consider $L_1 = [\langle a,b,c,d\rangle^3, \langle a,c,b,d\rangle^2, \langle a,e,d\rangle]$ again. Its footprint is:
$$\begin{array}{c|ccccc}
 & a & b & c & d & e \\ \hline
a & \# & \rightarrow & \rightarrow & \# & \rightarrow \\
b & \leftarrow & \# & \parallel & \rightarrow & \# \\
c & \leftarrow & \parallel & \# & \rightarrow & \# \\
d & \# & \leftarrow & \leftarrow & \# & \leftarrow \\
e & \leftarrow & \# & \# & \rightarrow & \#
\end{array}$$
Clearly, $A = \{a\}$ and $B = \{b,e\}$ meet the requirements stated in Step 4. So do $A = \{a\}$ and $B = \{b\}$. $X_L$ is the set of all pairs that meet these requirements. In this case:
$$X_{L_1} = \{(\{a\},\{b\}), (\{a\},\{c\}), (\{a\},\{e\}), (\{a\},\{b,e\}), (\{a\},\{c,e\}), (\{b\},\{d\}), (\{c\},\{d\}), (\{e\},\{d\}), (\{b,e\},\{d\}), (\{c,e\},\{d\})\}$$
Inserting a place for every element of $X_{L_1}$ would produce too many places, so only the maximal pairs $(A,B)$ should be included. For any pair $(A,B) \in X_L$, any nonempty set $A' \subseteq A$, and any nonempty set $B' \subseteq B$, it follows that $(A',B') \in X_L$. Step 5 removes all nonmaximal pairs, yielding:
$$Y_{L_1} = \{(\{a\},\{b,e\}), (\{a\},\{c,e\}), (\{b,e\},\{d\}), (\{c,e\},\{d\})\}$$
Step 5 can also be understood in terms of the footprint matrix. Consider Table 5.4 and let $A'$ and $B'$ be such that $\emptyset \subset A' \subseteq A$ and $\emptyset \subset B' \subseteq B$. Removing the rows and columns $(A \cup B) \setminus (A' \cup B')$ results in a matrix that still has the pattern shown in Table 5.4. Therefore, we only consider maximal matrices when constructing $Y_L$.
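Steps 4 and 5 can be reproduced for $L_1$ by brute force. First enumerate all nonempty sets of pairwise unrelated activities. Then keep the pairs that are linked by $\rightarrow_L$, and finally drop the nonmaximal ones. The Python sketch below is illustrative only. Its cost is exponential in the number of activities, which is acceptable for toy logs. The functions `nonempty_subsets` and `x_and_y` are hypothetical names, and traces are encoded as tuples. The sketch also checks the remark that every nonempty sub-pair of a pair in $X_{L_1}$ belongs to $X_{L_1}$.

```python
from itertools import combinations

# Minimal sketch (not from the source): Steps 4 and 5 by brute-force enumeration.


def nonempty_subsets(items):
    """All nonempty subsets of `items`, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def x_and_y(log):
    """Return (X_L, Y_L) for a simple log given as a list of traces."""
    succ = {(t[k], t[k + 1]) for t in log for k in range(len(t) - 1)}

    def causal(x, y):
        return (x, y) in succ and (y, x) not in succ

    def unrelated(x, y):
        return (x, y) not in succ and (y, x) not in succ

    # Candidate sets: members pairwise #_L (each member with itself included).
    cands = [s for s in nonempty_subsets({a for t in log for a in t})
             if all(unrelated(x, y) for x in s for y in s)]
    x_l = {(a, b) for a in cands for b in cands
           if all(causal(x, y) for x in a for y in b)}
    # Step 5: drop every pair strictly contained in another pair of X_L.
    y_l = {(a, b) for (a, b) in x_l
           if not any(a <= a2 and b <= b2 and (a, b) != (a2, b2) for (a2, b2) in x_l)}
    return x_l, y_l


L1 = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
x_l1, y_l1 = x_and_y(L1)
print(len(x_l1), len(y_l1))  # 10 4
for a, b in sorted(y_l1, key=lambda p: (sorted(p[0]), sorted(p[1]))):
    print(sorted(a), "->", sorted(b))

# The remark above: shrinking either side of a pair in X_L1 stays inside X_L1.
print(all((a2, b2) in x_l1 for (a, b) in x_l1
          for a2 in nonempty_subsets(a) for b2 in nonempty_subsets(b)))  # True
```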

51 The Algorithm: Example

A simple event log $L$ is a multiset of traces over some set of activities $\mathcal{A}$, i.e., $L \in \mathbb{B}(\mathcal{A}^*)$. The algorithm is based on the following log-based ordering relations. Let $L$ be an event log over $\mathcal{A}$ and let $a, b \in \mathcal{A}$:
$a >_L b$ if and only if there is a trace $\sigma = \langle t_1, t_2, t_3, \dots, t_n\rangle$ and $i \in \{1, \dots, n-1\}$ such that $\sigma \in L$, $t_i = a$ and $t_{i+1} = b$;
$a \rightarrow_L b$ if and only if $a >_L b$ and $b \not>_L a$;
$a \#_L b$ if and only if $a \not>_L b$ and $b \not>_L a$;
$a \parallel_L b$ if and only if $a >_L b$ and $b >_L a$.
Consider, for instance, $L_1 = [\langle a,b,c,d\rangle^3, \langle a,c,b,d\rangle^2, \langle a,e,d\rangle]$ again. The relation $>_{L_1}$ contains all pairs of activities in a directly-follows relation. For example, $c >_{L_1} d$ because $d$ directly follows $c$ in trace $\langle a,b,c,d\rangle$. However, $d \not>_{L_1} c$ because $c$ never directly follows $d$ in any trace of the log. The following log-based ordering relations can be found:
$$>_{L_1} = \{(a,b),(a,c),(a,e),(b,c),(c,b),(b,d),(c,d),(e,d)\}$$
$$\rightarrow_{L_1} = \{(a,b),(a,c),(a,e),(b,d),(c,d),(e,d)\}$$
$$\#_{L_1} = \{(a,a),(a,d),(b,b),(b,e),(c,c),(c,e),(d,a),(d,d),(e,b),(e,c),(e,e)\}$$
$$\parallel_{L_1} = \{(b,c),(c,b)\}$$
Fig. 5.1: WF-net $N_1$ discovered for $L_1 = [\langle a,b,c,d\rangle^3, \langle a,c,b,d\rangle^2, \langle a,e,d\rangle]$.
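The four relations translate directly into Python. Applying them to $L_1$ gives a quick way to double-check the sets listed above. The function name `ordering_relations` is hypothetical. The log is encoded as a list of its distinct traces, because frequencies do not affect these relations.

```python
# Minimal sketch (not from the source) of the log-based ordering relations.


def ordering_relations(log):
    """Return (>_L, ->_L, #_L, ||_L) as sets of ordered activity pairs."""
    acts = {a for t in log for a in t}
    gt = {(t[k], t[k + 1]) for t in log for k in range(len(t) - 1)}  # >_L
    causal = {(x, y) for (x, y) in gt if (y, x) not in gt}            # ->_L
    choice = {(x, y) for x in acts for y in acts
              if (x, y) not in gt and (y, x) not in gt}               # #_L
    parallel = {(x, y) for (x, y) in gt if (y, x) in gt}              # ||_L
    return gt, causal, choice, parallel


L1 = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
gt, causal, choice, parallel = ordering_relations(L1)
print(sorted(causal))    # [('a', 'b'), ('a', 'c'), ('a', 'e'), ('b', 'd'), ('c', 'd'), ('e', 'd')]
print(sorted(parallel))  # [('b', 'c'), ('c', 'b')]
print(len(choice))       # 11
```

Every ordered pair of activities falls into exactly one of $\rightarrow_L$, its inverse, $\#_L$ or $\parallel_L$. For $L_1$ this gives $25 = 6 + 6 + 11 + 2$.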

52 Other Examples

For $L_1$, the discovered model is $\gamma(L_1) = (N_1, [\mathit{start}])$. Each trace in $L_1$ corresponds to a possible firing sequence of WF-net $N_1$ shown in Fig. 5.1, so the WF-net can indeed replay all traces in the event log. In fact, each of the three possible firing sequences of $N_1$ appears in $L_1$. Let us now consider another event log:
$$L_2 = [\langle a,b,c,d\rangle^3, \langle a,c,b,d\rangle^4, \langle a,b,c,e,f,b,c,d\rangle^2, \langle a,b,c,e,f,c,b,d\rangle, \langle a,c,b,e,f,b,c,d\rangle^2, \langle a,c,b,e,f,b,c,e,f,c,b,d\rangle]$$
$L_2$ is a simple event log consisting of 13 cases represented by 6 different traces. Its footprint is:
$$\begin{array}{c|cccccc}
 & a & b & c & d & e & f \\ \hline
a & \# & \rightarrow & \rightarrow & \# & \# & \# \\
b & \leftarrow & \# & \parallel & \rightarrow & \rightarrow & \leftarrow \\
c & \leftarrow & \parallel & \# & \rightarrow & \rightarrow & \leftarrow \\
d & \# & \leftarrow & \leftarrow & \# & \# & \# \\
e & \# & \leftarrow & \leftarrow & \# & \# & \rightarrow \\
f & \# & \rightarrow & \rightarrow & \# & \leftarrow & \#
\end{array}$$
Based on event log $L_2$, some $\gamma$ could discover WF-net $N_2$ shown in Fig. 5.2. This WF-net can indeed replay all traces in the log. However, not all firing sequences of $N_2$ correspond to traces in $L_2$. For example, the firing sequence $\langle a,c,b,e,f,c,b,d\rangle$ does not appear in $L_2$. In fact, the loop construct in $N_2$ allows infinitely many firing sequences, and these cannot all appear in the event log. Therefore, Definition 5.2 does not require all firing sequences of $(N, M)$ to be traces in $L$.
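The replay argument can be checked mechanically. The Python sketch below is ours and is not part of the source. It builds the flow relation that the α-algorithm produces for $L_2$, which for this log yields $N_2$ up to renaming of places. It then plays the token game starting from marking $[i_L]$. The sketch assumes traces written as strings of one-letter activity names and places named by their $(A,B)$ pair. The helpers `alpha_arcs` and `replays` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Minimal sketch (not from the source). Traces are strings of one-letter
# activities; places are named by their (A, B) pair plus "i_L" and "o_L".


def alpha_arcs(log):
    """Flow relation F_L produced by the alpha-algorithm for a simple log."""
    log = set(log)  # frequencies are irrelevant to the alpha-algorithm
    succ = {(t[k], t[k + 1]) for t in log for k in range(len(t) - 1)}
    acts = sorted({a for t in log for a in t})

    def causal(x, y):
        return (x, y) in succ and (y, x) not in succ

    def unrelated(x, y):
        return (x, y) not in succ and (y, x) not in succ

    cands = [frozenset(c) for r in range(1, len(acts) + 1)
             for c in combinations(acts, r)
             if all(unrelated(x, y) for x in c for y in c)]
    x_l = {(a, b) for a in cands for b in cands
           if all(causal(x, y) for x in a for y in b)}
    y_l = {(a, b) for (a, b) in x_l
           if not any(a <= a2 and b <= b2 and (a, b) != (a2, b2) for (a2, b2) in x_l)}
    arcs = {(t, p) for p in y_l for t in p[0]} | {(p, t) for p in y_l for t in p[1]}
    return arcs | {("i_L", t[0]) for t in log} | {(t[-1], "o_L") for t in log}


def replays(trace, arcs):
    """Token game from marking [i_L]; True iff the trace ends in marking [o_L]."""
    marking = Counter({"i_L": 1})
    for t in trace:
        pre = [p for (p, x) in arcs if x == t]
        if any(marking[p] < 1 for p in pre):
            return False  # t is not enabled
        for p in pre:
            marking[p] -= 1
        marking.update(p for (x, p) in arcs if x == t)
    return +marking == Counter({"o_L": 1})


L2 = ["abcd", "acbd", "abcefbcd", "abcefcbd", "acbefbcd", "acbefbcefcbd"]
n2 = alpha_arcs(L2)
print(all(replays(t, n2) for t in L2))  # True: every trace of L2 can be replayed
print(replays("acbefcbd", n2))          # True, although it is not a trace of L2
print(replays("abd", n2))               # False: d also needs a token produced by c
```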

53 Other Examples

Consider the event log
$$L_3 = [\langle a,b,c,d,e,f,b,d,c,e,g\rangle, \langle a,b,d,c,e,g\rangle^2, \langle a,b,c,d,e,f,b,c,d,e,f,b,d,c,e,g\rangle]$$
The α-algorithm constructs WF-net $N_3$ based on $L_3$ (see Fig. 5.5), and Table 5.3 shows the footprint of $L_3$. The patterns in the model indeed match the log-based ordering relations extracted from the event log. Consider, for example, the process fragment involving $b$, $c$, $d$, and $e$. This fragment can be constructed based on $b \rightarrow_{L_3} c$, $b \rightarrow_{L_3} d$, $c \parallel_{L_3} d$, $c \rightarrow_{L_3} e$, and $d \rightarrow_{L_3} e$. The choice following $e$ is revealed by $e \rightarrow_{L_3} f$, $e \rightarrow_{L_3} g$, and $f \#_{L_3} g$, and so on.
Fig. 5.6 shows another example: WF-net $N_4$ can be derived from
$$L_4 = [\langle a,c,d\rangle^{45}, \langle b,c,d\rangle^{42}, \langle a,c,e\rangle^{38}, \langle b,c,e\rangle^{22}]$$
Fig. 5.5: WF-net $N_3$ derived from $L_3$. Fig. 5.6: WF-net $N_4$ derived from $L_4$.
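The relations cited for $L_3$ can be read off a footprint computed directly from the log. The following is a minimal Python sketch. It assumes traces encoded as strings of one-letter activities and uses ASCII symbols `->`, `<-`, `||` and `#`. `footprint` is a hypothetical helper. The printed matrix is meant to correspond to Table 5.3.

```python
# Minimal sketch (not from the source): footprint of a simple log as a dict.


def footprint(log):
    """Return the sorted activities and {(x, y): '->' | '<-' | '||' | '#'}."""
    succ = {(t[k], t[k + 1]) for t in log for k in range(len(t) - 1)}
    acts = sorted({a for t in log for a in t})
    cells = {}
    for x in acts:
        for y in acts:
            xy, yx = (x, y) in succ, (y, x) in succ
            cells[x, y] = "||" if xy and yx else "->" if xy else "<-" if yx else "#"
    return acts, cells


# Distinct traces of L3, written as strings of one-letter activities.
L3 = ["abcdefbdceg", "abdceg", "abcdefbcdefbdceg"]
acts, cells = footprint(L3)
# Prints, one per line: b -> c, b -> d, c || d, c -> e, d -> e, e -> f, e -> g, f # g
for x, y in [("b", "c"), ("b", "d"), ("c", "d"), ("c", "e"),
             ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g")]:
    print(x, cells[x, y], y)

# The whole matrix (cf. Table 5.3).
print("   " + " ".join(f"{y:>2}" for y in acts))
for x in acts:
    print(f"{x:>2} " + " ".join(f"{cells[x, y]:>2}" for y in acts))
```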

54 Other Examples

Table 5.3: footprint of $L_3$, whose patterns match WF-net $N_3$ (Fig. 5.5):
$$\begin{array}{c|ccccccc}
 & a & b & c & d & e & f & g \\ \hline
a & \# & \rightarrow & \# & \# & \# & \# & \# \\
b & \leftarrow & \# & \rightarrow & \rightarrow & \# & \leftarrow & \# \\
c & \# & \leftarrow & \# & \parallel & \rightarrow & \# & \# \\
d & \# & \leftarrow & \parallel & \# & \rightarrow & \# & \# \\
e & \# & \# & \leftarrow & \leftarrow & \# & \rightarrow & \rightarrow \\
f & \# & \rightarrow & \# & \# & \leftarrow & \# & \# \\
g & \# & \# & \# & \# & \leftarrow & \# & \#
\end{array}$$
Footprint of $L_4 = [\langle a,c,d\rangle^{45}, \langle b,c,d\rangle^{42}, \langle a,c,e\rangle^{38}, \langle b,c,e\rangle^{22}]$, from which WF-net $N_4$ (Fig. 5.6) is derived:
$$\begin{array}{c|ccccc}
 & a & b & c & d & e \\ \hline
a & \# & \# & \rightarrow & \# & \# \\
b & \# & \# & \rightarrow & \# & \# \\
c & \leftarrow & \leftarrow & \# & \rightarrow & \rightarrow \\
d & \# & \# & \leftarrow & \# & \# \\
e & \# & \# & \leftarrow & \# & \#
\end{array}$$
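Steps 1 to 5 applied to $L_4$ yield the two places of $N_4$ other than source and sink. The α-algorithm only looks at which traces occur, so the frequencies 45, 42, 38 and 22 play no role. The Python sketch below, which is ours and illustrative only, makes this explicit. It stores $L_4$ as a `collections.Counter` and compares the result with a log that has very different frequencies. `alpha_steps_1_to_5` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

# Minimal sketch (not from the source): Steps 1-5 of the alpha-algorithm.


def alpha_steps_1_to_5(log):
    """Return (T_L, T_I, T_O, Y_L) for a simple log given as a multiset of traces."""
    traces = set(log)  # only the distinct traces matter; frequencies are ignored
    succ = {(t[k], t[k + 1]) for t in traces for k in range(len(t) - 1)}
    t_l = sorted({a for t in traces for a in t})
    t_i = {t[0] for t in traces}
    t_o = {t[-1] for t in traces}

    def causal(x, y):
        return (x, y) in succ and (y, x) not in succ

    def unrelated(x, y):
        return (x, y) not in succ and (y, x) not in succ

    cands = [frozenset(c) for r in range(1, len(t_l) + 1)
             for c in combinations(t_l, r)
             if all(unrelated(x, y) for x in c for y in c)]
    x_l = {(a, b) for a in cands for b in cands
           if all(causal(x, y) for x in a for y in b)}
    y_l = {(a, b) for (a, b) in x_l
           if not any(a <= a2 and b <= b2 and (a, b) != (a2, b2) for (a2, b2) in x_l)}
    return set(t_l), t_i, t_o, y_l


L4 = Counter({"acd": 45, "bcd": 42, "ace": 38, "bce": 22})
t_l, t_i, t_o, y_l4 = alpha_steps_1_to_5(L4)
print(sorted(t_i), sorted(t_o))  # ['a', 'b'] ['d', 'e']
for a, b in sorted(y_l4, key=lambda p: sorted(p[0])):
    print(sorted(a), "->", sorted(b))

# Changing the frequencies does not change the discovered structure.
L4_other = Counter({"acd": 1, "bcd": 1, "ace": 1, "bce": 1000})
print(alpha_steps_1_to_5(L4_other)[3] == y_l4)  # True
```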

55 Other Examples

Applying the α-algorithm to $L_3$ and $L_4$ shows that indeed $\alpha(L_3) = N_3$ and $\alpha(L_4) = N_4$. In Figs. 5.5 and 5.6, the places are named after the sets $Y_{L_3}$ and $Y_{L_4}$. Moreover, $\alpha(L_1) = N_1$ and $\alpha(L_2) = N_2$ modulo renaming of places, since different place names are used in Figs. 5.1 and 5.2. These examples show that the α-algorithm is indeed able to discover WF-nets from event logs.
Let us now consider event log $L_5$:
$$L_5 = [\langle a,b,e,f\rangle^2, \langle a,b,e,c,d,b,f\rangle^3, \langle a,b,c,e,d,b,f\rangle^2, \langle a,b,c,d,e,b,f\rangle^4, \langle a,e,b,c,d,b,f\rangle^3]$$
Table 5.5 shows the footprint of the log:
$$\begin{array}{c|cccccc}
 & a & b & c & d & e & f \\ \hline
a & \# & \rightarrow & \# & \# & \rightarrow & \# \\
b & \leftarrow & \# & \rightarrow & \leftarrow & \parallel & \rightarrow \\
c & \# & \leftarrow & \# & \rightarrow & \parallel & \# \\
d & \# & \rightarrow & \leftarrow & \# & \parallel & \# \\
e & \leftarrow & \parallel & \parallel & \parallel & \# & \rightarrow \\
f & \# & \leftarrow & \# & \# & \leftarrow & \#
\end{array}$$
Let us now apply the eight steps of the algorithm for $L = L_5$:
$$T_L = \{a,b,c,d,e,f\}$$
$$T_I = \{a\}$$
$$T_O = \{f\}$$
$$X_L = \{(\{a\},\{b\}), (\{a\},\{e\}), (\{b\},\{c\}), (\{b\},\{f\}), (\{c\},\{d\}), (\{d\},\{b\}), (\{e\},\{f\}), (\{a,d\},\{b\}), (\{b\},\{c,f\})\}$$
$$Y_L = \{(\{a\},\{e\}), (\{c\},\{d\}), (\{e\},\{f\}), (\{a,d\},\{b\}), (\{b\},\{c,f\})\}$$
$$P_L = \{p_{(\{a\},\{e\})}, p_{(\{c\},\{d\})}, p_{(\{e\},\{f\})}, p_{(\{a,d\},\{b\})}, p_{(\{b\},\{c,f\})}, i_L, o_L\}$$
$$F_L = \{(a, p_{(\{a\},\{e\})}), (p_{(\{a\},\{e\})}, e), (c, p_{(\{c\},\{d\})}), (p_{(\{c\},\{d\})}, d), (e, p_{(\{e\},\{f\})}), (p_{(\{e\},\{f\})}, f), (a, p_{(\{a,d\},\{b\})}), (d, p_{(\{a,d\},\{b\})}), (p_{(\{a,d\},\{b\})}, b), (b, p_{(\{b\},\{c,f\})}), (p_{(\{b\},\{c,f\})}, c), (p_{(\{b\},\{c,f\})}, f), (i_L, a), (f, o_L)\}$$
$$\alpha(L) = (P_L, T_L, F_L)$$
Each element $(A,B) \in Y_L$ corresponds to a place $p_{(A,B)}$ that connects the transitions in $A$ to the transitions in $B$. In addition, $P_L$ contains a unique source place $i_L$ and a unique sink place $o_L$ (cf. Step 6). Remember that the goal is to create a WF-net. Nevertheless, the α-algorithm may construct a Petri net that is not a WF-net (see, for instance, Fig. 5.12); such problems are discussed in detail later. In Step 7, the arcs of the WF-net are generated. All start transitions in $T_I$ have $i_L$ as an input place, all end transitions in $T_O$ have $o_L$ as an output place, and every place $p_{(A,B)}$ has $A$ as input nodes and $B$ as output nodes. The result is a Petri net $\alpha(L) = (P_L, T_L, F_L)$ that describes the behavior seen in event log $L$. Figure 5.8 shows $N_5 = \alpha(L_5)$, i.e., the model just computed.
Fig. 5.8: WF-net $N_5$ derived from $L_5$.
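The eight steps computed above for $L_5$ can be reproduced in code. The following Python function is a minimal sketch of the α-algorithm, not a reference implementation. It assumes traces written as strings of one-letter activities. A place $p_{(A,B)}$ is represented by the pair $(A,B)$ itself, with `i_L` and `o_L` as source and sink. The function returns the intermediate sets so that they can be compared with the ones listed on the slide.

```python
from itertools import combinations

# Minimal sketch (not from the source) of the eight steps of the alpha-algorithm.
# Assumptions: traces are strings of one-letter activities; a place p_(A,B) is
# represented by the pair (A, B) of frozensets; "i_L"/"o_L" are source and sink.


def alpha(log):
    traces = set(log)  # only the set of distinct traces matters
    succ = {(t[k], t[k + 1]) for t in traces for k in range(len(t) - 1)}  # >_L

    def causal(x, y):  # x ->_L y
        return (x, y) in succ and (y, x) not in succ

    def unrelated(x, y):  # x #_L y
        return (x, y) not in succ and (y, x) not in succ

    t_l = {a for t in traces for a in t}                                # Step 1
    t_i = {t[0] for t in traces}                                        # Step 2
    t_o = {t[-1] for t in traces}                                       # Step 3
    cands = [frozenset(c) for r in range(1, len(t_l) + 1)
             for c in combinations(sorted(t_l), r)
             if all(unrelated(x, y) for x in c for y in c)]
    x_l = {(a, b) for a in cands for b in cands
           if all(causal(x, y) for x in a for y in b)}                  # Step 4
    y_l = {(a, b) for (a, b) in x_l
           if not any(a <= a2 and b <= b2 and (a, b) != (a2, b2)
                      for (a2, b2) in x_l)}                             # Step 5
    p_l = set(y_l) | {"i_L", "o_L"}                                     # Step 6
    f_l = ({(t, p) for p in y_l for t in p[0]}
           | {(p, t) for p in y_l for t in p[1]}
           | {("i_L", t) for t in t_i}
           | {(t, "o_L") for t in t_o})                                 # Step 7
    return {"T_L": t_l, "T_I": t_i, "T_O": t_o, "X_L": x_l, "Y_L": y_l,
            "P_L": p_l, "F_L": f_l, "net": (p_l, t_l, f_l)}             # Step 8


L5 = ["abef", "abecdbf", "abcedbf", "abcdebf", "aebcdbf"]
steps = alpha(L5)
print(sorted(steps["T_I"]), sorted(steps["T_O"]))  # ['a'] ['f']
print(len(steps["X_L"]), len(steps["Y_L"]), len(steps["F_L"]))  # 9 5 14
for a, b in sorted(steps["Y_L"], key=lambda p: (sorted(p[0]), sorted(p[1]))):
    print(sorted(a), "->", sorted(b))
```

The brute-force enumeration of candidate sets is exponential in $|T_L|$. It is meant only to mirror the definitions of Steps 4 and 5 on small examples.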

56 Limitation: Implicit Places

We revisit the notion of completeness later in this chapter. Even if we assume that the log is complete, the α-algorithm has some problems. Many different WF-nets can have the same possible behavior, i.e., two models can be structurally different but trace equivalent. Consider, for instance, the following event log:
$$L_6 = [\langle a,c,e,g\rangle^2, \langle a,e,c,g\rangle^3, \langle b,d,f,g\rangle^2, \langle b,f,d,g\rangle^4]$$
$\alpha(L_6)$ is shown in Fig. 5.9. Although the model is able to generate the observed behavior, the resulting WF-net is needlessly complex. Two of the input places of $g$ are redundant, i.e., they can be removed without changing the behavior. The places denoted $p_1$ and $p_2$ are so-called implicit places, and removing them does not affect the set of possible firing sequences. In fact, Fig. 5.9 shows only one of many possible trace equivalent WF-nets.
Fig. 5.9: WF-net $N_6$ derived from $L_6$. The two highlighted places $p_1$ and $p_2$ are redundant, i.e., removing them simplifies the model without changing its behavior.
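For this small example, the redundancy of the input places of $g$ can be checked by brute force. The Python sketch below is ours and illustrative. It computes $\alpha(L_6)$, lists the input places of $g$, and enumerates all complete firing sequences before and after deleting two of them. The text does not reveal which two places are highlighted as $p_1$ and $p_2$ in Fig. 5.9. The pair deleted here, $(\{c,f\},\{g\})$ and $(\{d,e\},\{g\})$, is simply one choice that preserves the behavior. The helpers `alpha_arcs` and `complete_runs` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Minimal sketch (not from the source). Traces are strings of one-letter
# activities; a place is its (A, B) pair, plus source "i_L" and sink "o_L".


def alpha_arcs(log):
    """Flow relation F_L of alpha(L), following Steps 1 to 7."""
    log = set(log)
    succ = {(t[k], t[k + 1]) for t in log for k in range(len(t) - 1)}
    acts = sorted({a for t in log for a in t})

    def causal(x, y):
        return (x, y) in succ and (y, x) not in succ

    def unrelated(x, y):
        return (x, y) not in succ and (y, x) not in succ

    cands = [frozenset(c) for r in range(1, len(acts) + 1)
             for c in combinations(acts, r)
             if all(unrelated(x, y) for x in c for y in c)]
    x_l = {(a, b) for a in cands for b in cands
           if all(causal(x, y) for x in a for y in b)}
    y_l = {(a, b) for (a, b) in x_l
           if not any(a <= a2 and b <= b2 and (a, b) != (a2, b2) for (a2, b2) in x_l)}
    arcs = {(t, p) for p in y_l for t in p[0]} | {(p, t) for p in y_l for t in p[1]}
    return arcs | {("i_L", t[0]) for t in log} | {(t[-1], "o_L") for t in log}


def complete_runs(arcs, transitions, max_len=20):
    """All firing sequences leading from [i_L] to exactly [o_L] (bounded DFS)."""
    runs = set()

    def explore(marking, seq):
        if +marking == Counter({"o_L": 1}):
            runs.add(seq)
            return
        if len(seq) >= max_len:
            return
        for t in sorted(transitions):
            pre = [p for (p, x) in arcs if x == t]
            if all(marking[p] >= 1 for p in pre):
                nxt = marking.copy()
                nxt.subtract(pre)
                nxt.update(p for (x, p) in arcs if x == t)
                explore(nxt, seq + t)

    explore(Counter({"i_L": 1}), "")
    return runs


L6 = ["aceg", "aecg", "bdfg", "bfdg"]
arcs = alpha_arcs(L6)
g_inputs = [p for (p, x) in arcs if x == "g"]
print(len(g_inputs))  # 4
for a, b in sorted((sorted(p[0]), sorted(p[1])) for p in g_inputs):
    print(a, "->", b)

# Drop two of the four input places of g (one choice that happens to work).
drop = {(frozenset("cf"), frozenset("g")), (frozenset("de"), frozenset("g"))}
reduced = {(x, y) for (x, y) in arcs if x not in drop and y not in drop}
print(complete_runs(arcs, "abcdefg") == complete_runs(reduced, "abcdefg") == set(L6))  # True
```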

57 Limitation: Short Loop

The original α-algorithm (as presented in Sect. 5.2) has problems dealing with short loops, i.e., loops of length one or two. For a loop of length one, this is illustrated by WF-net $N_7$ in Fig. 5.10, which shows the result of applying the basic algorithm to
$$L_7 = [\langle a,c\rangle^2, \langle a,b,c\rangle^3, \langle a,b,b,c\rangle^2, \langle a,b,b,b,b,c\rangle^1]$$
The resulting model is not a WF-net, because transition $b$ is disconnected from the rest of the model. The model allows $b$ to be executed before $a$ and after $c$, which is not consistent with the event log. This problem can be addressed easily: an improved version of the α-algorithm discovers the expected WF-net, in which $b$ forms a loop of length one.
Fig. 5.10: Incorrect WF-net $N_7$ derived from $L_7$; $b$ is disconnected from the model.
Expected net: WF-net $N_7$ having a so-called short loop of length one.
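Step 4 explains why $b$ ends up disconnected. Because $b$ directly follows itself, we have $b \parallel_{L_7} b$ rather than $b \#_{L_7} b$, so no set containing $b$ can appear in $X_{L_7}$. The following minimal Python sketch confirms that the only place besides source and sink is $p_{(\{a\},\{c\})}$. It is ours; `y_places` is a hypothetical helper, and traces are strings of one-letter activities.

```python
from itertools import combinations

# Minimal sketch (not from the source): Steps 4 and 5 of the alpha-algorithm.


def y_places(log):
    """Return (>_L, Y_L) for a simple log given as a list of traces."""
    succ = {(t[k], t[k + 1]) for t in log for k in range(len(t) - 1)}
    acts = sorted({a for t in log for a in t})

    def causal(x, y):
        return (x, y) in succ and (y, x) not in succ

    def unrelated(x, y):
        return (x, y) not in succ and (y, x) not in succ

    # Only sets whose members are pairwise #_L, each member with itself included.
    cands = [frozenset(c) for r in range(1, len(acts) + 1)
             for c in combinations(acts, r)
             if all(unrelated(x, y) for x in c for y in c)]
    x_l = {(a, b) for a in cands for b in cands
           if all(causal(x, y) for x in a for y in b)}
    y_l = {(a, b) for (a, b) in x_l
           if not any(a <= a2 and b <= b2 and (a, b) != (a2, b2) for (a2, b2) in x_l)}
    return succ, y_l


L7 = ["ac", "abc", "abbc", "abbbbc"]
succ, y_l7 = y_places(L7)
print(("b", "b") in succ)  # True: b ||_L7 b, so b fails the # test of Step 4
# Transitions touched by some arc: members of place pairs, plus T_I and T_O.
connected = ({t for (a, b) in y_l7 for t in a | b}
             | {t[0] for t in L7} | {t[-1] for t in L7})
print(sorted(connected))  # ['a', 'c']
```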

58 Limitation: Short Loop

The problem with loops of length two is illustrated by Petri net $N_8$ in Fig. 5.12, which shows the result of applying the basic algorithm to
$$L_8 = [\langle a,b,d\rangle^3, \langle a,b,c,b,d\rangle^2, \langle a,b,c,b,c,b,d\rangle]$$
The following log-based ordering relations are derived from this event log: $a \rightarrow_{L_8} b$, $b \rightarrow_{L_8} d$, and $b \parallel_{L_8} c$. Hence, the basic algorithm incorrectly assumes that $b$ and $c$ are in parallel, because they follow one another. The model shown in Fig. 5.12 is not even a WF-net, because $c$ is not on a path from source to sink. Using the extension described in [30], the improved α-algorithm correctly discovers the expected WF-net, i.e., the corrected net $N_8$ with a loop of length two.
There are various ways to improve the basic α-algorithm so that it can deal with short loops. The α+-algorithm described in [30] is one of several alternatives that address problems of the original algorithm presented in Sect. 5.2. The α+-algorithm uses a preprocessing and a postprocessing phase. The preprocessing phase deals with loops of length two, whereas the postprocessing phase inserts loops of length one. The basic algorithm has no problems mining loops of length three or more. For a loop involving at least three activities (say $a$, $b$, and $c$), concurrency can be …
Fig. 5.12: Incorrect WF-net $N_8$ derived from $L_8$; $c$ is disconnected from the model.
Expected net: corrected WF-net $N_8$ having a so-called short loop of length two.
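The same brute-force sketch of Steps 4 and 5 shows the length-two problem on $L_8$. It is reproduced here so that it runs on its own; `y_places` is hypothetical, and traces are strings of one-letter activities. Because $b$ and $c$ directly follow each other, $b \parallel_{L_8} c$. Since $c$ takes part in no $\rightarrow_{L_8}$ relation, no place is attached to it.

```python
from itertools import combinations

# Minimal sketch (not from the source): Steps 4 and 5 of the alpha-algorithm.


def y_places(log):
    """Return (>_L, Y_L) for a simple log given as a list of traces."""
    succ = {(t[k], t[k + 1]) for t in log for k in range(len(t) - 1)}
    acts = sorted({a for t in log for a in t})

    def causal(x, y):
        return (x, y) in succ and (y, x) not in succ

    def unrelated(x, y):
        return (x, y) not in succ and (y, x) not in succ

    cands = [frozenset(c) for r in range(1, len(acts) + 1)
             for c in combinations(acts, r)
             if all(unrelated(x, y) for x in c for y in c)]
    x_l = {(a, b) for a in cands for b in cands
           if all(causal(x, y) for x in a for y in b)}
    y_l = {(a, b) for (a, b) in x_l
           if not any(a <= a2 and b <= b2 and (a, b) != (a2, b2) for (a2, b2) in x_l)}
    return succ, y_l


L8 = ["abd", "abcbd", "abcbcbd"]
succ, y_l8 = y_places(L8)
print(("b", "c") in succ and ("c", "b") in succ)  # True: b ||_L8 c
for a, b in sorted(y_l8, key=lambda p: sorted(p[0])):
    print(sorted(a), "->", sorted(b))
# Transitions touched by some arc: members of place pairs, plus T_I and T_O.
connected = ({t for (a, b) in y_l8 for t in a | b}
             | {t[0] for t in L8} | {t[-1] for t in L8})
print("c" in connected)  # False: no arc touches c
```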


Dr. Jana Koehler IBM Zurich Research Laboratory Precise Modeling of Business Processes with the Business Process Modeling Notation BPMN 2.0 Dr. Jana Koehler IBM Zurich Research Laboratory ZRL BIT at a Glance Computer Science at ZRL: Security/Cryptography

More information

Elements of Abstract Group Theory

Elements of Abstract Group Theory Chapter 2 Elements of Abstract Group Theory Mathematics is a game played according to certain simple rules with meaningless marks on paper. David Hilbert The importance of symmetry in physics, and for

More information

PROCESS-ORIENTED ARCHITECTURES FOR ELECTRONIC COMMERCE AND INTERORGANIZATIONAL WORKFLOW

PROCESS-ORIENTED ARCHITECTURES FOR ELECTRONIC COMMERCE AND INTERORGANIZATIONAL WORKFLOW Information Systems Vol.??, No.??, pp.??-??, 1999 Copyright 1999 Elsevier Sciences Ltd. All rights reserved Printed in Great Britain 0306-4379/98 $17.00 + 0.00 PROCESS-ORIENTED ARCHITECTURES FOR ELECTRONIC

More information

Chapter 12 Analyzing Spaghetti Processes

Chapter 12 Analyzing Spaghetti Processes Chapter 12 Analyzing Spaghetti Processes prof.dr.ir. Wil van der Aalst www.processmining.org Overview Chapter 1 Introduction Part I: Preliminaries Chapter 2 Process Modeling and Analysis Chapter 3 Data

More information

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan

More information

Process Mining Manifesto

Process Mining Manifesto Process Mining Manifesto manifesto is a "public declaration of principles and intentions" by a group of people. This manifesto is written by members and supporters of the IEEE Task Force on Process Mining.

More information

Data Mining Apriori Algorithm

Data Mining Apriori Algorithm 10 Data Mining Apriori Algorithm Apriori principle Frequent itemsets generation Association rules generation Section 6 of course book TNM033: Introduction to Data Mining 1 Association Rule Mining (ARM)

More information

Process Mining for Electronic Data Interchange

Process Mining for Electronic Data Interchange Process Mining for Electronic Data Interchange R. Engel 1, W. Krathu 1, C. Pichler 2, W. M. P. van der Aalst 3, H. Werthner 1, and M. Zapletal 1 1 Vienna University of Technology, Austria Institute for

More information

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

More information

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation

More information

THREE DIMENSIONAL GEOMETRY

THREE DIMENSIONAL GEOMETRY Chapter 8 THREE DIMENSIONAL GEOMETRY 8.1 Introduction In this chapter we present a vector algebra approach to three dimensional geometry. The aim is to present standard properties of lines and planes,

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Just the Factors, Ma am

Just the Factors, Ma am 1 Introduction Just the Factors, Ma am The purpose of this note is to find and study a method for determining and counting all the positive integer divisors of a positive integer Let N be a given positive

More information

Fair testing vs. must testing in a fair setting

Fair testing vs. must testing in a fair setting Fair testing vs. must testing in a fair setting Tom Hirschowitz and Damien Pous Amsterdam, Novembre 2010 Laboratoire de Mathématiques Université de Savoie UMR 5127 Tom Hirschowitz and Damien Pous Fair

More information

Article. Abstract. This is a pre-print version. For the printed version please refer to www.wisu.de

Article. Abstract. This is a pre-print version. For the printed version please refer to www.wisu.de Article StB Prof. Dr. Nick Gehrke Nordakademie Chair for Information Systems Köllner Chaussee 11 D-25337 Elmshorn nick.gehrke@nordakademie.de Michael Werner, Dipl.-Wirt.-Inf. University of Hamburg Chair

More information

Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:

Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation: CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm

More information

Formal Languages and Automata Theory - Regular Expressions and Finite Automata -

Formal Languages and Automata Theory - Regular Expressions and Finite Automata - Formal Languages and Automata Theory - Regular Expressions and Finite Automata - Samarjit Chakraborty Computer Engineering and Networks Laboratory Swiss Federal Institute of Technology (ETH) Zürich March

More information

WHAT ARE MATHEMATICAL PROOFS AND WHY THEY ARE IMPORTANT?

WHAT ARE MATHEMATICAL PROOFS AND WHY THEY ARE IMPORTANT? WHAT ARE MATHEMATICAL PROOFS AND WHY THEY ARE IMPORTANT? introduction Many students seem to have trouble with the notion of a mathematical proof. People that come to a course like Math 216, who certainly

More information

Data Mining: A Preprocessing Engine

Data Mining: A Preprocessing Engine Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,

More information

Index support for regular expression search. Alexander Korotkov PGCon 2012, Ottawa

Index support for regular expression search. Alexander Korotkov PGCon 2012, Ottawa Index support for regular expression search Alexander Korotkov PGCon 2012, Ottawa Introduction What is regular expressions? Regular expressions are: powerful tool for text processing based on formal language

More information

Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

More information

Kirsten Sinclair SyntheSys Systems Engineers

Kirsten Sinclair SyntheSys Systems Engineers Kirsten Sinclair SyntheSys Systems Engineers Kirsten Sinclair SyntheSys Systems Engineers Spicing-up IBM s Enterprise Architecture tools with Petri Nets On Today s Menu Appetiser: Background Starter: Use

More information

General Framework for an Iterative Solution of Ax b. Jacobi s Method

General Framework for an Iterative Solution of Ax b. Jacobi s Method 2.6 Iterative Solutions of Linear Systems 143 2.6 Iterative Solutions of Linear Systems Consistent linear systems in real life are solved in one of two ways: by direct calculation (using a matrix factorization,

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

A Business Process Services Portal

A Business Process Services Portal A Business Process Services Portal IBM Research Report RZ 3782 Cédric Favre 1, Zohar Feldman 3, Beat Gfeller 1, Thomas Gschwind 1, Jana Koehler 1, Jochen M. Küster 1, Oleksandr Maistrenko 1, Alexandru

More information

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler Machine Learning and Data Mining Regression Problem (adapted from) Prof. Alexander Ihler Overview Regression Problem Definition and define parameters ϴ. Prediction using ϴ as parameters Measure the error

More information

Supply Chain Management Use Case Model

Supply Chain Management Use Case Model Supply Chain Management Use Case Model Date: 2002/11/10 This version: http://www.ws-i.org/sampleapplications/supplychainmanagement/2002-11/scmusecases-0.18- WGD.htm Latest version: http://www.ws-i.org/sampleapplications/supplychainmanagement/2002-11/scmusecases-0.18-

More information

DEPARTMENT OF DECISION SCIENCES AND INFORMATION MANAGEMENT (KBI)

DEPARTMENT OF DECISION SCIENCES AND INFORMATION MANAGEMENT (KBI) Faculty of Business and Economics DEPARTMENT OF DECISION SCIENCES AND INFORMATION MANAGEMENT (KBI) KBI 1 An Improved Process Event Log Artificial Negative Event Generator Seppe K.L.M. vanden Broucke, Jochen

More information

Chapter 17. Orthogonal Matrices and Symmetries of Space

Chapter 17. Orthogonal Matrices and Symmetries of Space Chapter 17. Orthogonal Matrices and Symmetries of Space Take a random matrix, say 1 3 A = 4 5 6, 7 8 9 and compare the lengths of e 1 and Ae 1. The vector e 1 has length 1, while Ae 1 = (1, 4, 7) has length

More information

Die Welt Multimedia-Reichweite

Die Welt Multimedia-Reichweite Die Welt Multimedia-Reichweite 1) Background The quantification of Die Welt s average daily audience (known as Multimedia-Reichweite, MMR) has been developed by Die Welt management, including the research

More information

Business Process Management Demystified: A Tutorial on Models, Systems and Standards for Workflow Management

Business Process Management Demystified: A Tutorial on Models, Systems and Standards for Workflow Management Business Process Management Demystified: A Tutorial on Models, Systems and Standards for Workflow Management Wil M.P. van der Aalst Department of Technology Management Eindhoven University of Technology

More information

Investigating Clinical Care Pathways Correlated with Outcomes

Investigating Clinical Care Pathways Correlated with Outcomes Investigating Clinical Care Pathways Correlated with Outcomes Geetika T. Lakshmanan, Szabolcs Rozsnyai, Fei Wang IBM T. J. Watson Research Center, NY, USA August 2013 Outline Care Pathways Typical Challenges

More information

EDIminer: A Toolset for Process Mining from EDI Messages

EDIminer: A Toolset for Process Mining from EDI Messages EDIminer: A Toolset for Process Mining from EDI Messages Robert Engel 1, R. P. Jagadeesh Chandra Bose 2, Christian Pichler 1, Marco Zapletal 1, and Hannes Werthner 1 1 Vienna University of Technology,

More information

Regular Languages and Finite Automata

Regular Languages and Finite Automata Regular Languages and Finite Automata 1 Introduction Hing Leung Department of Computer Science New Mexico State University Sep 16, 2010 In 1943, McCulloch and Pitts [4] published a pioneering work on a

More information

Sequence Partitioning for Process Mining with Unlabeled Event Logs

Sequence Partitioning for Process Mining with Unlabeled Event Logs Sequence Partitioning for Process Mining with Unlabeled Event Logs Micha l Walicki a,1, Diogo R. Ferreira b, a Institute of Informatics, University of Bergen, Norway b IST Technical University of Lisbon,

More information

Content-Based Recommendation

Content-Based Recommendation Content-Based Recommendation Content-based? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items User-based CF Searches

More information

Chapter 8. Database Design II: Relational Normalization Theory

Chapter 8. Database Design II: Relational Normalization Theory Chapter 8 Database Design II: Relational Normalization Theory The E-R approach is a good way to start dealing with the complexity of modeling a real-world enterprise. However, it is only a set of guidelines

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

1. what is this talk about? Service-Oriented Architecture. The problem. Topic of this Talk. The solution

1. what is this talk about? Service-Oriented Architecture. The problem. Topic of this Talk. The solution 1. what is this talk about? the background Operating Guidelines for Oriented Architectures Wolfgang Reisig = id + control + interface Web : id = URI, interface = WSDL, Workflow : control = workflow Workflow

More information

Mining Configurable Process Models from Collections of Event Logs

Mining Configurable Process Models from Collections of Event Logs Mining Configurable Models from Collections of Event Logs J.C.A.M. Buijs, B.F. van Dongen, and W.M.P. van der Aalst Eindhoven University of Technology, The Netherlands {j.c.a.m.buijs,b.f.v.dongen,w.m.p.v.d.aalst}@tue.nl

More information

Discovering Social Networks from Event Logs

Discovering Social Networks from Event Logs Discovering Social Networks from Event Logs Wil M.P. van der Aalst 1,HajoA.Reijers 1, Minseok Song 2,1 1 Department of Technology Management, Eindhoven University of Technology, P.O.Box 513, NL-5600 MB,

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information