Learning Workflow Petri Nets




Javier Esparza, Martin Leucker, and Maximilian Schlund
Technische Universität München, Boltzmannstr. 3, 85748 Garching, Germany
{esparza,leucker,schlund}@in.tum.de

Abstract. Workflow mining is the task of automatically producing a workflow model from a set of event logs recording sequences of workflow events; each sequence corresponds to a use case or workflow instance. Formal approaches to workflow mining assume that the event log is complete (contains enough information to infer the workflow), which is often not the case. We present a learning approach that relaxes this assumption: if the event log is incomplete, our learning algorithm automatically derives queries about the executability of some event sequences. If a teacher answers these queries, the algorithm is guaranteed to terminate with a correct model. We provide matching upper and lower bounds on the number of queries required by the algorithm, and report on the application of an implementation to some examples.

1 Introduction

Modern workflow management systems offer modelling capabilities to support business processes [vdAvH04]. However, constructing a formal or semi-formal workflow model of an existing business process is a non-trivial task, and for this reason workflow mining has been extensively studied (see [vdAvDH+03] for a survey). In this approach, information about the business processes is gathered in the form of logs recording sequences of workflow events, where each sequence corresponds to a use case. The logs are then used to extract a formal model. Workflow mining techniques have been implemented in several systems, most prominently in the ProM tool [vdAvDG+07], and successfully applied.

Most approaches to process mining use a combination of heuristics and formal techniques, like machine learning or neural networks, and do not offer any kind of guarantee about the relationship between the business process and the mined model. Formal approaches have been studied using workflow graphs [AGL98] and workflow nets [vdA98, BDLM07] as formalisms.
These approaches assume that the logs provide enough information to infer the model, i.e., that there is one single model compatible with them. In this case we call the logs complete. This is a strong assumption, which often fails to hold, for two reasons: first, the number of use cases may grow exponentially in the number of tasks of the process, and so may the size of a complete set of logs. Second, many processes have "corner cases": unusual process instances that rarely happen. A complete set of logs must contain at least one instance of each corner case.

In this paper we propose a learning technique to relax the completeness assumption on the set of logs. In this approach the model is produced by a Learner that may ask questions to a Teacher. The Learner can have initial knowledge in the form of an initial set of logs; if the log contains enough information to infer the model, the Learner produces it. If not, it iteratively produces membership queries of the form: Does the business process have an instance (a use case) starting with a given sequence of tasks? For instance, in the standard example of complaint processing (see Figure 1 and [vdA98]), a membership query could have the form "Is there a use case in which first the complaint is registered and then immediately rejected?" The Teacher would answer no, because a decision on acceptance or rejection is made only after the customer has been sent a questionnaire. Notice that the Learner does not guess the queries; they are automatically constructed by the learning algorithm.

Under the assumption that the Teacher provides correct answers, the learning process is guaranteed to terminate with a correct model: a model whose executions coincide with the possible event sequences of the business process. In other words, we provide a formal framework with a correctness and completeness guarantee which only assumes the existence of the Teacher.

It could be objected that if a Teacher exists, then a workflow model must already exist, and there is no need to produce it. To see the flaw in this argument, observe that the Teachers can be employees, databases of client records, etc., that have knowledge about the process, but usually lack the modelling expertise required to produce a formal model. Our learning algorithm only requires from the Teacher the low-level ability to recognize a given sequence of process actions as the initial sequence of process actions of some use case.

It is useful to draw an analogy. Witnesses of a crime can usually answer questions about the physical appearance of the criminal, but they are very rarely able to draw the criminal's portrait: this requires interaction with a police expert. This interaction can be seen as a learning process: the Teacher is the witness, and the Learner is the police expert.
The Teacher has knowledge about the criminal, but is unable to express it in the form of a portrait. The Learner has the expertise required to produce a portrait, but needs input from the Teacher.

In the context of business processes, like [vdA98, KRS06, BDLM07, RGvdA+07], we use workflow nets, introduced by van der Aalst, as a formal model. Loosely speaking, a workflow net is a Petri net with a distinguished initial and final marking. Van der Aalst convincingly argues that well-formed business processes (an informal notion) correspond to sound workflow nets (a formal concept). A workflow net is sound [vdA98] if it is live and bounded. In this paper we follow van der Aalst's ideas. Given a Teacher, we wish to learn a sound workflow net for the business process.

It is easy to come up with a naive correct learning algorithm. However, a first naive complexity analysis yields that the number of queries necessary to learn a workflow net can be triple exponential in the number of tasks of the business process in the worst case. This seems to indicate that the approach is useless. However, we show how the special properties of sound workflow nets, together with a finer complexity analysis, lead to WNL, a new learning algorithm requiring a single exponential number of queries in the worst case. We also provide an exponential lower bound, showing that WNL is asymptotically optimal. Finally, in a number of experiments we show that despite the exponential worst-case complexity the algorithm is able to synthesize interesting workflows. Notice also that the complexity is analysed for the case in which no initial event log is provided, that is, the case in which all knowledge has to be extracted from the Teacher by asking membership queries.

Technically, the triple exponential complexity of the naive algorithm is a consequence of the following three facts: (a) the size of a deterministic finite automaton (DFA) recognizing the language of a net with $n$ transitions can be a priori double exponential in $n$; (b) learning such a DFA using only membership queries requires exponentially many queries in the size of the DFA (follows from [Ang87] and [Vas73, Cho78]); and (c) the algorithms of Darondeau et al. for synthesis of Petri nets from regular languages [BBD95] are exponential in the size of the DFA. In the paper we solve (a) by proving that the size of the DFA is only single exponential; we solve (b) by exhibiting a better learning algorithm for sound workflow nets requiring only polynomially many queries; finally, we solve (c) by showing that for sound workflow nets the algorithms for synthesis of Petri nets from regular languages can be replaced by the algorithms for synthesis of bounded nets from minimal DFA, which are of polynomial complexity.

Notice that our solution very much profits from the restriction to sound workflow nets, but that this restriction is given by the application domain: that sound workflow nets are an adequate formalization of well-formed business processes has been proved by the large success of the model in both the workflow modelling and Petri net communities.

Outline. In the next section, we fix the notation of automata, recall the notion of Petri nets and workflow nets, and cite results on synthesis of Petri nets from automata. Our learning algorithm WNL is elaborated in Section 3.
Section 4 reports on our implementation and experimental results. Finally, we sum up our contribution in the conclusion.

2 Preliminaries

We assume that the reader is familiar with elementary notions of graphs, automata and net theory. In this section we fix some notations and define some less common notions.

Automata and Languages. A deterministic finite automaton (DFA) is a 5-tuple $A = (Q, \Sigma, \delta, q_0, F)$ where $Q$ is a finite set of states, $\Sigma$ is a finite alphabet, $q_0 \in Q$ is the initial state, $\delta : Q \times \Sigma \to Q$ is the (partial) transition function and $F \subseteq Q$ is the set of final states. We denote by $\hat\delta$ the function $\hat\delta : Q \times \Sigma^* \to Q$ inductively defined by $\hat\delta(q, \epsilon) = q$ and $\hat\delta(q, wa) = \delta(\hat\delta(q, w), a)$. The language $L(q)$ of some state $q \in Q$ is the set of words $w \in \Sigma^*$ such that $\hat\delta(q, w) \in F$. The language recognized by a DFA $A$ is defined as $L(A) := L(q_0)$. A language is regular if it is accepted by some DFA.

Myhill-Nerode's theorem and minimal DFAs. Given a language $L \subseteq \Sigma^*$, we say two words $w, w' \in \Sigma^*$ are $L$-equivalent, denoted by $w \sim_L w'$, if $wv \in L \Leftrightarrow w'v \in L$ for every $v \in \Sigma^*$. The language $L$ is regular iff $L$-equivalence partitions $\Sigma^*$ into a finite number of equivalence classes. Given a regular language $L$, there exists a unique DFA $A$ up to isomorphism with a minimal number of states such that $L(A) = L$; this automaton $A$ is called the minimal DFA for $L$. The number of states of this automaton is equal to the number of equivalence classes.

Given a DFA $A = (Q, \Sigma, \delta, q_0, F)$, we say two states $q, q' \in Q$ are $A$-equivalent if $L(q) = L(q')$. We can quotient $A$ with respect to this equivalence relation. The states of the quotient DFA are the equivalence classes of $A$-equivalence. The transitions are defined by lifting the transitions of $A$: for every transition $q \xrightarrow{a} q'$, add $[q] \xrightarrow{a} [q']$ to the transitions of the quotient DFA, where $[q]$ and $[q']$ denote the equivalence classes of $q$ and $q'$. The initial state is $[q_0]$, and the final states are $\{[q] \mid q \in F\}$. The quotient DFA recognizes the same language as $A$, and is isomorphic to the minimal DFA recognizing $L$.

It is easy to see that the minimal automaton for a prefix-closed regular language has a unique non-final state (a trap state). For simplicity, we sometimes identify this automaton with the one obtained by removing the trap state together with its ingoing and outgoing transitions.

Petri Nets. A (marked) Petri net is a 5-tuple $N = (P, T, F, W, m_0)$ where $P$ is a set of places, $T$ is a set of transitions with $P \cap T = \emptyset$, $F \subseteq (P \times T) \cup (T \times P)$ is a flow relation, $W : (P \times T) \cup (T \times P) \to \mathbb{N}$ is a weight function satisfying $W(x, y) > 0$ iff $(x, y) \in F$, and $m_0 : P \to \mathbb{N}$ is a mapping called the initial marking. For each transition or place $x$ we call the set ${}^\bullet x := \{y \in P \cup T : (y, x) \in F\}$ the preset of $x$. Analogously we call $x^\bullet := \{y \in P \cup T : (x, y) \in F\}$ the postset of $x$. A net is pure if no transition belongs to both the pre- and postsets of some place. Given an arbitrary but fixed numbering of $P$ and $T$, the incidence matrix of $N$ is the $|P| \times |T|$ matrix $C$ given by $C(p_i, t_j) = W(t_j, p_i) - W(p_i, t_j)$.

A transition $t \in T$ is enabled at a marking $m$ if $\forall p \in {}^\bullet t : m(p) \ge W(p, t)$. If a transition $t$ is enabled it can fire to produce the new marking $m'$, written as $m \xrightarrow{t} m'$, where

$$m'(p) := m(p) + C(p, t) \quad \forall p \in P.$$

Given $w = t_1 \cdots t_n \in T^*$ (i.e. $t_i \in T$), we write $m_0 \xrightarrow{w} m$ if there exist markings $m_1, \ldots, m_{n-1}$ such that $m_0 \xrightarrow{t_1} m_1 \xrightarrow{t_2} m_2 \cdots m_{n-1} \xrightarrow{t_n} m$. Then, we say that $m$ is reachable. The set of reachable markings of $N$ is denoted by $M(N)$ and defined by $M(N) = \{m : \exists w \in T^*.\ m_0 \xrightarrow{w} m\}$. It is well-known that if $m_0 \xrightarrow{w} m$, then $m = m_0 + C \cdot P(w)$, where $P(w)$, the Parikh vector of $w$, is the vector of dimension $|T|$ having as $i$-th component the number of times that $t_i$ occurs in $w$. We call this equality the marking equation.

A net $N$ is $k$-bounded if $m(p) \le k$ for every reachable marking $m$ and every place $p$ of $N$, and bounded if it is $k$-bounded for some $k \ge 0$. A 1-bounded net is also called safe. A net is reversible if for every firing sequence $m_0 \xrightarrow{w} m$ there is a sequence $v$ leading back to the initial state, i.e. $m \xrightarrow{v} m_0$. $N$ is live if every transition can fire eventually at every marking, i.e. $\forall m \in M(N)\ \forall t \in T\ \exists w \in T^*.\ m \xrightarrow{wt} m'$ for some $m'$.

The reachability graph of a net $N = (P, T, F, W, m_0)$ is the directed graph $G = (V, E)$ with $V = M(N)$ and $(x, y) \in E$ iff $x \xrightarrow{t} y$ for some $t \in T$. If $G$ is finite, then the five-tuple $A(N) = (Q, \Sigma, \delta, q_0, F)$ given by $Q = M(N)$, $\Sigma = T$, $q_0 = m_0$, $F = Q$, and $\delta(m, t) := m'$ if $m \xrightarrow{t} m'$ and undefined otherwise, is a DFA. (Note that $\delta$ is well-defined, because if $m \xrightarrow{t} m'$ and $m \xrightarrow{t} m''$ then $m' = m''$.) We call it the marking-DFA of $N$. The language of $N$, denoted by $L(N)$, is defined as the language of $A(N)$.

Workflow nets. Loosely speaking, a workflow net is a Petri net with two distinguished input and output places without input and output transitions respectively, and such that the addition of a "reset" transition leading back from the output to the input place makes the net strongly connected (see Figure 1, for an example). Formally, a net $N = (P, T, F, W, m_0)$ is a workflow net if there exist places $i, o \in P$ such that ${}^\bullet i = \emptyset = o^\bullet$, $m_0(p) = 1$ for $p = i$ and $m_0(p) = 0$ otherwise, and the net $\tilde N = (P, T \cup \{r\}, F \cup \{(o, r), (r, i)\}, W \cup \{(o, r) \mapsto 1, (r, i) \mapsto 1\}, m_0)$, where $r \notin T$, is strongly connected.

A firing sequence $w$ of a workflow net $N$ is a run if $m_0 \xrightarrow{wr} m_0$ in $\tilde N$. The runs of $N$ are the formalization of the use cases of the business process modelled by the workflow net. A workflow net $N$ is sound if $\tilde N$ is live and bounded.
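The net-theoretic definitions of this section (enabling, firing via the incidence matrix, the marking equation, and the marking-DFA) can be illustrated with a minimal Python sketch. The two-place net, the names `PLACES`, `TRANS`, `W`, and all function names are hypothetical choices made for this illustration; the paper does not prescribe any data representation.

```python
from collections import deque

# Hypothetical two-place cycle: t1 moves the token from p1 to p2, t2 moves it back.
PLACES = ["p1", "p2"]
TRANS = ["t1", "t2"]
W = {("p1", "t1"): 1, ("t1", "p2"): 1, ("p2", "t2"): 1, ("t2", "p1"): 1}
M0 = (1, 0)

def w(x, y):
    """Arc weight W(x, y); 0 when (x, y) is not in the flow relation."""
    return W.get((x, y), 0)

# Incidence matrix C(p, t) = W(t, p) - W(p, t), rows indexed by places.
C = [[w(t, p) - w(p, t) for t in TRANS] for p in PLACES]

def enabled(m, t):
    """t is enabled at m iff m(p) >= W(p, t) for every place p."""
    return all(m[i] >= w(p, t) for i, p in enumerate(PLACES))

def fire(m, t):
    """Successor marking m'(p) = m(p) + C(p, t); call only if t is enabled."""
    j = TRANS.index(t)
    return tuple(m[i] + C[i][j] for i in range(len(PLACES)))

def marking_dfa(m0):
    """Breadth-first construction of the reachability graph (the marking-DFA).
    Terminates only for bounded nets."""
    delta, seen, queue = {}, {m0}, deque([m0])
    while queue:
        m = queue.popleft()
        for t in TRANS:
            if enabled(m, t):
                m2 = fire(m, t)
                delta[(m, t)] = m2
                if m2 not in seen:
                    seen.add(m2)
                    queue.append(m2)
    return seen, delta

def run_word(m0, word):
    """Fire the transitions of word in order; None if some step is not enabled."""
    m = m0
    for t in word:
        if not enabled(m, t):
            return None
        m = fire(m, t)
    return m

def marking_equation(m0, word):
    """Right-hand side of the marking equation m0 + C * P(word)."""
    parikh = [list(word).count(t) for t in TRANS]
    return tuple(m0[i] + sum(C[i][j] * parikh[j] for j in range(len(TRANS)))
                 for i in range(len(PLACES)))
```

For a fireable word, `run_word` and `marking_equation` agree, which is exactly the statement of the marking equation; the converse does not hold in general, since the equation ignores the order of firing.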
It is argued in [vdA98] that a well-formed business process can be modelled by a sound workflow net (at a certain level of abstraction). The workflow net in Figure 1 is a very simple model for processing complaints (a slightly altered example, taken from [vdAvH04]).

[Figure 1: workflow net with input place i, output place o, and transitions register, contact customer, contact department, collect, accept, pay refund, reject, send rejection, acquire info, need more info, archive.]
Fig. 1. An example for a sound workflow net (drawn without the reset transition r)

The following lemma characterizes soundness. In the paper we work with this characterization as definition.

Lemma 1. A workflow net $N$ is sound iff $\tilde N$ is bounded, reversible, and for every transition $t$ there is a reachable marking $m$ such that $m$ enables $t$.

Proof. Let $N = (P, T, F, W, m_0)$ be a workflow net.

($\Rightarrow$): Assume $N$ is sound. Then $\tilde N$ is bounded and live. We show $\tilde N$ is reversible. Let $m$ be an arbitrary reachable marking of $\tilde N$. Then $m_0 \xrightarrow{w} m$ for some $w \in (T \cup \{r\})^*$. Since $\tilde N$ is live, there is a firing sequence $w'$ such that $m \xrightarrow{w'r} m'$ for some marking $m'$. We claim $m' = m_0$. Assume $m' \neq m_0$. Then, since $m'(i) > 0$, we have $m'(p) \ge m_0(p)$ for every place $p$, and $m'(p) > m_0(p)$ for some $p$. So $m'$ strictly covers $m_0$, and so $\tilde N$ is not bounded, a contradiction.

($\Leftarrow$): Assume $\tilde N$ is bounded, reversible and every transition is enabled at some reachable marking. We show that $\tilde N$ is live, which implies that $N$ is sound. Let $m$ be an arbitrary reachable marking of $\tilde N$, and let $t \in T \cup \{r\}$. Since $\tilde N$ is reversible, $m \xrightarrow{w} m_0$ for some $w \in (T \cup \{r\})^*$, and since $t$ occurs in some firing sequence $m_0 \xrightarrow{vt} m'$ for some $v \in (T \cup \{r\})^*$ and some $m'$, the sequence $wvt$ can fire from $m$. So $\tilde N$ is live (and bounded by assumption) and therefore $N$ is sound.

Synthesis of Petri nets from Languages and from Automata. In [BBD95], Darondeau et al. address two synthesis problems of Petri nets from a minimal DFA $A$ over an alphabet $T$:

(S1) Decide if there is a bounded net $N$ with $T$ as set of transitions such that $L(N) = L(A)$, and if so return one. We call this problem synthesis up to language equivalence.

(S2) Decide if there is a bounded net $N$ with $T$ as set of transitions such that the reachability graph of $N$ is isomorphic to $A$, and if so return one. We call this problem synthesis up to isomorphism.

The algorithm of [BBD95] for synthesis up to language equivalence works in two phases: firstly, $A$ is transformed into an equivalent automaton $A'$ in a certain normal form. In the worst case, $A'$ can be exponentially larger than $A$. The second phase constructs the net $N$, if it exists, in polynomial time in the size of $A'$. Overall, the algorithm requires exponential time in the size of $A$. The algorithm of [BBD95] for synthesis up to isomorphism, on the contrary, needs only polynomial time in the size of $A$. Notice that, in general, if one knows the language $L(N)$ of a net, one does not know yet its reachability graph. In particular, the minimal automaton recognizing $L(N)$ may not be the reachability graph of any net.

The basic algorithm in [BBD95] can only handle pure nets, but there is also a generalization to non-pure nets to be found in [BDBM96].
Hints on how to obtain nets that are more visually appealing (i.e. have few arcs, no redundant places, etc.) than those generated by standard synthesis algorithms can be found in [BDKM08], where net synthesis was applied to process mining from event logs.

3 A Learning Algorithm for Sound Workflow Nets

Our goal is to develop a learning algorithm for sound workflow nets which is guaranteed to terminate, and in which a teacher only needs to answer membership queries. The precise learning setting is as follows. We have a Learner and a Teacher. The Learner is given a set $T$ of transitions, where each transition corresponds to a dedicated task (in the sense of [vdA98]) of the business process. The Learner repeatedly asks the Teacher workflow membership queries. A query is a sequence $\sigma \in T^*$, and is answered by the Teacher as follows: if $\sigma$ can be extended to a use case (i.e., a sequence corresponding to a complete instance of the business process), then the Teacher returns this use case in the form of a transition sequence $\sigma\tau r$, where $\tau \in T^*$. Otherwise, the Teacher answers no.

In our running example the Learner is given the set of transitions of the net of Figure 1, and the Teacher's answers are compatible with this net, i.e., the Teacher acts as if it knew the net. Note that in practice, this only means that the Teacher can either extend the query to a use case of the net to learn or can reject the query. Two possible queries are

register contact customer contact department
register contact customer collect

A possible answer to the first query is the run

register contact customer contact department collect accept pay refund archive

while the answer to the second query is no.

Assuming that the Teacher's answers are compatible with a $k$-bounded and reversible net $N$, the goal of the Learner is to produce a net $N'$ such that $L(N) = L(N')$. It is easy to see that a (very inefficient) learning algorithm exists:

(1) A net with $n$ transitions has at most $c_1 := 2^{(n+1)}$ places, because a place is determined by its pre- and post-sets of transitions.

(2) By (1), $N$ has at most $c_2 := (k + 1)^{c_1}$ reachable markings. Therefore, there exists a minimal DFA $A$ with at most $c_2$ states such that $L(N) = L(A)$.
(3) Since any two prefix-closed minimal DFAs with $c_2$ states differ in some word of length $c_2$, the automaton $A$ can be learned by querying all words over $T$ of length $2c_2$, i.e., after at most $c_3 := n^{2c_2}$ queries. This follows easily from Myhill-Nerode's theorem. The DFA $A$ can be constructed from the answers to the queries as follows. The states of $A$ are the equivalence classes of words of $L(N)$ of length up to $c_2$, where two words $w, v$ are equivalent if for every word $u$ of length up to $c_2$ either $wu$ and $vu$ both belong to $L(N)$, or neither of them does [Vas73, Cho78]. The initial state is the equivalence class of the empty word, and all states are final. There is a transition $[w] \xrightarrow{a} [wa]$ for every word $wa$ of length at most $c_2$.

(4) The net $N'$ is obtained from $A$ by means of the algorithm of [BBD95] for synthesis up to language equivalence (see problem (S1) in Section 2). The algorithm runs in $2^{O(p(c_2))}$ time for some polynomial $p$.

The query complexity of this naive algorithm, i.e. the number of queries it needs to ask, is triple exponential in the number $n$ of transitions. In this section we prove a series of results ending in an improved algorithm with single exponential query and time complexity (notice that single exponential time complexity implies single exponential query complexity, but not vice versa).

3.1 An Upper Bound on the Number of Reachable Markings

We show that the naive bound on the number of states of $A$ obtained in (2) above, which is double exponential in $n$, can be improved to a single exponential bound.

Given a net $N = (P, T, F, W, m_0)$ with incidence matrix $C$, we denote by $C(p)$ the vector $(C(p, t_1), \ldots, C(p, t_{|T|}))$. We say that a place $p$ is a linear combination of the places $p_1, \ldots, p_k$ if there are real numbers $\lambda_1, \ldots, \lambda_k$ such that $C(p) = \sum_{i=1}^{k} \lambda_i C(p_i)$. The following lemma is well known.

Lemma 2. Let $N = (P, T, F, W, m_0)$ be a net with incidence matrix $C$, and let $p, p_1, \ldots, p_k \in P$ with $C(p) = \sum_{i=1}^{k} \lambda_i C(p_i)$. Then for every reachable marking $m$:

$$m(p) = m_0(p) + \sum_{i=1}^{k} \lambda_i \bigl(m(p_i) - m_0(p_i)\bigr).$$

Proof. Since $m$ is reachable, there is $w \in T^*$ such that $m_0 \xrightarrow{w} m$. By the marking equation $m = m_0 + C \cdot P(w)$, and so in particular $m(p) = m_0(p) + C(p) \cdot P(w)$, and $m(p_i) = m_0(p_i) + C(p_i) \cdot P(w)$ for every $1 \le i \le k$. So

$$m(p) = m_0(p) + \sum_{i=1}^{k} \lambda_i C(p_i) \cdot P(w) = m_0(p) + \sum_{i=1}^{k} \lambda_i \bigl(m(p_i) - m_0(p_i)\bigr).$$

Theorem 1. Let $N = (P, T, F, W, m_0)$ be a $k$-bounded net with $n$ transitions. Then $N$ has at most $(k + 1)^n$ reachable markings.

Proof. The incidence matrix $C$ has $|P|$ rows and $n$ columns, and so it has rank at most $n$. So there is a maximal set of $l \le n$ places $p_1, \ldots, p_l$ such that $C(p_1), \ldots, C(p_l)$ are linearly independent, and every place $p$ is a linear combination of $p_1, \ldots, p_l$. It follows from Lemma 2 that for every two reachable markings $m, m'$, if $m(p_i) = m'(p_i)$ for every $1 \le i \le l$, then $m(p) = m'(p)$ for every place $p$. In other words, if two markings coincide on all of $p_1, \ldots, p_l$, they are equal. Since for every reachable marking $m$ we have $0 \le m(p_i) \le k$, the number of projections of the reachable markings onto the places $p_1, \ldots, p_l$ is at most $(k + 1)^l \le (k + 1)^n$. So $N$ has at most $(k + 1)^n$ reachable markings.

3.2 Minimality of the marking-DFA

We show that the marking-DFA of a bounded and reversible net is minimal. Since our goal is to construct a bounded and reversible net model $N$ of the business process, after we learn the minimal DFA $A$ with $L(A) = L(N)$ in step (3), we can synthesize $N$ by applying the algorithm of [BBD95] for synthesis up to isomorphism (Problem (S2)), instead of the algorithm for synthesis up to language equivalence (Problem (S1)). This eliminates one exponential from step (4) of the naive algorithm.

The proof is based on Lemma 3 below. Readers familiar with Myhill-Nerode's theorem (see also Section 2) will probably need no proof, but we include one for completeness. Recall that we identify a DFA with a single trap state with the one obtained by removing the trap state together with its ingoing and outgoing transitions.

Lemma 3. A DFA $A = (Q, \Sigma, \delta, q_0, F)$ is minimal iff the following two conditions hold: (1) every state lies in a path leading from $q_0$ to some state of $F$, and (2) $L(q) \neq L(q')$ for every two distinct states $q, q' \in Q$.

Proof. ($\Rightarrow$): We prove the contrapositive. For (1), if some state $q$ does not lie in any path from $q_0$ to some final state, then it can be removed without changing the language, and so $A$ is not minimal. For (2), if two distinct states $q, q'$ of $A$ satisfy $L(q) = L(q')$, then $[q] = [q']$, and so the quotient automaton has fewer states than $A$. So $A$ is not minimal.

($\Leftarrow$): Assume (1) and (2) hold. We prove that for every state $q$ the language of the words $w$ such that $\hat\delta(q_0, w) = q$ is an equivalence class of $L$-equivalence. It follows that the number of states of $A$ is at most as large as the number of equivalence classes of $L$-equivalence, which implies that $A$ is the minimal DFA for $L$. It suffices to show:

If $\hat\delta(q_0, w) = q = \hat\delta(q_0, v)$, then $w \sim_L v$. This follows immediately from the definition of $L$-equivalence.

If $\hat\delta(q_0, w) = q$ and $\hat\delta(q_0, v) = q'$ for some $q \neq q'$, then $w \not\sim_L v$. Since $L(q) \neq L(q')$, w.l.o.g. there is a word $u \in L(q) \setminus L(q')$. So $wu \in L$ and $vu \notin L$, which implies $w \not\sim_L v$.

Theorem 2. Let $N = (P, T, F, W, m_0)$ be a bounded and reversible Petri net. The marking-DFA $A(N)$ of $N$ is a minimal DFA.

Proof. Assume that $A(N)$ is not minimal. Since every state of $A(N)$ is final, by Lemma 3 there are two states of $A(N)$, i.e., two reachable markings $m_1 \neq m_2$ of $N$, such that $L(m_1) = L(m_2)$. As $m_1 \neq m_2$ there exists $p \in P$ with $m_1(p) \neq m_2(p)$. Assume w.l.o.g. $m_1(p) < m_2(p)$. Let $m$ be a reachable marking such that $m(p)$ is minimal, i.e. there is no other reachable marking $m'$ s.t. $m'(p) < m(p)$. Since $m$ is reachable and $N$ is reversible, there is $w \in T^*$ such that $m_2 \xrightarrow{w} m$. Since $L(m_1) = L(m_2)$, there is a reachable marking $m'$ such that $m_1 \xrightarrow{w} m'$. It follows that

$$m'(p) = m_1(p) + C(p) \cdot P(w) < m_2(p) + C(p) \cdot P(w) = m(p),$$

contradicting the minimality of $m(p)$.
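Theorems 1 and 2 can be checked mechanically on a small instance. The following Python sketch builds the reachability graph of a hypothetical 2-bounded, reversible net (two tokens circulating between two places), compares the number of reachable markings with the bound $(k+1)^n$, and counts the language-equivalence classes of the marking-DFA by partition refinement. The net and all names (`PRE`, `POST`, `count_language_classes`, ...) are assumptions made for the illustration, and the refinement routine is a standard Moore-style construction rather than anything taken from the paper.

```python
from collections import deque

# Hypothetical 2-bounded, reversible net: two tokens circulate between p and q.
PRE = {"t1": (1, 0), "t2": (0, 1)}    # tokens consumed from (p, q)
POST = {"t1": (0, 1), "t2": (1, 0)}   # tokens produced on (p, q)
M0, K = (2, 0), 2

def successors(m):
    """Yield (t, m') for every transition t enabled at marking m."""
    for t in PRE:
        if all(x >= c for x, c in zip(m, PRE[t])):
            yield t, tuple(x - c + d for x, c, d in zip(m, PRE[t], POST[t]))

def reachability_graph(m0):
    """Breadth-first search over markings; terminates because the net is bounded."""
    delta, seen, queue = {}, {m0}, deque([m0])
    while queue:
        m = queue.popleft()
        for t, m2 in successors(m):
            delta[(m, t)] = m2
            if m2 not in seen:
                seen.add(m2)
                queue.append(m2)
    return seen, delta

def count_language_classes(states, delta):
    """Number of distinct languages L(m) among the states of the marking-DFA.

    All states are final, so refinement starts from a single block. A missing
    transition is recorded as None, i.e. a move to the implicit trap state.
    """
    block = {m: 0 for m in states}
    while True:
        sig = {m: (block[m],) + tuple(block.get(delta.get((m, t))) for t in PRE)
               for m in states}
        ids = {s: i for i, s in enumerate(set(sig.values()))}
        if len(ids) == len(set(block.values())):
            return len(ids)          # no block was split: partition is stable
        block = {m: ids[sig[m]] for m in states}
```

On this net the search finds three reachable markings, within the bound $(2+1)^2$, and the refinement yields one class per marking, as Theorem 2 predicts for bounded and reversible nets.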

3.3 Learning the reachability graph by Exploration

The final step towards a single exponential learning algorithm consists of improving the naive algorithm of step (3) for learning the minimal DFA $A$. Recall that we assume that the Teacher's answers are compatible with a $k$-bounded and reversible net $N$. If $n$ and $r$ are the number of transitions and reachable markings of $N$, then the naive algorithm requires $n^{r}$ membership queries. We present a new algorithm that requires only $O(n \cdot r^2)$ queries.

Recall the standard search approach for constructing the reachability graph of a net if the net is known. We maintain a queue of markings, initially containing the initial marking, and two sets of already visited markings and transitions (transitions between markings). While the queue is non-empty, we take the marking $m$ at the top of the queue, and check for each transition $a$ whether $a$ is enabled at $m$. If so, we compute the marking $m'$ such that $m \xrightarrow{a} m'$, and proceed as follows: if $m'$ has been already visited, we add $m \xrightarrow{a} m'$ to the set of visited transitions; if $m'$ had not been visited yet, we add $m'$ to the set of visited markings and to the queue, and add $m \xrightarrow{a} m'$ to the set of visited transitions.

Our learning algorithm closely mimics this behaviour, but works with firing sequences of $N$ instead of reachable markings (the Learner does not know the markings of the net, it does not even know its places). We maintain a queue of firing sequences, initially containing the empty sequence, and two sets of already visited firing sequences and transitions. While the queue is non-empty, we take the firing sequence $w \in (T \cup \{r\})^*$ at the top of the queue, and ask the Teacher for each transition $a$ whether $wa$ is also a firing sequence of $N$. If so, we proceed as follows. We first determine for each already visited firing sequence $u$ whether $u$ leads to the same marking as $wa$. Notice that it is not obvious how to do this; this is the key of the learning algorithm.
If some firing sequence u leds to the sme mrking s w, then we dd w u to the set of visited trnsitions; otherwise, we dd w to the set of visited firing sequences nd to the queue, nd dd w w to the set of visited trnsitions. The lgorithm in pseudo code cn be found below (Algorithm 1), where Equiv(u, v) denotes tht there is u v mrking m such tht m 0 m nd m 0 m. The correctness of the lgorithm is immedite: we just simulte serch lgorithm for the construction of the rechbility grph, using firing sequence u u to represent the mrking m such tht m 0 m. The check Equiv(u, w) gurntees tht ech mrking gets exctly one representtive. The problem is to implement Equiv(u, w) using only membership queries. In generl this is no esy tsk, but in the cse of reversible nets it cn be esily done s follows. When checking Equiv(u, w) the word u hs been lredy dded to V, nd so the Lerner hs estblished tht u L(N). So in prticulr the Techer hs nswered positively query bout u nd, due to the structure of workflow membership queries, it hs returned run uu c, where u c r is trnsition sequence leding bck to the initil mrking. We prove tht Equiv(x, y) holds if nd only if the sequence xy c is run of N: 10

Algorithm 1: Learning the reachability graph

    Output: graph (V, E) isomorphic to the reachability graph of N
    V ← ∅; E ← ∅
    F ← {ε}                          // queue of firing sequences
    while not F.empty() do
        w ← F.dequeue()
        forall a ∈ T do
            if wa is accepted by the Teacher then    /* This means wa ∈ L(N) */
                σ ← wa
                forall u ∈ V do
                    if Equiv(u, wa) then σ ← u
                end
                if σ = wa then F.enqueue(wa)
                add σ to V and w →a σ to E
            end
        end
    end

Proposition 1. In Algorithm 1, Equiv(u, w) = true if and only if $u w^c$ is a run of N, where $w w^c$ is the run reported by the Teacher when positively answering the query about $w \in L(N)$.

Proof. If Equiv(u, w) = true, then there is a marking m such that $m_0 \xrightarrow{w} m$ and $m_0 \xrightarrow{u} m$. Because $m_0 \xrightarrow{w} m \xrightarrow{w^c r} m_0$, we have $m_0 \xrightarrow{u} m \xrightarrow{w^c r} m_0$, which implies that $u w^c$ is a run.

If $u w^c$ is a run, then we have $m_0 \xrightarrow{w w^c r} m_0$ and $m_0 \xrightarrow{u w^c r} m_0$. Let m be the marking such that $m_0 \xrightarrow{w} m$. We then have $m \xrightarrow{w^c r} m_0$. Moreover, m is the only marking such that $m \xrightarrow{w^c r} m_0$ (Petri nets are backward deterministic: given a firing sequence and its target marking, the source marking is uniquely determined). Since $m_0 \xrightarrow{u w^c r} m_0$, we then necessarily have $m_0 \xrightarrow{u} m \xrightarrow{w^c r} m_0$, and so in particular $m_0 \xrightarrow{u} m$. So both w and u lead to the same marking m, and we have Equiv(u, w) = true.

We can now easily show that checking Equiv(u, w) reduces to one single membership query.

Proposition 2. The check Equiv(u, w) can be performed by querying whether $u w^c \in L(N)$: Equiv(u, w) holds if and only if the Teacher answers positively and returns the sequence $u w^c$ itself as run.

Proof. There are three possible cases:

- The answer is negative. Then $u w^c \notin L(N)$, and so in particular it is not a run of N. So Equiv(u, w) = false.

- The answer is positive and the Teacher returns $u w^c$ as run. Then Equiv(u, w) = true by Proposition 1.
- The answer is positive, but the Teacher returns $u w^c v$ for some $v \neq \varepsilon$ as run. Since the Teacher returns a run $u w^c v$ such that no proper prefix $u w^c v'$ is a run, we have in particular, by taking $v' = \varepsilon$, that $u w^c$ is not a run. By Proposition 1 we have Equiv(u, w) = false.

Remark 1. In anticipation of the experiments described in Section 4, let us mention that in many cases the queries for $u w^c$ do not even have to be submitted to the Teacher (recall that w labels the potentially new state and u labels a known state). Often we can deduce that $u w^c$ is not fireable by observing that $u w^c \notin L(A)$, where A is the part of the DFA that is already known. If we queried $w u^c$ instead (which would also tell us whether Equiv(u, w) = true), we would not be able to discard any query because the neighbourhood of w has not yet been explored. This is one of the reasons why this algorithm is so efficient in practice (cf. Section 4).

Example 1. We now provide an example run of our algorithm, applied to the first part of the net in Figure 1. To simplify presentation we grouped together some queries which correspond to the interesting stages of the algorithm (wA denotes all queries wa with a ∈ A). For Equiv checks, the word actually submitted to the Teacher is given in square brackets after the check.

#      Query                                           Answer          Possible automata
1-4    ε{a0, a1, a2, a3}                               a0(a1 a2 a3)
5      Equiv(ε, a0)?  [ε · a1 a2 a3]                   no
6-8    a0{a0, a1, a3}                                  a0 a1(a2 a3)    3 possibilities
9      Equiv(ε, a0 a1)?  [ε · a2 a3]                   no
10     Equiv(a0, a0 a1)?  [a0 · a2 a3]                 no
11     a0 a2                                           a0 a2(a1 a3)    4 possibilities
12     Equiv(ε, a0 a2)?  [ε · a1 a3]                   no              3 possibilities
13     Equiv(a0, a0 a2)?  [a0 · a1 a3]                 no              2 possibilities
14     Equiv(a0 a1, a0 a2)?  [a0 a1 · a1 a3]           no
15-18  a0 a1{a0, a1, a2, a3}                           a0 a1 a2(a3)    5 possibilities
19-22  Equiv({ε, a0, a0 a1, a0 a2}, a0 a1 a2)?         no
23-26  a0 a2{a0, a1, a2, a3}                           a0 a2 a1(a3)    6 possibilities (naive)
27     [Equiv(a0 a2 a1, a0 a1 a2)?]                    yes
28-31  a0 a1 a2{a0, a1, a2, a3}                        a0 a1 a2 a3(ε)  7 possibilities (naive)
32     [Equiv queries]                                 no

The Answer column contains the run $w w^c$ returned by the Teacher if $w \in L(N)$; the continuations $w^c$ are put in brackets.
As observed in Remark 1, many queries (like $a_2 a_3$ in #9) do not really have to be asked, either because we already asked a prefix of the query that was rejected, or because the query is a prefix of a run supplied by the Teacher and we therefore already know that it is accepted. We also do not need to ask query #27 because $a_0 a_2 a_1$ and $a_0 a_1 a_2$ have the same Parikh vector and therefore must lead to the same marking.

There is a technical issue that should be mentioned at this point. The algorithm delivers a net N' such that the reachability graphs of N and N' are isomorphic. It follows that N' is reversible and bounded. However, we cannot guarantee that N' has the same bound as N. We consider this a minor problem, since N and N' are for behavioural purposes equivalent models.

Complexity. It follows clearly from the description of Algorithm 1 that the number of firing sequences added to the queue is equal to the number of reachable markings r of N. For the i-th sequence taken from the queue, say w, and for each transition, say a, we perform at most i membership queries: one to check if $wa \in L(N)$, and at most (i − 1) for checks Equiv(u, wa), because at that point V contains at most i − 1 elements. So the algorithm performs at most

$$\sum_{i=1}^{r} n \cdot i = \frac{n\, r\,(r+1)}{2} \in O(n \cdot r^2)$$

queries.

The following theorem sums up the results of the section.

Theorem 3 (Learning by Exploration). We can learn a k-bounded and reversible net N with a number of workflow membership queries and a running time that are single exponential in the number of transitions of N.

The proof follows easily from our discussion. The overall algorithm, which we call WNL, uses the learning technique of Section 3.3 to learn a minimal DFA A such that L(A) = L(N). Section 3.1 shows that A is single exponential in the number of transitions of N, and so it can be learned with a single exponential number of queries. Section 3.2 shows that this minimal DFA is (isomorphic to) the reachability graph of N. We can then apply the polynomial algorithm of [BBD95] for synthesis up to isomorphism (S2).

A final question is what happens if the Teacher's answers are not compatible with any k-bounded and reversible net N. In this case there are two possibilities: they are not compatible with any minimal DFA having at most $(k+1)^n$ states, or they are compatible with some such DFA, but this DFA is not the marking-DFA of any net. In the first case the algorithm can stop when the number of generated states exceeds $(k+1)^n$. In the second case, the algorithm terminates and produces a DFA, but the synthesis algorithm of [BBD95] does not return a net.
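Algorithm 1 and the Equiv check of Proposition 2 can be tried out on a small example. The following Python sketch rests on several assumptions that are not part of the text: the Teacher is simulated from a known, hypothetical safe workflow net (a0 forks, a1 and a2 run concurrently, a3 joins), it completes every accepted query with a shortest continuation to the final marking, the reset transition r is left out, V is seeded with the empty sequence (as the Equiv(ε, ·) checks of Example 1 suggest), and the inner loop stops at the first equivalent u.

```python
from collections import deque

# Hypothetical safe workflow net: transition -> (preset, postset).
NET = {
    "a0": ({"i"}, {"p1", "p2"}),
    "a1": ({"p1"}, {"q1"}),
    "a2": ({"p2"}, {"q2"}),
    "a3": ({"q1", "q2"}, {"o"}),
}
M0, FINAL = frozenset({"i"}), frozenset({"o"})

class Teacher:
    """Simulated Teacher answering workflow membership queries for NET."""
    def __init__(self):
        self.queries = 0

    def _fire(self, m, word):
        for a in word:
            pre, post = NET[a]
            if not pre <= m:          # a is not enabled
                return None
            m = (m - pre) | post
        return m

    def query(self, word):
        """Return a run extending `word`, or None if word is not in L(N)."""
        self.queries += 1
        m = self._fire(M0, word)
        if m is None:
            return None
        queue, seen = deque([(m, ())]), {m}   # search a shortest completion
        while queue:
            m, cont = queue.popleft()
            if m == FINAL:
                return word + cont
            for a in NET:
                m2 = self._fire(m, (a,))
                if m2 is not None and m2 not in seen:
                    seen.add(m2)
                    queue.append((m2, cont + (a,)))
        return None

def learn(teacher, alphabet):
    """Algorithm 1, with Equiv(u, wa) decided by one query as in Proposition 2."""
    V, E = [()], set()                # assumption: V starts with the empty word
    F = deque([()])
    while F:
        w = F.popleft()
        for a in alphabet:
            wa = w + (a,)
            run = teacher.query(wa)
            if run is None:
                continue              # wa is not a firing sequence
            cont = run[len(wa):]      # continuation (wa)^c reported by the Teacher
            sigma = wa
            for u in V:
                if teacher.query(u + cont) == u + cont:   # Equiv(u, wa)
                    sigma = u
                    break
            if sigma == wa:           # wa represents a new marking
                V.append(wa)
                F.append(wa)
            E.add((w, a, sigma))
    return V, E

teacher = Teacher()
V, E = learn(teacher, list(NET))
n, r = len(NET), len(V)
print(r, "states,", len(E), "edges,", teacher.queries, "queries; bound", n * r * (r + 1) // 2)
```

On this net the sketch merges a0 a2 a1 with a0 a1 a2, mirroring query #27 of Example 1, and stays below the bound n·r(r+1)/2 derived above.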
3.4 Mixing process mining and learning

The algorithm we have just presented does not assume the existence of an event log: the Learner only gets information from membership queries. However, as explained in the introduction, we consider our learning approach as a way of complementing log-based process mining. In this section we explain how to modify the algorithm accordingly.

We assume the existence of an event log consisting of use cases. Given the set of tasks T of the business process, we can think of each use case as a word $w \in T^*$, such that w corresponds to a run of the reversible net to be learnt. The event log then corresponds to a language $L \subseteq T^*$. In a first step we construct the minimal DFA for the language L. This can be done space-efficiently in a number of ways. For instance, we can divide the set of runs into two halves $L_1, L_2$, recursively compute minimal DFAs $A_1, A_2$ recognizing $L_1$ and $L_2$, and then compute the minimal DFA for L from $A_1, A_2$ using an algorithm very similar to the one that computes the union of two binary decision diagrams [And99]. Once this is done, we easily get the minimal DFA A for the language of prefixes of $(Lr)^*$ (this requires adding one extra state and making all states final).

Once A is computed, we assign to each state q of A a word $w_q$ such that $q_0 \xrightarrow{w_q} q$. For every two states $q_1, q_2$, we check whether the states correspond to the same reachable marking by calling Equiv($w_{q_1}$, $w_{q_2}$). After this step we are in the same situation we would have reached if the algorithm had queried all the words $w_q$.

From a practical point of view, notice that it is very inefficient to ask the Teacher for each pair of states $q_1, q_2$ whether Equiv($w_{q_1}$, $w_{q_2}$) holds. A better procedure is to ask the Teacher, given a sequence w, which are the letters a such that wa can be extended to a use case. We call them the possible extensions of w. The test Equiv($w_{q_1}$, $w_{q_2}$) need only be carried out for sequences $w_{q_1}, w_{q_2}$ having the same set of extensions. Note that the Teacher does not have to provide full runs for any of these possible extensions, so this is quite a simple task.

We can reduce the number of calls to Equiv() even further by first merging states for which we can already deduce that they have to be equivalent. Some criteria, which are easy properties of Petri nets and can be directly used to trim a DFA that was generated from event logs, are:

- The DFA is backward deterministic: if $m_1 \xrightarrow{a} m_3$ and $m_2 \xrightarrow{a} m_3$ for some $a \in T$, then $m_1 = m_2$.
- If two words $w_1, w_2$ only differ in the order of their letters (i.e. their Parikh vectors coincide, $P(w_1) = P(w_2)$), then they lead to the same state.
- Given a k-bounded net N, if $v w^{k+1} \in L(N)$ for some words v, w, then w describes a cycle in the reachability graph of N.

A further criterion for pure nets is the "diamond property": we can add transitions that have to be present due to basic Petri net properties. A diamond is a subgraph in the reachability graph of a net with four states that are connected in the following way: $m_1 \xrightarrow{a} m_2$, $m_1 \xrightarrow{b} m_3$, $m_2 \xrightarrow{b} m_4$, $m_3 \xrightarrow{a} m_4$. A diamond is incomplete if it is missing exactly one transition (see Figure 2). One can easily see that incomplete diamonds can always be completed with the missing transition (in the case of pure nets), i.e., if an incomplete diamond is found in the DFA, we can add the missing transition. This diamond property can also be used to merge states as indicated in Figure 2.

3.5 A Lower Bound for Petri Net Learning

We now show that we cannot in general solve the learning problem in subexponential time, by providing a hard-to-learn instance. We will show with the help of an adversary argument that any learning algorithm has to ask at least $\Omega(2^n)$ membership queries to derive the correct net, where n is the number of transitions.
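The Parikh-vector criterion of Section 3.4 can be illustrated with a short Python sketch. It relies on the marking equation: if $m_0 \xrightarrow{w} m$, then m is determined by $m_0$ and the Parikh vector P(w), so two firing sequences with equal Parikh vectors reach the same marking. The helper names (`parikh`, `merge_by_parikh`) and the sample words are illustrative assumptions, not part of WNL; the words are assumed to be firing sequences already.

```python
from collections import Counter, defaultdict

def parikh(word):
    """Parikh vector of a word: how often each letter occurs, ignoring order."""
    return frozenset(Counter(word).items())

def merge_by_parikh(words):
    """Group firing sequences that must lead to the same marking."""
    classes = defaultdict(list)
    for w in words:
        classes[parikh(w)].append(w)
    return list(classes.values())

# Access words of DFA states built from a hypothetical event log.
words = [
    ("a0",),
    ("a0", "a1"),
    ("a0", "a2"),
    ("a0", "a1", "a2"),
    ("a0", "a2", "a1"),
]
groups = merge_by_parikh(words)
for g in groups:
    print(g)
```

States whose access words fall into the same group can be merged without asking the Teacher; Equiv() is then only needed between representatives of different groups.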

Fig. 2. Incomplete diamond (left), states merged because of equal Parikh vectors (middle) or by using the diamond property (middle and right)

Consider the following set $\mathcal{N}$ of workflow nets. All the nets in $\mathcal{N}$ have the same number n + 3 of transitions: two transitions init and final, transitions called $a_1, \ldots, a_n$, and a transition t (see Figure 3). The pre- and postsets of all transitions but t, which are identical for all nets of $\mathcal{N}$, are shown in the figure. The postset of t is always the place o. The preset of t always contains for each i exactly one of the places $p_i$ or $q_i$, and the only difference between two nets in $\mathcal{N}$ is their choice of $p_i$ or $q_i$. Clearly, the set $\mathcal{N}$ contains $2^n$ workflow nets, all of them sound. For each net $N \in \mathcal{N}$ there is exactly one subset of $\{a_1, \ldots, a_n\}$ such that t can fire after the transitions of the set have fired. We call this subset $S_N$. Notice that if we know $S_N$ then we can infer the preset ${}^\bullet t$.

Fig. 3. Hard-to-learn instance for Petri net learning

We ease the task for the Learner by assuming she knows that the net to be learned belongs to $\mathcal{N}$. Her task consists only of finding out ${}^\bullet t$, or, equivalently, the set $S_N$. A query of an optimal Learner always has the form $a_{i_1} a_{i_2} \cdots a_{i_k} t$, because querying any $a_i$ after t does not provide the Learner with any information. Furthermore the order of the $a_i$ is not important: all these transitions are independent and the Learner already knows this. So we can view a query just as a subset S of the set of all transitions. A negative answer to a query S always rules out exactly one of the nets of $\mathcal{N}$, namely the one with $S_N = S$. The worst case appears when the Learner asks queries in the worst possible order, eliminating all nets of $\mathcal{N}$ but the right one. This requires $2^n - 1$ queries.
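The counting argument can be replayed in a few lines of Python. In this sketch each net of $\mathcal{N}$ is identified with its set $S_N$, and a hypothetical adversarial Teacher answers "no" for as long as at least two candidate nets remain; the function names and the enumeration order of the Learner are assumptions made for illustration.

```python
from itertools import combinations

def all_subsets(n):
    """All subsets of {1, ..., n}; subset S stands for the query a_S t."""
    items = range(1, n + 1)
    return [frozenset(c) for k in range(n + 1) for c in combinations(items, k)]

def adversary_game(n):
    """Count the queries an adversarial Teacher can force on the Learner."""
    candidates = set(all_subsets(n))   # nets still consistent with all answers
    queries = 0
    for s in all_subsets(n):           # the Learner's queries, in some order
        if len(candidates) == 1:
            break                      # only one net is left: it can be named
        queries += 1
        candidates.discard(s)          # answer "no": exactly one net is ruled out
    return queries, candidates

for n in range(1, 6):
    queries, remaining = adversary_game(n)
    print(n, queries, 2 ** n - 1)

queries, remaining = adversary_game(3)
```

Since every negative answer removes a single candidate, any other query order gives the same count against this adversary.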

4 Practical experiences

To get insights into the practical feasibility of the derived algorithm WNL, we have developed a prototype learning and synthesis tool for workflow nets and examined its practical performance on a number of examples.

Implementation. Our prototype is written in C++ with approximately 3,000 lines of code and uses libalf for dealing with automata. libalf is part of the automata learning factory currently developed jointly at RWTH Aachen and TU München¹. The synthesis algorithm (S2) of [BBD95] is implemented using the lp_solve² framework to efficiently solve the linear programs needed for computing the places of the net. Furthermore, lp_solve is used for eliminating redundant places after the net has been synthesized, to reduce its size and to make it look more appealing. The implementation is currently not tailored to user interaction but consults pre-existing workflow nets for queries. Outputs are given in the form of dot-files that can be visualized using the graphviz toolkit.

Experimental Results. We tested our implementation on various examples of pure, safe and reversible nets. The examples range from existing sound workflow nets obtained in case studies performed by [Ver04] to more standard examples like mutual exclusion between processes and an n-cell buffer with $2^n$ reachable markings. The latter example is especially suitable for understanding scalability issues of the algorithms. The absence workflow is loosely modelled after an example from [SAP01]; the complaint workflow is the example presented in our background section (Figure 1).

We applied our implementation once without any event logs as initial knowledge and then again with randomly generated logs as input, and counted the number of queries needed to learn the model. Besides counting the queries needed for Equiv(), we only count queries answered positively by the Teacher, as these correspond to runs supplied by him, and thus reflect the actual work to be done by an expert in an adequate manner.

Fig. 4. Querying extensions at state q; possible extensions: solid arrows
¹ http://libalf.informatik.rwth-aachen.de/
² http://lpsolve.sourceforge.net/

To illustrate this, consider the task of learning the sequence of calendar months: instead of asking twelve questions of the form "Does January, February, ... come after July?" (we call these small-step queries) we would just ask "Which month comes after July?". So we count every continuation provided by the Teacher as one query. In the situation of Figure 4 we would count 2 workflow membership queries compared to 3 small-step queries. We have also included the number of small-step queries in the table below for comparison.

Model        |T|   |RG|    ssq    WNL
buf_2          3      4     19     12
buf_3          4      8     52     32
buf_4          5     16    137     85
buf_5          6     32    344    216
buf_6          7     64    842    538
buf_7          8    128   2008   1304
buf_8          9    256   4707   3107
mutex_2        6      8     74     40
mutex_3        9     20    300    168
mutex_4       12     48   1026    594
order_simp     9      7     77     23
absence       11      8    109     32
complaint     12     11    155     37
transit1      25     77   2256    474

Fig. 5. Membership queries needed by WNL without any event logs; ssq = number of small-step queries, RG = reachability graph

We have first collected the number of membership queries needed by WNL when learning a model from scratch with respect to the size of the alphabet and the reachability graph, see Figure 5. On the chosen examples, the number of membership queries ranges between 12 and 3107. The series of the n-cell buffer examples from n = 2 to n = 8 suggests that the practical performance of WNL is even better than quadratic in the number of reachable markings.

Next, we studied the effect of learning workflow nets in the presence of existing logs. To this end, we used our tool to generate random event logs containing a varying number of runs (see Figure 6 for an example log). The runs in the generated log-files are not unique: runs that are more likely will probably appear multiple times, which is also the case for real-world event logs. For the random logs we calculated the average number of queries over 100 executions.

Fig. 6. Example event log for a 3-cell buffer

We found out that for tiny models like the buffer with two cells or the complaint workflow a very small number of runs (< 10) already suffices to construct the model. The Teacher does not have to supply additional runs for these. Clearly, for larger models, we can only expect that the Teacher's work is reduced but not completely eliminated when logs are given. To illustrate the impact of event logs on the learning process we show how the number of queries behaves for some of the larger models with logs of different sizes (see Figure 7).³

[Figure 7: x-axis: number of runs in log (0 to 100); y-axis: average number of queries (0 to 600); series: buffer_5, mutex_3, mutex_4, transit]

Fig. 7. Average number of queries needed by WNL applied to event logs of different sizes

We observe that already quite small logs drastically reduce the number of queries to be answered. At the same time, because our logs may contain many identical entries rather than unique ones, larger logs contribute less and less new knowledge. This reflects the situation for real-life logs, which mostly contain common executions of a workflow but lack less common runs. In other words, it seems most promising for practical applications to combine knowledge from (small) logs with that of Teachers responsible for corner cases in order to learn the workflow net in question efficiently.

The time needed for learning the nets in an applied setting is of course dominated by the number of queries a user has to answer. Synthesizing the resulting Petri net using the method proposed by Darondeau et al. (see Section 2), together with some post-processing to remove redundant places, needs just a few seconds in the worst case and is therefore negligible. The results depicted in Figures 5 and 7 suggest that, despite the seemingly intimidating result in Section 3.5, learning of workflow models is quite feasible for practical applications.

³ Larger examples behave in the same way; we depicted models requiring a number of queries of the same order of magnitude to keep the figure readable.
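The claim about the n-cell buffer series can be checked directly against the numbers in Figure 5. In that series |RG| doubles from one row to the next, so a query count growing quadratically in |RG| would roughly quadruple per row. The Python sketch below computes the observed growth factors; only the (|RG|, WNL) pairs are taken from the table, while the variable and function names are illustrative.

```python
# (|RG|, WNL queries) for buf_2 .. buf_8, copied from Figure 5.
buf = [(4, 12), (8, 32), (16, 85), (32, 216), (64, 538), (128, 1304), (256, 3107)]

def growth_factors(rows):
    """Ratio of the query counts of consecutive rows (|RG| doubles each time)."""
    return [q2 / q1 for (_, q1), (_, q2) in zip(rows, rows[1:])]

factors = growth_factors(buf)
for (r, _), f in zip(buf[1:], factors):
    print(f"|RG| = {r:3d}: queries grew by a factor of {f:.2f}")
```

All factors stay between 2 and 3, well below the factor of 4 that quadratic growth would produce, which is consistent with the observation made for Figure 5.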

5 Conclusion

We have presented a new approach for mining workflow nets based on learning techniques. The approach palliates the problem of incompleteness of event logs: if a log is incomplete, our algorithm derives membership queries identifying the missing knowledge. The queries can be passed to an expert, whose answers allow a model to be produced.

We have shown the correctness and completeness of our approach under the assumption of a Teacher answering workflow membership queries. Starting with general combinatorial arguments showing that workflow models can in principle be learned, we have derived a learning algorithm requiring a single exponential number of queries in the worst case, and we have given a matching lower bound. We have also shown experimental evidence indicating that the combination of an event log, even of small size, and a Teacher responsible for providing information about corner cases allows models to be produced efficiently in practically relevant cases.

There are several promising paths for further research. One aspect is the application of learning to the design of workflows. In this approach an expert on business processes and a modelling expert (or adequate software) cooperate. The modelling expert asks queries about how the workflow should behave, which are answered by the Teacher, until a model accepted by the business process expert is produced. We expect to transfer ideas from the field of learning models of software systems [BKKL09] to workflow systems, and to develop teaching assistants that filter the queries, automatically answering those for which the answer can be deduced from current information (for instance because it is known that two tasks must be concurrent), and only passing the remaining ones to the expert. Here we expect to profit from related work by Desel, Lorenz and others [BDML09]. An important point for process mining, and even more for process design, is designing fault tolerance techniques that can cope with false answers by the Teacher.
Finally, learning more general classes of Petri nets, and applications to modelling/reconstruction of distributed systems, or biological/chemical processes, are also promising paths for future work.

References

[AGL98] Rakesh Agrawal, Dimitrios Gunopulos, and Frank Leymann. Mining process models from workflow logs. In EDBT, volume 1377 of LNCS, pages 469–483. Springer, 1998.
[And99] Henrik Reif Andersen. An introduction to binary decision diagrams. Technical report, 1999, http://www.itu.dk/people/hra/bdd-eap.pdf
[Ang87] Dana Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75(2):87–106, 1987.
[BBD95] Eric Badouel, Luca Bernardinello, and Philippe Darondeau. Polynomial algorithms for the synthesis of bounded nets. In TAPSOFT '95: Proceedings of the 6th International Joint Conference CAAP/FASE on Theory and Practice of Software Development, pages 364–378, London, UK, 1995. Springer-Verlag.

[BDBM96] Eric Badouel and Philippe Darondeau. On the synthesis of general petri nets. Technical report, INRIA, 1996.
[BDKM08] Robin Bergenthum, Jörg Desel, Christian Kölbl, and Sebastian Mauser. Experimental results on process mining based on regions of languages. In CHINA 2008, workshop at the Applications and theory of Petri nets: 29th international conference, 2008.
[BDLM07] Robin Bergenthum, Jörg Desel, Robert Lorenz, and Sebastian Mauser. Process mining based on regions of languages. In Gustavo Alonso, Peter Dadam, and Michael Rosemann, editors, BPM, volume 4714 of LNCS, pages 375–383. Springer, 2007.
[BDML09] Robin Bergenthum, Jörg Desel, Sebastian Mauser, and Robert Lorenz. Construction of process models from example runs. T. Petri Nets and Other Models of Concurrency, 2:243–259, 2009.
[BKKL09] Benedikt Bollig, Joost-Pieter Katoen, Carsten Kern, and Martin Leucker. Learning communicating automata from MSCs. IEEE Transactions on Software Engineering (TSE), 2009. In press.
[Cho78] Tsun S. Chow. Testing software design modeled by finite-state machines. TSE, 4(3):178–187, May 1978. Special collection based on COMPSAC.
[KRS06] Ekkart Kindler, Vladimir Rubin, and Wilhelm Schäfer. Process mining and petri net synthesis. In Johann Eder and Schahram Dustdar, editors, Business Process Management Workshops, volume 4103 of LNCS, pages 105–116. Springer, 2006.
[RGvdA+07] Vladimir Rubin, Christian W. Günther, Wil M. P. van der Aalst, Ekkart Kindler, Boudewijn F. van Dongen, and Wilhelm Schäfer. Process mining framework for software processes. In Qing Wang, Dietmar Pfahl, and David M. Raffo, editors, ICSP, volume 4470 of LNCS, pages 169–181. Springer, 2007.
[SAP01] SAP AG. SAP Business Workflow Demo Examples (BC-BMT-WFM), 2001.
[Vas73] M. P. Vasilevski. Failure diagnosis of automata. Cybernetic, 9(4):653–665, 1973.
[vdA98] Wil M. P. van der Aalst. The application of petri nets to workflow management. Journal of Circuits, Systems, and Computers, 8(1):21–66, 1998.
[vdAvDG+07] Wil M. P. van der Aalst, Boudewijn F. van Dongen, Christian W. Günther, R. S. Mans, Ana Karla Alves de Medeiros, Anne Rozinat, Vladimir Rubin, Minseok Song, H. M. W. (Eric) Verbeek, and A. J. M. M. Weijters. ProM 4.0: Comprehensive support for real process analysis. In Jetty Kleijn and Alexandre Yakovlev, editors, ICATPN, volume 4546 of LNCS, pages 484–494. Springer, 2007.
[vdAvDH+03] Wil M. P. van der Aalst, Boudewijn F. van Dongen, Joachim Herbst, Laura Maruster, Guido Schimm, and A. J. M. M. Weijters. Workflow mining: A survey of issues and approaches. Data Knowl. Eng., 47(2):237–267, 2003.
[vdAvH04] Wil van der Aalst and Kees van Hee. Workflow Management. Models, Methods, and Systems. MIT Press, 2004.
[Ver04] Henricus M.W. Verbeek. Verification of WF-nets. PhD thesis, Technische Universiteit Eindhoven, 2004.