Chapter 4 Getting the Data

Similar documents
Chapter 12 Analyzing Spaghetti Processes

Process Mining Data Science in Action

Process Mining. ^J Springer. Discovery, Conformance and Enhancement of Business Processes. Wil M.R van der Aalst Q UNIVERS1TAT.

Using Process Mining to Bridge the Gap between BI and BPM

Data Science. Research Theme: Process Mining

Process Mining: Making Knowledge Discovery Process Centric

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining. Data Analysis and Knowledge Discovery

Process Mining The influence of big data (and the internet of things) on the supply chain

XES. Standard Definition. Where innovation starts. Den Dolech 2, 5612 AZ Eindhoven P.O. Box 513, 5600 MB Eindhoven The Netherlands

Eventifier: Extracting Process Execution Logs from Operational Databases

Relational XES: Data Management for Process Mining

EDIminer: A Toolset for Process Mining from EDI Messages

Summary and Outlook. Business Process Intelligence Course Lecture 8. prof.dr.ir. Wil van der Aalst.

Model Discovery from Motor Claim Process Using Process Mining Technique

DRAFT. Standard Definition. Extensible Event Stream. Christian W. Günther Fluxicon Process Laboratories

Process Mining and Visual Analytics: Breathing Life into Business Process Models

Business Process Measurement in small enterprises after the installation of an ERP software.

Business Intelligence and Process Modelling

BUsiness process mining, or process mining in a short

Process Mining. Data science in action

ProM 6 Exercises. J.C.A.M. (Joos) Buijs and J.J.C.L. (Jan) Vogelaar {j.c.a.m.buijs,j.j.c.l.vogelaar}@tue.nl. August 2010

Process Mining Tools: A Comparative Analysis

Process Modelling from Insurance Event Log

Dotted Chart and Control-Flow Analysis for a Loan Application Process

Developer Guide. Christian W. Günther Library version: 1.0 RC7 Revision: 7

The ProM framework: A new era in process mining tool support

Feature. Applications of Business Process Analytics and Mining for Internal Control. World

Process Mining Manifesto

Process Mining and the ProM Framework: An Exploratory Survey - Extended report

Title: Basic Concepts and Technologies for Business Process Management

Analysis of Service Level Agreements using Process Mining techniques

BPIC 2014: Insights from the Analysis of Rabobank Service Desk Processes

Nirikshan: Process Mining Software Repositories to Identify Inefficiencies, Imperfections, and Enhance Existing Process Capabilities

Process Mining and Fraud Detection

Supporting the BPM lifecycle with FileNet

Using Trace Clustering for Configurable Process Discovery Explained by Event Log Data

Modeling and Analysis of Incoming Raw Materials Business Process: A Process Mining Approach

SecSy: A Security-oriented Tool for Synthesizing Process Event Logs

Implementing Heuristic Miner for Different Types of Event Logs

Merging Event Logs with Many to Many Relationships

Analyzing a TCP/IP-Protocol with Process Mining Techniques

Business Process Modeling

Wanna Improve Process Mining Results?

Conformance Checking of Interacting Processes With Overlapping Instances

Process Mining and Monitoring Processes and Services: Workshop Report

Using Event Logs to Derive Role Engineering Artifacts

Trace Clustering in Process Mining

BIS 3106: Business Process Management. Lecture Two: Modelling the Control-flow Perspective

Generation of a Set of Event Logs with Noise

Life-Cycle Support for Staff Assignment Rules in Process-Aware Information Systems

Mercy Health System. St. Louis, MO. Process Mining of Clinical Workflows for Quality and Process Improvement

Handling Big(ger) Logs: Connecting ProM 6 to Apache Hadoop

ProM 6 Tutorial. H.M.W. (Eric) Verbeek mailto:h.m.w.verbeek@tue.nl R. P. Jagadeesh Chandra Bose mailto:j.c.b.rantham.prabhakara@tue.

Translating Message Sequence Charts to other Process Languages using Process Mining

Dept. of IT in Vel Tech Dr. RR & Dr. SR Technical University, INDIA. Dept. of Computer Science, Rashtriya Sanskrit Vidyapeetha, INDIA

Combination of Process Mining and Simulation Techniques for Business Process Redesign: A Methodological Approach

An Introduction to Business Process Modeling

3TU.BSR: Big Software on the Run

CHAPTER 1 INTRODUCTION

Verifying Business Processes Extracted from E-Commerce Systems Using Dynamic Analysis

HONE: Correlating Host activities to Network communications to produce insight

Process Analysis and Organizational Mining in Production Automation Systems Engineering

Audits. Alerts. Procedure

Configuring IBM WebSphere Monitor for Process Mining

Process Mining in Big Data Scenario

Case Based Model to enhance aircraft fleet management and equipment performance

An Ontology-based Framework for Enriching Event-log Data

Big Data. Fast Forward. Putting data to productive use

Process Mining in Healthcare: Data Challenges when Answering Frequently Posed Questions

Predicting Deadline Transgressions Using Event Logs

Using Semantic Lifting for improving Process Mining: a Data Loss Prevention System case study

ProM Framework Tutorial

Decision Mining in Business Processes

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Extracting Event Data from Databases to Unleash Process Mining

Investigating Clinical Care Pathways Correlated with Outcomes

Profiling Event Logs to Configure Risk Indicators for Process Delays

Facilitating Business Process Discovery using Analysis

Process simulation. Enn Õunapuu

SMART DIAGNOSTICS AFTER-SALES SERVICE TODAY

IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Internet Information Services Agent Version Fix Pack 2.

Towards Cross-Organizational Process Mining in Collections of Process Models and their Executions

Transcription:

Chapter 4 Getting the Data prof.dr.ir. Wil van der Aalst www.processmining.org

Overview Chapter 1 Introduction Part I: Preliminaries Chapter 2 Process Modeling and Analysis Chapter 3 Data Mining Part II: From Event Logs to Process Models Chapter 4 Getting the Data Chapter 5 Process Discovery: An Introduction Chapter 6 Advanced Process Discovery Techniques Part III: Beyond Process Discovery Chapter 7 Conformance Checking Chapter 8 Mining Additional Perspectives Chapter 9 Operational Support Part IV: Putting Process Mining to Work Chapter 10 Tool Support Chapter 11 Analyzing Lasagna Processes Chapter 12 Analyzing Spaghetti Processes Part V: Reflection Chapter 13 Cartography and Navigation Chapter 14 Epilogue PAGE 1

Goal of process mining What really happened in the past? Why did it happen? What is likely to happen in the future? When and why do organizations and people deviate? How to control a process better? How to redesign a process to improve its performance? PAGE 2

Getting the data world business processes people machines components organizations models analyzes supports/ controls specifies configures implements analyzes software system records events, e.g., messages, transactions, etc. (process) model discovery conformance enhancement event logs PAGE 3

From heterogeneous data sources to process mining results Extract, Transform, and Load (ELT) optional data source ELT data source data source ELT ELT data warehouse coarse-grained scoping data source data source extract XES, MXML, or similar unfiltered event logs process mining discovery conformance enhancement filter filtered event logs (process) models answers fine-grained scoping PAGE 4

Example log A process consists of cases. A case consists of events such that each event relates to precisely one case. Events within a case are ordered. Events can have attributes. Examples of typical attribute names are activity, time, costs, and resource. PAGE 5

Another view process cases events 1 35654423 35654424... 35654427 attributes activity= register request time = 30-12-2010:11.02 resource = Pete costs = 50... activity= reject request time = 07-01-2011:14.24 resource = Pete costs = 200 2 35654483 35654485... 35654489 activity= register request time = 30-12-2010:11.32 resource = Mike costs = 50... activity= pay compensation time = 08-01-2011:12.05 resource = Ellen costs = 200 3 35654521 activity= register request time = 30-12-2010:11.32 resource = Pete costs = 50 35654522...... 35654533 activity= pay compensation time = 15-01-2011:10.45 resource = Ellen costs = 200......... PAGE 6

Standard transactional life-cycle model schedule assign reassign suspend start resume autoskip manualskip complete withdraw abort_activity abort_case successful termination unsuccessful termination PAGE 7

Five activity instances a: schedule assign start complete d: start complete b: c: schedule assign reassign start suspend resume complete start suspend resume suspend abort_activity e: complete schedule assign reassign suspend start resume autoskip manualskip complete withdraw abort_activity abort_case successful termination unsuccessful termination PAGE 8

Overlapping activity instances a: start start complete complete start start complete complete a: 6 hours 2 hours 5 hours 9 hours schedule assign reassign suspend start resume autoskip manualskip complete withdraw abort_activity abort_case successful termination unsuccessful termination PAGE 9

Using attributes social network showing how work flows from one person to another Pete Sara Sue Mike performance indicators per activity Ellen Sean Activity b Frequency: 456 Waiting time: 15.6 +/- 2.5 hours Service time: 1.2 +/- 0.5 hours Costs: 412 +/- 55 euros start A a register request control flow c1 c2 E b examine thoroughly A c examine casually A d check ticket c3 c4 role M e decide f M c5 reinitiate request A g pay compensation A h reject request Activity g Frequency: 311 Waiting time: 12.4 +/- 2.1 hours Service time: 0.5 +/- 0.2 hours Costs: 198 +/- 35 euros end Activity h Frequency: 407 Waiting time: 7.4 +/- 1.8 hours Service time: 1.1 +/- 0.3 hours Costs: 209 +/- 38 euros PAGE 10

XES (extensible Event Stream) See www.xes-standard.org. Adopted by the IEEE Task Force on Process Mining. Predecessor: MXML and SA-MXML. The format is supported by tools such as ProM (as of version 6), Nitro, XESame, and OpenXES. ProMimport supports MXML. PAGE 11

PAGE 12

XES (compatible with MXML) Event log consists of: traces (process instances) events Standard extensions: concept (for naming) lifecycle (for transactional properties) org (for the organizational perspective) time (for timestamps) semantic (for ontology references) PAGE 13

extensions loaded every trace has a name every event has a name and a transition start of trace (i.e. process instance) classifier = name + transition name of trace resource timestamp transition name of event (activity name) PAGE 14

end of trace (i.e. process instance) start of trace name of trace resource timestamp name of event (activity name) data associated to event PAGE 15

Challenges when extracting event logs Correlation: Events in an event log are grouped per case. This simple requirement can be quite challenging as it requires event correlation, i.e., events need to be related to each other. Timestamps: Events need to be ordered per case. Typical problems: only dates, different clocks, delayed logging. Snapshots. Cases may have a lifetime extending beyond the recorded period, e.g., a case was started before the beginning of the event log. Scoping. How to decide which tables to incorporate? Granularity: the events in the event log are at a different level of granularity than the activities relevant for end users. PAGE 16

Flattening reality into event logs Orderline Order OrderID : OrderID 1 1..* OrderLineID : OrderLineID OrderID : OrderID Customer : CustID Product : ProdID Amount : Euro NofItems : PosInt Created : DateTime TotalWeight : Weight Paid : DateTime Entered : DateTime Completed : DateTime BackOrdered : DateTime Secured : DateTime DelID : DelID 1..* 0..1 Attempt DelID : DelID 0..* 1 Delivery DelID : DelID When : DateTime DelAddress : Address Successful : Bool Contact : PhoneNo PAGE 17

Tables Orderline Order OrderID : OrderID 1 1..* OrderLineID : OrderLineID OrderID : OrderID Customer : CustID Product : ProdID Amount : Euro NofItems : PosInt Created : DateTime TotalWeight : Weight Paid : DateTime Entered : DateTime Completed : DateTime BackOrdered : DateTime Secured : DateTime DelID : DelID 1..* 0..1 Attempt DelID : DelID When : DateTime Successful : Bool 0..* 1 Delivery DelID : DelID DelAddress : Address Contact : PhoneNo PAGE 18

Order:91245 Order instance Activity: create order Timestamp: 28-11-2011:08.12 Customer: John Amount: 100 Activity: pay order Timestamp: 02-12-2011:13.45 Customer: John Amount: 100 Activity: complete order Timestamp: 05-12-2011:11.33 Customer: John Amount: 100 OrderLine:112345 Order OrderID : OrderID 1 1..* Orderline OrderLineID : OrderLineID OrderID : OrderID Activity: enter order line Timestamp: 28-11-2011:08.13 OrderLineID: 112345 Product: iphone 4G NofItems: 1 TotalWeight: 0.250 Activity: secure order line Timestamp: 28-11-2011:08.55 OrderLineID: 112345 Product: iphone 4G NofItems: 1 TotalWeight: 0.250 Customer : CustID Product : ProdID OrderLine:112346 Amount : Euro Created : DateTime Paid : DateTime Completed : DateTime NofItems : PosInt TotalWeight : Weight Entered : DateTime BackOrdered : DateTime Activity: enter order line Timestamp: 28-11-2011:08.14 OrderLineID: 112346 Product: ipod nano NofItems: 2 TotalWeight: 0.300 DellID: 882346 Activity: create backorder Timestamp: 28-11-2011:08.55 OrderLineID: 112346 Product: ipod nano NofItems: 2 TotalWeight: 0.300 DellID: 882346 Activity: secure order line Timestamp: 30-11-2011:09.06 OrderLineID: 112346 Product: ipod nano NofItems: 2 TotalWeight: 0.300 DellID: 882346 Secured : DateTime DelID : DelID 1..* 0..1 OrderLine:112347 Activity: enter order line Timestamp: 28-11-2011:08.15 OrderLineID: 112347 Product: ipod classic NofItems: 1 TotalWeight: 0.200 Activity: secure order line Timestamp: 29-11-2011:10.06 OrderLineID: 112347 Product: ipod classic NofItems: 1 TotalWeight: 0.200 Attempt DelID : DelID 0..* 1 Delivery DelID : DelID Delivery:882345 When : DateTime DelAddress : Address Successful : Bool Contact : PhoneNo Attempt:882345-1 Attempt:882345-2 Attempt:882345-3 Activity: delivery attempt Timestamp: 05-12-2011:08.55 Successful: false DelAddress: 5513VJ-22a Contact: 0497-2553660 Activity: delivery attempt Timestamp: 06-12-2011:09.12 Successful: false DelAddress: 5513VJ-22a Contact: 0497-2553660 Activity: delivery attempt Timestamp: 07-12-2011:08.56 Successful: true DelAddress: 5513VJ-22a Contact: 0497-2553660 Delivery:882346 Attempt:882346-1 Activity: delivery attempt Timestamp: 05-12-2011:08.43 DellID: 882346 Successful: true DelAddress: 5513XG-45 Contact: 040-2298761 PAGE 19

Order:91245 Activity: create order Timestamp: 28-11-2011:08.12 Customer: John Amount: 100 Activity: pay order Timestamp: 02-12-2011:13.45 Customer: John Amount: 100 Activity: complete order Timestamp: 05-12-2011:11.33 Customer: John Amount: 100 OrderLine:112345 Activity: enter order line Timestamp: 28-11-2011:08.13 OrderLineID: 112345 Product: iphone 4G NofItems: 1 TotalWeight: 0.250 Activity: secure order line Timestamp: 28-11-2011:08.55 OrderLineID: 112345 Product: iphone 4G NofItems: 1 TotalWeight: 0.250 OrderLine:112346 Activity: enter order line Timestamp: 28-11-2011:08.14 OrderLineID: 112346 Product: ipod nano NofItems: 2 TotalWeight: 0.300 DellID: 882346 Activity: create backorder Timestamp: 28-11-2011:08.55 OrderLineID: 112346 Product: ipod nano NofItems: 2 TotalWeight: 0.300 DellID: 882346 Activity: secure order line Timestamp: 30-11-2011:09.06 OrderLineID: 112346 Product: ipod nano NofItems: 2 TotalWeight: 0.300 DellID: 882346 OrderLine:112347 Activity: enter order line Timestamp: 28-11-2011:08.15 OrderLineID: 112347 Product: ipod classic NofItems: 1 TotalWeight: 0.200 Activity: secure order line Timestamp: 29-11-2011:10.06 OrderLineID: 112347 Product: ipod classic NofItems: 1 TotalWeight: 0.200 Delivery:882345 Attempt:882345-1 Attempt:882345-2 Attempt:882345-3 Activity: delivery attempt Timestamp: 05-12-2011:08.55 Successful: false DelAddress: 5513VJ-22a Contact: 0497-2553660 Activity: delivery attempt Timestamp: 06-12-2011:09.12 Successful: false DelAddress: 5513VJ-22a Contact: 0497-2553660 Activity: delivery attempt Timestamp: 07-12-2011:08.56 Successful: true DelAddress: 5513VJ-22a Contact: 0497-2553660 Delivery:882346 Attempt:882346-1 Activity: delivery attempt Timestamp: 05-12-2011:08.43 DellID: 882346 Successful: true DelAddress: 5513XG-45 Contact: 040-2298761 PAGE 20

Orderline Orderline instance Order OrderID : OrderID Customer : CustID Amount : Euro 1 1..* OrderLineID : OrderLineID OrderID : OrderID Product : ProdID NofItems : PosInt Created : DateTime TotalWeight : Weight Paid : DateTime Entered : DateTime Completed : DateTime BackOrdered : DateTime Secured : DateTime DelID : DelID 1..* OrderLine:112345 Case id: 112345 Activity: enter order line Timestamp: 28-11-2011:08.13 OrderLineID: 112345 Product: iphone 4G NofItems: 1 TotalWeight: 0.250 Case id: 112345 Activity: secure order line Timestamp: 28-11-2011:08.55 OrderLineID: 112345 Product: iphone 4G NofItems: 1 TotalWeight: 0.250 Attempt DelID : DelID When : DateTime Successful : Bool 0..* 1 0..1 Delivery DelID : DelID DelAddress : Address Contact : PhoneNo Order:91245 Case id: 112345 Activity: create order Timestamp: 28-11-2011:08.12 Customer: John Amount: 100 Case id: 112345 Activity: pay order Timestamp: 02-12-2011:13.45 Customer: John Amount: 100 Case id: 112345 Activity: complete order Timestamp: 05-12-2011:11.33 Customer: John Amount: 100 Delivery:882345 Attempt:882345-1 Attempt:882345-2 Attempt:882345-3 Case id: 112345 Activity: delivery attempt Timestamp: 05-12-2011:08.55 Successful: false DelAddress: 5513VJ-22a Contact: 0497-2553660 Case id: 112345 Activity: delivery attempt Timestamp: 06-12-2011:09.12 Successful: false DelAddress: 5513VJ-22a Contact: 0497-2553660 Case id: 112345 Activity: delivery attempt Timestamp: 07-12-2011:08.56 Successful: true DelAddress: 5513VJ-22a Contact: 0497-2553660 PAGE 21

Other examples The life cycles of reviewers, authors, papers, reviews, PC chairs, etc. The life cycles of job applications and vacancies. X-ray machine logs: machine, machine day, patient, treatment, routine, etc.? Therefore, the selection and scoping of instances is needed. Like making deciding on the elements to be put on map; there may be many maps covering partially overlapping areas. PAGE 22

Extracting event logs Not just a syntactical issue. Different views are possible. Important: Selecting the right instance notion. Ordering of events. Selection of events. Proclets: the true fabric of real-life processes. PAGE 23