Data Science Research Theme: Process Mining

Process mining is a relatively young research discipline that sits between computational intelligence and data mining on the one hand and process modeling and analysis on the other. The idea of process mining is to discover, monitor, and improve real processes (not assumed processes) by extracting knowledge from event logs readily available in today's information systems. Process mining research is one of the cornerstones of NIRICT's Data Analytics for Smart Services (DSS) initiative.

True Business Intelligence

The term business intelligence (BI) refers to a broad collection of tools and methods that use data to support decision making. BI is, unfortunately, an oxymoron in many companies, which use primitive tools to monitor and analyze processes. Moreover, most BI vendors offer products that are data-centric and focus on rather simplistic forms of analysis, such as dashboards and scorecards. Mainstream BI tools are not as intelligent as the term suggests, and end users are easily confused by the marketing terms used by BI vendors. Nevertheless, the market for BI products is growing steadily, which shows BI's practical relevance.

Within DSS we aim to develop truly intelligent approaches that unlock the intelligence hidden inside the massive amounts of data generated in the business and social processes of our current and future networked world. Process mining techniques can be used to achieve this and thus facilitate the engineering of smart services for the networked world.

Research topics: automated process discovery (extracting process models from an event log); conformance checking (monitoring deviations by comparing model and log); social network and organizational mining; automated construction of simulation models; model extension and repair; case prediction; history-based recommendations.
[Figure: the scope of process mining, shown on an example process with the activities start, register, examine thoroughly, examine casually, check ticket, decide, reinitiate, pay compensation, reject, and end. The figure carries four annotations. (1) The starting point is an event log. Each event refers to a process instance (case) and an activity. Events are ordered, and additional properties (e.g., timestamp or resource data) may be present. (2) The event log can be used to discover roles in the organization (e.g., groups of people with similar work patterns). These roles can be used to relate individuals and activities. Roles shown: Role A (Assistant): Pete, Mike; Role E (Expert): Sue, Sean; Role M (Manager): Sara. The name Ellen also appears in the figure. (3) Decision rules (e.g., a decision tree based on data known at the time a particular choice was made) can be learned from the event log and used to annotate decisions. (4) Discovery techniques can be used to find a control-flow model (in this case a BPMN model) that best describes the observed behavior. Performance information (e.g., the average time between two subsequent activities) can be extracted from the event log and visualized on top of the model.]

The figure illustrates the scope of process mining. The starting point for process mining is an event log. All process-mining techniques assume that it's possible to sequentially record events such that each event refers to an activity (a well-defined step in some process) and is related to a particular case, or process instance. Event logs can store additional information about events. In fact, whenever possible, process-mining techniques use extra information such as the resource (person or device) executing or initiating the activity, the timestamp of the event, or data elements recorded with the event (for instance, the size of an order).

We can use event logs to conduct three types of process mining. The first type is discovery. A discovery technique produces a model from an event log without using any a priori information. Process discovery is the most prominent process-mining technique.
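To make the notion of an event log concrete, the following minimal sketch stores a handful of events as (case, activity, timestamp, resource) tuples, groups them into one ordered trace per case, and counts how often one activity is directly followed by another. The log contents, the names `EVENTS`, `traces`, and `directly_follows`, and the choice of a directly-follows count are assumptions made for illustration; such counts are only a simple ingredient of discovery, not a complete discovery algorithm and not the method of any particular tool.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log (illustrative data only):
# each event is (case id, activity, timestamp, resource).
EVENTS = [
    ("c1", "register", "2011-01-03 09:00", "Pete"),
    ("c2", "register", "2011-01-03 09:20", "Mike"),
    ("c1", "examine casually", "2011-01-03 10:30", "Mike"),
    ("c2", "check ticket", "2011-01-03 10:45", "Pete"),
    ("c1", "check ticket", "2011-01-03 11:00", "Pete"),
    ("c2", "examine thoroughly", "2011-01-03 14:00", "Sue"),
    ("c1", "decide", "2011-01-04 09:15", "Sara"),
    ("c2", "decide", "2011-01-04 09:40", "Sara"),
    ("c1", "reject", "2011-01-04 10:00", "Pete"),
    ("c2", "pay compensation", "2011-01-04 11:00", "Mike"),
    ("c3", "register", "2011-01-05 09:00", "Pete"),
    ("c3", "examine casually", "2011-01-05 09:30", "Mike"),
    ("c3", "check ticket", "2011-01-05 10:00", "Pete"),
    ("c3", "decide", "2011-01-05 11:00", "Sara"),
    ("c3", "pay compensation", "2011-01-05 12:00", "Mike"),
]


def traces(events):
    """Group events by case and order each case by timestamp."""
    by_case = defaultdict(list)
    for case, activity, stamp, _resource in events:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        by_case[case].append((when, activity))
    return {case: [act for _, act in sorted(evs)] for case, evs in by_case.items()}


def directly_follows(trace_map):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in trace_map.values():
        counts.update(zip(trace, trace[1:]))
    return counts


if __name__ == "__main__":
    for pair, n in sorted(directly_follows(traces(EVENTS)).items()):
        print(pair, n)
```

The resulting counts can be read as a weighted graph over activities. Turning such a graph into a model with explicit choices and concurrency (for instance a BPMN model) requires further steps that this sketch does not attempt.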
In many organizations, people are surprised to see that existing techniques are indeed able to discover real processes merely based on example executions in event logs.

The second type of process mining is conformance. Here, an existing process model is compared with an event log of the same process. For example, there are various algorithms to compute the percentage of events that can be explained by the model. Conformance checking can confirm whether reality, as recorded in the log, conforms to the model and vice versa.

The third type of process mining is enhancement. Here, the idea is to extend or improve an existing process model using information about the actual process recorded in some event log. Whereas conformance checking measures the alignment between model and reality, this third type of process mining aims at changing or extending the a priori model. For instance, by using timestamps in the event log, you can extend the model to show bottlenecks, service levels, throughput times, and frequencies.

Applications and Relevance

Process mining techniques can be applied in all top sectors identified by the Dutch government. Event data are available in most organizations, and in all sectors there is a continuous need to improve and adapt operational processes. Examples are:

Top Sector High Tech Systems. Most high-tech systems (wafer steppers, medical equipment, etc.) already record events for remote diagnostics and servicing. Process mining can be used to understand how systems are used in the field, why and when they fail, and how they can be improved.

Top Sector Logistics. Event data around the physical movement of goods can come from different data sources. Tagging of products (e.g., RFID) and integrated supply chains are generating torrents of data that can be used to improve processes from source to sink.

Top Sector Health. There is a need to reduce costs in care processes. Today's hospitals and other care providers collect detailed data about individuals. These data can be used to optimize care, both in terms of quality and costs.

Challenges

Process mining is an important tool for modern organizations that must manage nontrivial operational processes.
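The conformance and enhancement types of process mining described earlier can be sketched on a toy scale. The example below assumes a deliberately simple model, a set of allowed "a is directly followed by b" pairs, and a hypothetical two-case log; the names `LOG`, `MODEL`, `explained_fraction`, and `mean_hours` are invented for illustration. The fraction it computes is only a crude stand-in for the conformance measures mentioned above, and the mean waiting times are one possible way to annotate a model with performance information.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical log: case id -> time-ordered list of (activity, timestamp).
LOG = {
    "c1": [("register", "2011-01-03 09:00"),
           ("check ticket", "2011-01-03 11:00"),
           ("decide", "2011-01-04 11:00"),
           ("reject", "2011-01-04 12:00")],
    "c2": [("register", "2011-01-05 09:00"),
           ("decide", "2011-01-05 10:00"),
           ("pay compensation", "2011-01-05 13:00")],
}

# Hypothetical a priori model: the directly-follows pairs it allows.
MODEL = {
    ("register", "check ticket"),
    ("check ticket", "decide"),
    ("decide", "reject"),
    ("decide", "pay compensation"),
}


def steps(log):
    """Yield (from activity, to activity, hours elapsed) for consecutive events."""
    for events in log.values():
        for (a, ta), (b, tb) in zip(events, events[1:]):
            delta = datetime.strptime(tb, FMT) - datetime.strptime(ta, FMT)
            yield a, b, delta.total_seconds() / 3600


def explained_fraction(log, model):
    """Conformance sketch: share of observed steps that the model allows."""
    observed = list(steps(log))
    if not observed:
        return 1.0
    return sum(1 for a, b, _ in observed if (a, b) in model) / len(observed)


def mean_hours(log):
    """Enhancement sketch: average elapsed hours per directly-follows pair."""
    durations = defaultdict(list)
    for a, b, hours in steps(log):
        durations[(a, b)].append(hours)
    return {pair: sum(hs) / len(hs) for pair, hs in durations.items()}


if __name__ == "__main__":
    print("explained:", explained_fraction(LOG, MODEL))
    waits = mean_hours(LOG)
    print("slowest step:", max(waits, key=waits.get))
```

In this toy log, case c2 skips the ticket check, so one of the five observed steps is not allowed by the model and the explained fraction is 0.8. The step from "check ticket" to "decide" has the longest average wait and would be flagged as the candidate bottleneck.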
On the one hand, more and more event data are becoming available. On the other hand, processes and information must be aligned perfectly to meet compliance, efficiency, and customer service requirements. Despite the applicability of process mining, we must still address important challenges; these illustrate that process mining is an emerging discipline. For example, the Process Mining Manifesto lists the following challenges:

C1: Finding, merging, and cleaning event data. When extracting event data suitable for process mining, we must address several challenges: data can be distributed over a variety of sources, event data might be incomplete, an event log could contain outliers, logs could contain events at different levels of granularity, and so on.

C2: Dealing with complex event logs having diverse characteristics. Event logs can have very different characteristics. Some event logs might be extremely large, making them difficult to handle, whereas others are so small that they don't provide enough data to draw reliable conclusions.

C3: Creating representative benchmarks. We need good benchmarks consisting of example data sets and representative quality criteria to compare and improve the various tools and algorithms.

C4: Dealing with concept drift. The process might be changing while under analysis. Understanding such concept drift is of prime importance for process management.

C5: Improving the representational bias used for process discovery. A careful and refined selection of the representational bias is necessary to ensure high-quality process-mining results.

C6: Balancing between quality criteria such as fitness, simplicity, precision, and generalization. Four competing quality dimensions exist: fitness, simplicity, precision, and generalization. The challenge is to find models that can balance all four dimensions.

C7: Cross-organizational mining. In some use cases, event logs from multiple organizations are available for analysis. Some organizations, such as supply chain partners, work together to handle process instances; other organizations execute essentially the same process while sharing experiences, knowledge, or a common infrastructure. However, traditional process-mining techniques typically consider one event log in one organization.

C8: Providing operational support. Process mining isn't restricted to offline analysis; it can also provide online operational support. Detection, prediction, and recommendation are examples of operational support activities.

C9: Combining process mining with other types of analysis. The challenge is to combine automated process-mining techniques with other analysis approaches (optimization techniques, data mining, simulation, visual analytics, and so on) to extract more insights from event data.

C10: Improving usability for non-experts. The challenge is to hide the sophisticated process-mining algorithms behind user-friendly interfaces that automatically set parameters and suggest suitable types of analysis.

C11: Improving understandability for non-experts. The user might have problems understanding the output or be tempted to infer incorrect conclusions. To avoid such problems, process mining tools should present results using a suitable representation, and the trustworthiness of the results should always be clearly indicated.

As an example, consider Challenge C4: Dealing with Concept Drift. The term concept drift refers to a situation in which the process is changing while we're analyzing it. For instance, in the beginning of the event log, two activities might be concurrent, whereas later in the log, they become sequential.
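The concurrent-then-sequential situation just described can be sketched as follows. The example assumes a chronologically ordered list of traces and a hand-picked split point, and it treats "both orders of two activities are observed" as a rough hint of concurrency. The data and the names `TRACES`, `follows`, and `drift_pairs` are hypothetical, and this is a toy heuristic rather than an established drift-detection method.

```python
# Hypothetical traces in chronological order of case start (illustrative data).
TRACES = [
    ["register", "examine", "check ticket", "decide"],
    ["register", "check ticket", "examine", "decide"],
    ["register", "examine", "check ticket", "decide"],
    ["register", "examine", "check ticket", "decide"],
]


def follows(traces):
    """Set of (a, b) pairs where a is immediately followed by b in some trace."""
    return {pair for trace in traces for pair in zip(trace, trace[1:])}


def drift_pairs(traces, split):
    """Pairs seen in both orders before `split` but only as a -> b afterwards.

    Observing both orders is used here as a rough hint of concurrency;
    observing a single order afterwards hints that the activities became
    sequential.
    """
    early, late = follows(traces[:split]), follows(traces[split:])
    return sorted(
        (a, b)
        for (a, b) in late
        if (a, b) in early and (b, a) in early and (b, a) not in late
    )


if __name__ == "__main__":
    print(drift_pairs(TRACES, split=2))
```

With the split after the second trace, the sketch reports the pair ("examine", "check ticket"): both orders occur in the early window, but only "examine" before "check ticket" occurs in the late window. With little data, a missing order may simply be unobserved, so such a result is a hint to investigate rather than evidence of drift.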
Processes might change because of periodic or seasonal changes (for example, "in December, there is more demand" or "on Friday afternoon, fewer employees are available") or changing conditions ("the market is getting more competitive"). Such changes impact processes, and detecting and analyzing them is vital. However, most process-mining techniques analyze processes as if they're in steady state.

Impact

Based on the research conducted within NIRICT, powerful process mining tools have been developed. For example, the open-source ProM tool developed at Eindhoven University of Technology provides a high-performing pluggable architecture and a common basis for all kinds of process-mining techniques. Hundreds of plugins are available; for instance, ProM supports dozens of process-discovery algorithms as plugins. ProM is available for download from prom.sf.net and www.processmining.org. Various commercial process mining tools have also been developed based on the research done at Eindhoven University of Technology, e.g., Disco by Fluxicon and Reflect by Perceptive.

ProM has been applied in more than 100 organizations, including municipalities such as Alkmaar, Heusden, and Harderwijk; government agencies such as Rijkswaterstaat, Centraal Justitieel Incasso Bureau, and the Dutch Justice department; insurance-related agencies such as UWV; banks such as ING; hospitals such as AMC and Catharina; multinational corporations such as DSM and Deloitte; high-tech system manufacturers such as Philips Healthcare, ASML, Ricoh, and Thales, and their customers; and media companies such as Winkwaves. This illustrates the broad spectrum of situations to which we can apply process mining.

More information

For more information about process mining, visit www.processmining.org or read the book: W. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes, Springer-Verlag, 2011 (http://springer.com/978-3-642-19344-6). The website provides sample logs, videos, slides, articles, and software. The Process Mining Manifesto can be found on the home page of the IEEE Task Force on Process Mining: http://www.win.tue.nl/ieeetfpm/. The Task Force was established by the IEEE to promote the research, development, education, and understanding of process mining.