Data Warehouse Model for Audit Trail Analysis in Workflows




Kin-Chan Pau, Yain-Whar Si
Faculty of Science and Technology, University of Macau
{ma36533, fstasp}@umac.mo

Marlon Dumas
Faculty of Information Technology, Queensland University of Technology, Australia
m.dumas@qut.edu.au

Abstract - Business process performance evaluation is a key step towards assessing and improving e-business operations. In real-scale scenarios, such evaluation requires the collection, aggregation and processing of vast amounts of data, in particular audit trails. This paper aims at enabling such evaluation by integrating workflow technology with data warehousing. We first present a data model for capturing workflow audit trail data relevant to process performance evaluation. We then construct logical models that characterize the derivation of performance evaluation data from workflow audit trails. Based on these models, we apply dimensional modeling techniques to define schemas for storing workflow audit trail data in data warehouses. Using data warehouse technology, decision makers can query large volumes of audit trail data for business process performance evaluation.

I. INTRODUCTION

Workflow Management Systems (WfMS) enable the automation of business operations by allocating and dispatching work to users according to executable process models [1]. In traditional applications, coordination logic is embedded within application system code and is thus difficult to change. The aim of workflow management is to separate coordination logic from application logic. Thanks to this separation, businesses can design highly configurable applications.

An audit trail [2] is an electronic archive recording the history of a workflow. During execution, information related to resource allocation and the status of executed work items is logged in audit trails. This information can then be analyzed for business process performance evaluation and strategic decision making.
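To make the notion of an audit trail entry concrete, the sketch below models one log record as a plain data structure. The field names are illustrative assumptions for this paper's setting, not the attribute names standardized by WfMC Interface 5.

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative audit trail entry; the fields are assumptions,
# not the official WfMC Interface 5 attribute names.
@dataclass
class AuditTrailEntry:
    process_instance_id: str   # which case the event belongs to
    activity_instance_id: str  # which activity instance (if any)
    workitem_id: str           # work item allocated to a participant
    participant: str           # resource the work item was allocated to
    state: str                 # e.g. "allocated", "started", "completed"
    timestamp: datetime        # when the state change was logged

# A two-event fragment of a hypothetical log:
log = [
    AuditTrailEntry("PI-1", "AI-1", "WI-1", "alice", "allocated",
                    datetime(2007, 4, 2, 9, 0)),
    AuditTrailEntry("PI-1", "AI-1", "WI-1", "alice", "completed",
                    datetime(2007, 4, 2, 9, 30)),
]
```

Each state change of a work item appends one such record, which is what the later sections aggregate.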
A typical WfMS stores audit trail data in a database system. This approach does not scale for large-scale analytic processing, especially as the number of transactions increases and complex queries involving grouping operators must be evaluated concurrently with day-to-day transaction processing. As a first step towards addressing this problem, we propose a data model that captures workflow audit trail data relevant for process performance evaluation. Based on this model, we then define schemas for storing these data in a data warehouse.

A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management's decisions [5]. Data warehouses are mainly used by managers and decision makers to extract information quickly and conveniently in order to answer questions about their business. One technique for designing data warehouses is dimensional modeling [6], which optimizes query evaluation. By storing workflow audit trails in a data warehouse, multi-dimensional analysis of business process execution can be undertaken. We demonstrate this concept by means of sample performance evaluation queries on a workflow-enabled e-business scenario.

A generic model of audit trail data is described in Section II. Process performance evaluation based on a logical model is discussed in Section III. The design of the data warehouse is given in Section IV. Section V illustrates the opportunities opened by the proposal on a case study. In Section VI we briefly review related work before summarizing our ideas in Section VII.

II. MODELS FOR WORKFLOW AUDIT TRAILS

Below we present a conceptual model for audit trails using Object Role Modeling (ORM) [7]. ORM is a method for designing and querying databases at the conceptual level. An ORM model comprises a set of entity types and a set of relationships.
Entity types are depicted as named ellipses, and relationships (also called predicates) are shown as a concatenation of one or more boxes, each of which denotes a role played by an entity type in the relationship. Relationship names are read left-to-right, unless prepended by <<, in which case they are read in the opposite direction. Lines connect entity types to the roles they play. Arrow-tipped bars denote the key of a relationship (i.e. a uniqueness constraint). A black dot indicates a mandatory role, i.e. at least one object of the corresponding object type must play that role.

Based on ORM, we outline three conceptual designs of the relevant information in the workflow audit trail data. The information related to workflow definitions and workflow instances is depicted in Fig. 1. A workflow definition (also called a process definition) may contain a set of activity definitions; conversely, an activity definition may be associated with different process definitions. The contains predicate between the process and activity entity types is a binary predicate corresponding to a many-to-many relationship. This is denoted by an arrow-tipped bar placed over the two boxes representing the contains predicate in Fig. 1. Both process and activity entity types are subtypes of definition. Subtyping is denoted by solid arrows in Fig. 1. A process definition may be composed of other process definitions; this is depicted using the parent of relation in Fig. 1.

During execution, a process instance may be instantiated by the WfMS based on a process definition. More than one process instance of a process may be instantiated. This is denoted by an arrow-tipped bar placed over the right box in Fig. 1. In some cases, a process instance may become a parent when it triggers other (child) process instances to be instantiated. According to the activity definition, activity instances are created. Each activity instance belongs to exactly one process instance. At runtime, an activity is represented as a work item, which is allocated to a workflow participant at a particular time instant. Process instances, activity instances and work items are subtypes of workflow object. In a similar way, we conceptualize the basic elements of a workflow object and the information related to workflow events in Fig. 2 and 3.

Fig. 1. Meta model of workflow definition and case
Fig. 2. Meta model of workflow object

III. WORKFLOW EVALUATION

Business process performance can be measured in terms of (1) a workflow's temporal properties and its status, (2) the performance of workflow participants, and (3) events that occurred (or were triggered) during a workflow execution. In this section, we construct logical models for workflow evaluation using ADAPT (Application Design for Analytical Processing Technologies) [8]. ADAPT can be used during the design phase of multidimensional databases to provide an abstract view of the business requirements by capturing the relationship between the core data to be analyzed and the indexes for accessing that data. In addition, ADAPT can show how certain data can be derived from other information.
Such derivation can be expressed neither in ER (Entity Relationship) modeling nor in dimensional modeling. Logical models are useful for capturing business requirements without being constrained by the underlying data management system, such as a traditional relational database or a data warehouse. Therefore, the decision to adopt any particular data management system can be postponed to a later stage.

Evaluation of workflow execution performance primarily concerns the data collected during the execution of process instances and activity instances. In ADAPT, the basic building blocks are hypercubes, dimensions and measures. A hypercube is the core data unit that needs to be analyzed. In Fig. 4, the hypercube Workflow Execution is denoted by a cube notation. Each hypercube may connect to one or more dimensions. In a logical model, hypercubes are always placed in the central position and are usually surrounded by their dimensions and measures. A dimension is a collection of related values used as indexes for accessing the data in a hypercube. For example, in Fig. 4, there are five dimensions (Definition, Workflow Object, Business Object, Create Time, and Finish Time) depicted as rectangles with 3-dimensional axis icons. A measure contains critical factors for analyzing data in a hypercube. In Fig. 4, a measure is depicted by a small ruler embedded in a rectangle. These indicators can be derived from members such as Actual Execution Time, Idle Time, Lifetime of Workflow Object, Complete, and Successful. Lifetime of Workflow Object is derived by an algebraic process from other members; such a process is called a Model in ADAPT and is depicted as a parallelogram with f( ). In a similar way, we define the logical models of workflow participants performance and workflow events in Fig. 5 and 6. Based on these logical models, we outline the schemas for the data warehouse design in the following section.

Fig. 3. Meta model of workflow event
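To make the derived measures concrete, the sketch below computes Lifetime of Workflow Object, Actual Execution Time and Idle Time from hypothetical create/start/finish timestamps of a single work item. The formulas are our reading of the derivation (lifetime as create-to-finish, idle time as the remainder), stated as an assumption rather than taken verbatim from the paper's Model.

```python
from datetime import datetime

# Hypothetical timestamps for one work item (assumed semantics):
created  = datetime(2007, 4, 2, 9, 0)   # work item created
started  = datetime(2007, 4, 2, 9, 20)  # participant begins work
finished = datetime(2007, 4, 2, 9, 50)  # work item completed

# One plausible derivation of the ADAPT measures (our reading):
lifetime_of_wf_obj = finished - created                    # total lifetime
actual_exe_time    = finished - started                    # time actually worked
idle_time          = lifetime_of_wf_obj - actual_exe_time  # time spent waiting

print(lifetime_of_wf_obj, actual_exe_time, idle_time)
```

Members like these are exactly what later becomes the numeric measure columns of the fact tables.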

Fig. 4. Logical model of workflow execution performance
Fig. 5. Logical model of workflow participants performance
Fig. 6. Logical model of workflow events

IV. DATA WAREHOUSE DESIGN

Data warehouse designs differ fundamentally from common relational database designs. In data warehouse design, dimensional modeling is used to improve query performance by allowing redundancy instead of full normalization. By relying on such designs, data warehouses present data in structures that are consistent and intuitive. Data warehouse designs are based on a kind of multidimensional modeling called the star schema, which reflects exactly how users normally view their critical measures along their business dimensions [6]. A star schema typically consists of a fact table in the middle of the schema diagram, surrounded by a set of dimension tables. Fact tables are structured to define business metrics or measurements. They are used to store granular data over time at the lowest level of the dimension hierarchies. Dimension tables are structured to define business dimensions; they are collections of related values used as indexes for accessing the data in the fact tables. Dimension tables may contain data that is not going to change in future and high-level information that is usually rich in detail [5].

In Fig. 7, 8, and 9, we specify the star schema models for storing audit trails in a data warehouse. In general, each of the hypercubes from Section III is represented as a fact table surrounded by its corresponding dimension tables. For example, based on the logical model of workflow execution performance defined in Section III, we define Workflow Execution as a fact table surrounded by four dimension tables: Definition, Workflow Object, Time, and Business Obj Bridge. This situation is depicted in Fig. 7.
Each fact table contains a number of foreign key relationships to its dimension tables, together with its derivable numeric measure factors such as Actual Execution Time, Idle Time, Lifetime of Workflow Object, Complete, and Successful. These derivable measures are based on the members from the logical models of Section III. For example, in Fig. 7, definition_key in the Workflow Execution fact table is a foreign key to the Definition dimension table. In a star schema, each dimension table contains not only its primary key but also attributes that may be used to retrieve data from the fact table under certain constraints. For example, definition_key is the primary key and definition_type is one of the attributes of the Definition dimension table. In a similar way, based on the logical models defined in Fig. 5 and 6, we define the star schemas for workflow participants performance and workflow events in Fig. 8 and 9.

In the star schemas below, the Time dimension table is used to realize (replace) the dimensions described in the three logical models (Fig. 4, 5, and 6), namely Create Time, Finish Time, Allocated Time and Timestamp. If a user wishes to query these dimensions as described in the logical models, a number of synonym dimension tables can be created on top of the proposed star schemas [9].

Note that the Business Object Attr dimension table in the schemas below differs from the other dimension tables such as Definition and Time, since it does not have a one-to-many relationship with the fact tables (the tables in the center of the star schemas). Since the number of attributes in each business object is unknown, it is impossible to pre-define the number of elements in the Business Object Attr dimension table. This kind of dimension table is called a multi-valued dimension table and, in usual practice, a bridge table is used to link the multi-valued dimension table with the fact table [10]. As a result, in the schemas below, a bridge table called Business Obj Bridge is used to link the Business Object Attr dimension table with the fact tables. Such linking allows a many-to-many relationship between the dimension tables and the fact tables. The primary key of Business Obj Bridge consists of the Task Execution and Business Object Attr foreign keys. The number of records in the bridge table depends on the number of attributes in the corresponding business object.
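The bridge-table pattern can be exercised on a toy instance of the schema. The sqlite3 sketch below follows the table and column names of Fig. 7, but the sample rows (an order total and a priority attribute) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Bridge-table pattern from Fig. 7: one fact row points at one
# bridge group, which fans out to many business-object attributes.
cur.executescript("""
CREATE TABLE workflow_execution (
    wf_exec_id INTEGER PRIMARY KEY,
    bus_obj_bridge_key INTEGER
);
CREATE TABLE business_obj_bridge (
    bus_obj_bridge_key INTEGER,
    bus_obj_attr_key INTEGER,
    PRIMARY KEY (bus_obj_bridge_key, bus_obj_attr_key)
);
CREATE TABLE business_obj_attr (
    bus_obj_attr_key INTEGER PRIMARY KEY,
    data_name TEXT, data_type TEXT, data_value TEXT
);
INSERT INTO workflow_execution VALUES (1, 10);
INSERT INTO business_obj_bridge VALUES (10, 100), (10, 101);
INSERT INTO business_obj_attr VALUES
    (100, 'order_total', 'decimal', '99.50'),
    (101, 'priority',    'string',  'high');
""")

# The bridge makes the fact-to-attribute relationship many-to-many:
rows = cur.execute("""
    SELECT a.data_name, a.data_value
    FROM workflow_execution f
    JOIN business_obj_bridge b USING (bus_obj_bridge_key)
    JOIN business_obj_attr  a USING (bus_obj_attr_key)
    WHERE f.wf_exec_id = 1
    ORDER BY a.data_name
""").fetchall()
print(rows)
```

Note that the number of attribute rows reached through the bridge can vary per business object, which is precisely why they cannot be flattened into fixed columns of the fact table.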

Fig. 7. Star schema of workflow execution performance:

  Definition: definition_key, definition_type, definition_id, definition_name
  Workflow Object: wf_obj_key, wf_obj_type, process_instance_id, activity_instance_id, workitem_id
  Workflow Execution (fact): definition_key, wf_obj_key, create_time_key, finish_time_key, bus_obj_bridge_key, actual_exe_time, idle_time, lifetime_of_wf_obj, complete, successful
  Time: time_key, time, date, month, quarter, year, holiday_flag, weekday_flag
  Business Obj Bridge: bus_obj_bridge_key, bus_obj_attr_key
  Business Obj Attr: bus_obj_attr_key, data_name, data_type, data_value

Fig. 8. Star schema of workflow participants performance

V. CASE STUDY

In this section, we outline a sample performance analysis of an order fulfillment workflow in a typical e-business scenario. Order fulfillment covers the period from the generation of a customer order until the delivery of the requested items. In this process, a customer can place an order with a retailer for a list of items. In response, the retailer checks whether the items are available in the inventory. If they are not available, the retailer tries to locate suitable suppliers that can provide the items with a short lead time. The retailer makes backorders if such suppliers are found; otherwise, the retailer rejects the customer order. If the requested items are available in the inventory or can be backordered, the retailer accepts the customer order. After that, the retailer prepares an invoice for billing and packs the requested items for shipping. Once the payment has been received from the customer, the retailer ships the items.

In Fig. 10, we formalize the workflow model of the order fulfillment process using Petri nets [2]. In Petri nets, a process is described in terms of places, transitions, and arcs. Places are indicated as circles, transitions as rectangles. Places and transitions can be linked by directed arcs.
Places are passive components used to represent the states of processes. Places may contain tokens, indicated by black dots. Next, the order fulfillment process in the Petri net model is mapped into a workflow definition. Recall that in Fig. 7, 8, and 9, we specified star schema models for storing audit trails in a data warehouse. Based on the audit trail data of the Order Fulfillment process, we are now able to formulate queries that reveal the performance of the workflow. Due to space limitations, we have chosen three queries for discussion.

Customer satisfaction is one of the crucial performance indicators in a business process. It can be defined in terms of service level and cycle time. Service level is the rate of customer orders that are delivered on time. Cycle time is the total elapsed time from the receipt of a customer order until the delivery of the requested items. We define the following query based on Fig. 7.

Fig. 9. Star schema of workflow events
Fig. 10. Workflow model of Order Fulfillment
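The token-game semantics behind Fig. 10 can be sketched in a few lines: a transition is enabled when every input place holds a token, and firing it moves tokens from input to output places. The two transitions below are an invented fragment of the order fulfillment process, not the full model of Fig. 10.

```python
# Minimal Petri net token game (invented fragment of order fulfillment).
marking = {"order_received": 1, "order_accepted": 0, "invoiced": 0}

# transition name -> (input places, output places)
transitions = {
    "accept_order":    ({"order_received"}, {"order_accepted"}),
    "prepare_invoice": ({"order_accepted"}, {"invoiced"}),
}

def enabled(name):
    """A transition is enabled when every input place holds a token."""
    inputs, _ = transitions[name]
    return all(marking[p] > 0 for p in inputs)

def fire(name):
    """Firing consumes one token per input place, produces one per output."""
    inputs, outputs = transitions[name]
    assert enabled(name), f"{name} is not enabled"
    for p in inputs:
        marking[p] -= 1
    for p in outputs:
        marking[p] += 1

fire("accept_order")
fire("prepare_invoice")
print(marking)
```

Each firing is exactly the kind of state change a WfMS would record as an audit trail entry.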

Query 1: Derive the average cycle time and the service level rate for each calendar year in the Order Fulfillment process.

SELECT FT.year,
       AVG(WE.lifetime_of_wf_obj),
       SUM(WE.successful) / COUNT(WE.successful)
FROM WorkflowExecution WE
JOIN Definition D USING (definition_key)
JOIN FinishTime FT USING (finish_time_key)
WHERE D.type = 'Process'
  AND D.name = 'Order Fulfillment'
GROUP BY FT.year

It is crucial that the performance of new recruits in a business process be constantly monitored to evaluate the effectiveness of training. For instance, a number of trainees are assigned to the packing department and their performance is reviewed every quarter. An employee's performance can be defined in terms of the actual execution time of their allocated tasks. We define the following query based on Fig. 8.

Query 2: What is the monthly average actual execution time of each staff member in the packaging department in the second quarter of the current calendar year?

SELECT P.username, AT.month, AVG(TE.actual_exe_time)
FROM TaskExecution TE
JOIN Definition D USING (definition_key)
JOIN Participant P USING (participant_key)
JOIN AllocatedTime AT USING (allocated_time_key)
WHERE D.type = 'Activity'
  AND D.name = 'Pack Items'
  AND P.department = 'Package Department'
  AND AT.year = 2007
  AND AT.quarter = 'Second'
GROUP BY P.username, AT.month

Delivery is one of the crucial tasks in the order fulfillment process. The delivery task from Fig. 10 is considered inefficient if it spends much of its time in the suspended state. We define the following query based on Fig. 9.

Query 3: What is the total suspended time of the delivery task in each month of the current calendar year?

SELECT TS.month, SUM(WE.state_duration)
FROM WorkflowEvent WE
JOIN Definition D USING (definition_key)
JOIN State S USING (state_key)
JOIN Timestamp TS USING (timestamp_key)
WHERE D.type = 'Activity'
  AND D.name = 'Delivery'
  AND S.state_type = 'suspended'
  AND TS.year = 2007
GROUP BY TS.month
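Query 1 can be exercised end-to-end on a toy instance of the Fig. 7 schema. The sqlite3 sketch below uses invented rows; it also writes the service level as AVG over a CAST to REAL, an equivalent formulation of SUM/COUNT that avoids truncation under integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy instance of the Fig. 7 star schema (rows invented):
cur.executescript("""
CREATE TABLE definition (
    definition_key INTEGER PRIMARY KEY,
    definition_type TEXT, definition_name TEXT
);
CREATE TABLE finish_time (
    finish_time_key INTEGER PRIMARY KEY, year INTEGER
);
CREATE TABLE workflow_execution (
    definition_key INTEGER, finish_time_key INTEGER,
    lifetime_of_wf_obj REAL,  -- cycle time, e.g. in hours
    successful INTEGER        -- 1 = delivered on time
);
INSERT INTO definition VALUES (1, 'Process', 'Order Fulfillment');
INSERT INTO finish_time VALUES (1, 2006), (2, 2007);
INSERT INTO workflow_execution VALUES
    (1, 1, 48.0, 1), (1, 1, 72.0, 0),   -- two orders finished in 2006
    (1, 2, 24.0, 1);                    -- one order finished in 2007
""")

# Query 1: average cycle time and service level per calendar year.
rows = cur.execute("""
    SELECT ft.year,
           AVG(we.lifetime_of_wf_obj)       AS avg_cycle_time,
           AVG(CAST(we.successful AS REAL)) AS service_level
    FROM workflow_execution we
    JOIN definition d   USING (definition_key)
    JOIN finish_time ft USING (finish_time_key)
    WHERE d.definition_type = 'Process'
      AND d.definition_name = 'Order Fulfillment'
    GROUP BY ft.year
    ORDER BY ft.year
""").fetchall()
print(rows)
```

On these rows, 2006 averages a 60-hour cycle time with a 0.5 service level, and 2007 a 24-hour cycle time with a 1.0 service level.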
VI. RELATED WORK

The Workflow Management Coalition (WfMC) [3] defined a set of standard interfaces for data interchange across workflow management systems. One of these interfaces, namely Interface 5 [4], contains a collection of entity types and attributes for representing workflow audit trails at a detailed level. However, these detailed models do not capture the relations between entity types, as we do in this paper. Also, WfMC's Interface 5 is not concerned with deriving a data warehouse schema.

Bonifati et al. [11] propose a Workflow Data Warehouse (WDW) solution for collecting and analyzing audit trail data of HP Process Manager (HPPM). The difference between their approach and ours is that, in [11], the design of the WDW schema is specifically tailored to HPPM audit trail data, whereas our data warehouse schema is designed more generically.

Grigori et al. [12] propose a Business Process Intelligence (BPI) architecture that supports business and IT users in managing process execution quality. It describes how analysis, interpretation and optimization of business processes can be achieved by storing WfMS audit trail data in a Process Data Warehouse (PDW) component. The difference between their approach and ours is that the BPI architecture focuses on the design of the PDW loader, which collects data from workflow logs; the design of a complete data warehouse model for storing WfMS audit trail data is not described in [12].

Eder et al. [13] propose a data warehouse design for monitoring the execution of business processes for process improvement. They introduce a logical data warehouse model to store workflow logs. The difference between their approach and ours is that their data warehouse model does not incorporate business object attributes and workflow events, which are crucial when evaluating business process performance.

VII. CONCLUSION

Our main contributions in this work are (1) the conceptualization of the key elements of workflow audit trail data relevant to the evaluation of business process performance, (2) the design of logical models that capture the derivation processes for evaluation data, and (3) the design of physical models, using dimensional modeling techniques, for storing audit trail data in a data warehouse so as to optimize query performance over large volumes of audit trail data. Based on these models, analysts can evaluate the quality of business processes by monitoring the performance of operations and by uncovering weaknesses hidden in the workflow models. From such analysis, managers and decision makers can improve existing workflow models and optimize the performance of business processes. Future work will aim at validating the proposed models by embodying them in a prototype implementation. In particular, we plan to extend an open-source WfMS, namely YAWL [14], and link it with a data warehouse.

REFERENCES

[1] Workflow Management Coalition. Workflow Reference Model, January 1995.
[2] W. M. P. van der Aalst and K. M. van Hee. Workflow Management: Models, Methods, and Systems. MIT Press, Cambridge, 2002.

[3] Workflow Management Coalition, http://www.wfmc.org/.
[4] Workflow Management Coalition. Audit Data Specification, Document Number WFMC-TC-1015, version 1.1, September 1998.
[5] W. Inmon. Building the Data Warehouse. John Wiley and Sons, 1993.
[6] P. Ponniah. Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. Wiley-Interscience, 2001.
[7] T. Halpin. Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design. Morgan Kaufmann, 2001.
[8] Symmetry Corp. Getting Started with ADAPT, http://www.symcorp.com/downloads/adapt_white_paper.pdf, last accessed 2007-05-08.
[9] R. Kimball. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley & Sons, 1996.
[10] R. Kimball and M. Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. John Wiley & Sons, 2002.
[11] A. Bonifati, F. Casati, U. Dayal, and M. C. Shan. Warehousing workflow data: challenges and opportunities. In Proceedings of the 27th VLDB Conference, Roma, Italy, 2001.
[12] D. Grigori, F. Casati, M. Castellanos, U. Dayal, M. Sayal, and M. C. Shan. Business process intelligence. Computers in Industry, 53(3):321-343, 2004.
[13] J. Eder, G. E. Olivotto, and W. Gruber. A data warehouse for workflow logs. In Proceedings of the First International Conference on Engineering and Deployment of Cooperative Information Systems, pp. 1-15, 2002.
[14] W. van der Aalst and A. ter Hofstede. YAWL: Yet Another Workflow Language. Information Systems, 30(4):245-275, 2005.