Nirikshan: Process Mining Software Repositories to Identify Inefficiencies, Imperfections, and Enhance Existing Process Capabilities

Similar documents
SOFTWARE PROCESS MINING

A Visualization Approach for Bug Reports in Software Systems

Process Mining Framework for Software Processes

Model Discovery from Motor Claim Process Using Process Mining Technique

Mining Peer Code Review System for Computing Effort and Contribution Metrics for Patch Reviewers

BUsiness process mining, or process mining in a short

Dotted Chart and Control-Flow Analysis for a Loan Application Process

Process Modelling from Insurance Event Log

BPIC 2014: Insights from the Analysis of Rabobank Service Desk Processes

MIMANSA: Process Mining Software Repositories from Student Projects in an Undergraduate Software Engineering Course

Rabobank: Incident and change process analysis

Activity Mining for Discovering Software Process Models

Feature. Applications of Business Process Analytics and Mining for Internal Control. World

Mercy Health System. St. Louis, MO. Process Mining of Clinical Workflows for Quality and Process Improvement

Process Mining. ^J Springer. Discovery, Conformance and Enhancement of Business Processes. Wil M.R van der Aalst Q UNIVERS1TAT.

Integrating Service Oriented MSR Framework and Google Chart Tools for Visualizing Software Evolution

Process mining challenges in hospital information systems

HP ALM11 & MS VS/TFS2010

Implementing Heuristic Miner for Different Types of Event Logs

ProM 6 Tutorial. H.M.W. (Eric) Verbeek mailto:h.m.w.verbeek@tue.nl R. P. Jagadeesh Chandra Bose mailto:j.c.b.rantham.prabhakara@tue.

CHAPTER 1 INTRODUCTION

Bug Localization Using Revision Log Analysis and Open Bug Repository Text Categorization

Supporting the BPM lifecycle with FileNet

Agile Requirements Definition for Software Improvement and Maintenance in Open Source Software Development

Bug Localization Using Revision Log Analysis and Open Bug Repository Text Categorization

Combination of Process Mining and Simulation Techniques for Business Process Redesign: A Methodological Approach

Process Analysis and Organizational Mining in Production Automation Systems Engineering

ESTABLISHING A MEASUREMENT PROGRAM

Handling Big(ger) Logs: Connecting ProM 6 to Apache Hadoop

Theme 1 Software Processes. Software Configuration Management

Fundamentals of Business Process Management

Towards Patterns to Enhance the Communication in Distributed Software Development Environments

Process Mining in Big Data Scenario

Chapter 4 Getting the Data

Discovering User Communities in Large Event Logs

Title: Basic Concepts and Technologies for Business Process Management

Towards a Software Framework for Automatic Business Process Redesign Marwa M.Essam 1, Selma Limam Mansar 2 1

Analysis of Service Level Agreements using Process Mining techniques

Replaying History. prof.dr.ir. Wil van der Aalst

Process Mining Tools: A Comparative Analysis

Managing and Tracing the Traversal of Process Clouds with Templates, Agendas and Artifacts

Process Mining A Comparative Study

Context Capture in Software Development

Chapter 12 Analyzing Spaghetti Processes

Using Process Mining to Bridge the Gap between BI and BPM

Assisting bug Triage in Large Open Source Projects Using Approximate String Matching

Automating the Measurement of Open Source Projects

Software Configuration Management Plan

QA & Test Management. Overview.

Eventifier: Extracting Process Execution Logs from Operational Databases

Introduction to Programming Tools. Anjana & Shankar September,2010

A Research Article on Data Mining in Addition to Process Mining: Similarities and Dissimilarities

Process Mining: Using CPN Tools to Create Test Logs for Mining Algorithms

Database Marketing, Business Intelligence and Knowledge Discovery

BugMaps-Granger: A Tool for Causality Analysis between Source Code Metrics and Bugs

Relational XES: Data Management for Process Mining

Software Configuration Management. Slides derived from Dr. Sara Stoecklin s notes and various web sources.

Barely Sufficient Software Engineering: 10 Practices to Improve Your Research CSE Software

Merging Event Logs with Many to Many Relationships

Assisting bug Triage in Large Open Source Projects Using Approximate String Matching

Your Software Quality is Our Business. INDEPENDENT VERIFICATION AND VALIDATION (IV&V) WHITE PAPER Prepared by Adnet, Inc.

Do Onboarding Programs Work?

Obtaining Enterprise Cybersituational

How To Write Software

Data Science. Research Theme: Process Mining

A Qualitative Study on Performance Bugs

Towards Cross-Organizational Process Mining in Collections of Process Models and their Executions

Pattern Insight Clone Detection

Engineering Object Change Management Process Observation in Distributed Automation Systems Projects

EDIminer: A Toolset for Process Mining from EDI Messages

BlackBerry Enterprise Server for Microsoft Exchange Version: 5.0 Service Pack: 2. Feature and Technical Overview

3TU.BSR: Big Software on the Run

Business Process Modeling

The Importance of Bug Reports and Triaging in Collaborational Design

Processing and data collection of program structures in open source repositories

ProM Framework Tutorial

Supporting the BPM life-cycle with FileNet

ERP Event Log Preprocessing: Timestamps vs. Accounting Logic

Configuring IBM WebSphere Monitor for Process Mining

PAID VS. VOLUNTEER WORK IN OPEN SOURCE

Requirements Engineering

Transcription:

Nirikshan: Process Mining Software Repositories to Identify Inefficiencies, Imperfections, and Enhance Existing Process Capabilities Monika Gupta monikag@iiitd.ac.in PhD Advisor: Dr. Ashish Sureka Industry Mentor: Dr. Srinivas Padmanabhuni Indraprastha Institute of Information Technology New Delhi, India 1

Presentation Outline Research Motivation Research Aim Related Work Related Methodology and Proposed Contributions Data Source Technique and Contributions Evaluation Preliminary Results 2

Research Motivation Measure Process Mining PROCESS Improve Plan Process Analyst Analyze Software Repositories If one cannot measure it, one cannot improve it 3

Research Motivation Process Mining: Extract knowledge from event logs recorded by an information system [1]. Event logs (e.g. transaction logs) with four fields: CaseID Event Timestamp Resource --- --- --- --- Tools and framework for process mining: ProM 1 (Open Source) Disco 2 (Commercial) [1] van der Aalst, Wil MP, et al. "Business process mining: An industrial application." Information Systems 32.5 (2007): 713-732. 1. http://www.promtools.org/prom6/ 2. http://www.fluxicon.com/disco/ 4

Research Motivation Software Repositories: Artifacts generated by the tools during software evolution and archived for future reference. Rich data available. Uncover interesting and actionable information for process improvement. For example : Issue Tracking System (ITS), Version Control System, Code Review etc. 5

Research Aim Novel applications of process mining on software repositories for: Runtime Process Map Discovery Performance Analysis Conformance verification User behavior pattern investigation Enhancement of process mining capabilities of existing tools 6

Related Work Author Year Repository Objectives Rubin et al. 2007 Subversion logs of the ArgoUML project. (OSS) Akman et al. 2009 Software Configuration Management of industry project. Knab et al. 2010 ITS of EUREKA project SERIOUS (CSS). Poncin et al. 2011 amsn and GCC bug repositories, mail archives, SVM Linear Temporal Logic (LTL) Checking (Conformance Analysis), Social Network Discovery, Performance Analysis, Petri Net Discovery Analyzed and compared the effectiveness of four process discovery algorithms on software process. Analyzed discrepancies between real time and design time process. Interactive approach to visualize effort estimation and process lifecycle patterns in ITS to detect outliers, flaws and interesting properties. Combined different repositories for analysis using a prototype, FRASR. Role classification and Bug life cycle construction using ProM. Sunindyo et al. 2012 Red Hat Linux ITS Framework for collecting and analyzing data from bug reporting system, conformance checking to improve process quality. 7

Technical Challenges Mining from multiple perspectives Gathering data from heterogeneous sources Data incompatibility with process mining tools Mining hidden tasks Missing clear design process and goals 8

Research Methodology NIRIKSHAN (Sanskrit word which means to investigate ): Research framework showing proposed research approach for the research contributions followed by evaluation. 9

Research Methodology PRACTITIONER S SURVEY AND FIELD STUDY Validate initial hypothesis Verify in-practice process and policies Identify research problems DATA SOURCE Issue Tracking System like Bugzilla, JIRA, Mantis Peer Code Review System like Gerrit, Rietveld Version Control System like SVN and Mercurial Project Hosting Platform like Sorceforge, Git APIs, XML-RPC, JSON- RPC, Web Scraping Research Questions that usually process analysts have and can be answered by process mining Event Log Data 10

Research Methodology NIRIKSHAN (Sanskrit word which means to investigate ): Research framework showing proposed research approach for the research contributions followed by evaluation. 11

Research Methodology Runtime Process Model Discovery Single Repository MySQL RDBMS CaseID Timestamp Activity Actor PROCESS PERSPECTIVE ORGANIZATIONAL PERSPECTIVE Performance Analysis Conformance Verification Joint Cases, and Joint Activity Data Integration Event Log Handover, and Subcontracting 12

Research Methodology Runtime Process Model Discovery Single Repository MySQL RDBMS CaseID Timestamp Activity Actor PROCESS PERSPECTIVE ORGANIZATIONAL PERSPECTIVE Performance Analysis Conformance Verification Joint Cases, and Joint Activity Data Integration Event Log Handover, and Subcontracting 13

Research Methodology Technique And Contribution Runtime Process Model Discovery Single Repository MySQL RDBMS CaseID Timestamp Activity Actor PROCESS PERSPECTIVE ORGANIZATIONAL PERSPECTIVE Performance Analysis Conformance Verification Joint Cases, and Joint Activity Data Integration Event Log Handover, and Subcontracting [2] Eid-Sabbagh et al. "Business process architecture: use and correctness." Business Process Management. Springer Berlin Heidelberg, 2012. 14

Research Methodology Technique And Contribution Runtime Process Model Discovery Single Repository MySQL RDBMS CaseID Timestamp Activity Actor PROCESS PERSPECTIVE ORGANIZATIONAL PERSPECTIVE Performance Analysis Conformance Verification Joint Cases, and Joint Activity Data Integration Event Log Handover, and Subcontracting [2] Eid-Sabbagh et al. "Business process architecture: use and correctness." Business Process Management. Springer Berlin Heidelberg, 2012. 15

Research Methodology Technique And Contribution Runtime Process Model Discovery Single Repository MySQL RDBMS CaseID Timestamp Activity Actor PROCESS PERSPECTIVE ORGANIZATIONAL PERSPECTIVE Performance Analysis Conformance Verification Joint Cases, and Joint Activity Data Integration Event Log Handover, and Subcontracting 16

Conformance Verification Do we do what was agreed upon? Design Process Model v/s Discovered Runtime Process Model 17

Research Methodology Runtime Process Model Discovery Single Repository MySQL RDBMS CaseID Timestamp Activity Actor PROCESS PERSPECTIVE ORGANIZATIONAL PERSPECTIVE Performance Analysis Conformance Verification Joint Cases, and Joint Activity Data Integration Event Log Handover, and Subcontracting [3] Van Der Aalst, Wil MP, Hajo A. Reijers, and Minseok Song. "Discovering social networks from event logs., CSCW 2005, 549-593. 18

Research Methodology Runtime Process Model Discovery Single Repository MySQL RDBMS CaseID Timestamp Activity Actor PROCESS PERSPECTIVE ORGANIZATIONAL PERSPECTIVE Performance Analysis Conformance Verification Joint Cases, and Joint Activity Data Integration Event Log Handover, and Subcontracting [3] Van Der Aalst, Wil MP, Hajo A. Reijers, and Minseok Song. "Discovering social networks from event logs., CSCW 2005, 549-593. 19

Preliminary Results Process Mining Single Software Repository: Issue Tracking System [4] Event log data of ITS analyzed. Case study on open source Firefox browser and Core project. Discover process map, self-loops, back-forth, issue reopen. Algorithm for conformance verification. Insights: 15 activities in event log Component and developer reassignment frequent 3-4 events in lifecycle, few with >7 events 80% of the cases covered with 2% unique traces Total unique traces: Core-1164, Firefox-622 [4] Gupta, M., & Sureka, A. Nirikshan: Mining bug report history for discovering process maps, inefficiencies and inconsistencies. In seventh india software engineering conference (ISEC). 20

Preliminary Results Process Map for Firefox with labels as absolute frequency of transition and activity [4] Gupta, M., & Sureka, A. Nirikshan: Mining bug report history for discovering process maps, inefficiencies and inconsistencies. In seventh india software engineering conference (ISEC). 21

Preliminary Results Bottleneck: Worksforme, Wontfix state Reopened: Worksforme, Wontfix, Fixed Conformance: 0.86 for Core, 0.91 for Firefox Cause of inconsistency: Reported Assigned 22

Preliminary Results Process Mining Multiple Software Repositories: Issue Tracking System, Peer Code Review System and Version Control System [5] Issue resolution process starting from issue reporting in ITS, followed by patch submission to PCR for review and finally committed to VCS. Case study on open source Google Chromium project. Discover process map, anti-patterns, bottlenecks, organizational metrics. [5] Gupta, Monika, Ashish Sureka, and Srinivas Padmanabhuni. "Process mining multiple repositories for software defect resolution from control and organizational perspective." Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, 2014. 23

ITS BUG ID Preliminary Results PEER CODE REVIEW SYSTEM PCR ISSUE ID ITS ISSUE ID VERSION CONTROL SYSTEM VCS REVISION ID PCR ISSUE ID codereview.chromium.org chromiumcodereview.appspot.com PCR ISSUE ID 24

Preliminary Results Resolution process efficient with high chances of issues getting Fixed Basic and Composite anti-patterns like loops, and information flow detected Bottlenecks are identified such as control transfer between ITS and PCR More social performers are more active Joint activities helps to identify generalists and specialists Same performer performs multiple subsequent activities 25

Research Methodology NIRIKSHAN (Sanskrit word which means to investigate ): Research framework showing proposed research approach for the research contributions followed by evaluation. 26

Research Methodology EVALUATION REAL DATA Open Source Projects Commercial Projects Process Mining Approach VALIDATION: Survey professionals Feedback Improve 27

Usefulness Visualize the process reality for better improvement Streamline the process Reduce flow time Manage changing workloads Efficient task allocation Better process maintainability, capability, reliability, efficiency and stability [6][7] 6. Patrick Knab, Martin Pinzger, and Harald C. Gall. Visual patterns in Issue Tracking System. ICSP 2010, LNCS 6195, pp. 222-233,2010 7. Ekkart Kindler, Vladimir Rubin, Wilhelm Schafer. Activity mining for Discovering Software Process Models. In proceeding of: Software Engineering 2006, Fachtagung des GI- Fachbereichs Softwaretechnik, 28.-31.3.2006 in Leipzig. 28

References 1. Rubin, Vladimir, et al. "Process mining framework for software processes."software Process Dynamics and Agility. Springer Berlin Heidelberg, 2007. 169-181. 2. Akman, Burcu, and O. Demirors. "Applicability of Process Discovery Algorithms for Software Organizations." Software Engineering and Advanced Applications, 2009. SEAA'09. 35th Euromicro Conference on. IEEE, 2009. 3. Patrick Knab, Martin Pinzger, and Harald C. Gall. Visual patterns in Issue Tracking System. ICSP 2010, LNCS 6195, pp. 222-233,2010 4. Wouter Poncin, Alexander Serebrenik, Mark van den Brand. Process Mining Software Repositories. CSMR 2011 5. Sunindyo, Wikan, et al. "Improving Open Source Software Process Quality Based on Defect Data Mining." Software Quality. Process Automation in Software Development. Springer Berlin Heidelberg, 2012. 84-102. 6. Ekkart Kindler, Vladimir Rubin, Wilhelm Schafer. Activity mining for Discovering Software Process Models. In proceeding of: Software Engineering 2006, Fachtagung des GI-Fachbereichs Softwaretechnik, 28.-31.3.2006 in Leipzig. 7. Günther, Christian W., and Wil MP Van Der Aalst. "Fuzzy mining adaptive process simplification based on multiperspective metrics." Business Process Management. Springer Berlin Heidelberg, 2007. 328-343. 8. Shihab, Emad, et al. "Predicting re-opened bugs: A case study on the eclipse project." Reverse Engineering (WCRE), 2010 17th Working Conference on. IEEE, 2010. 9. Thomas Zimmermann, Nachiappan Nagappan, Philip J. Guo, Brendan Murphy. Characterizing and Predicting Which Bugs Get Reopened. In Proceedings of the 34th International Conference on Software Engineering (ICSE 2012 SEIP Track), Zurich, Switzerland, June 2012. 10. C. A. Halverson, J. B. Ellis, C. Danis, and W. A. Kellogg. Designing task visualizations to support the coordination of work in software development. In Proceedings of the 2006, 20th anniversary conference on Computer supported cooperative work (CSCW 2006), pages 39 48, New York, NY, USA, 2006. ACM Press. 29

THANK YOU! 30