SOFTWARE PROCESS MINING
DR. VLADIMIR RUBIN
LEAD IT ARCHITECT & CONSULTANT @ DR. RUBIN IT CONSULTING
LEAD RESEARCH FELLOW @ PAIS LAB / HSE

ANNOTATION
Nowadays, in the era of social, mobile and cloud computing, business information systems regularly produce, log and trace terabytes of data. Process mining transforms this data into valuable knowledge, which is used for improving business processes. However, process mining can also be applied successfully to the development of information systems. It can be used for deriving the model of a software development process. Mining the end-user behavior can help improve the functionality and usability of software. And mining the software system at runtime is beneficial for improving the software architecture and performance. Here, we introduce software process mining:
1. mining the software development process
2. mining the software end-user behavior
3. mining the software runtime behavior
29.01.2014 Slide 2
VLADIMIR RUBIN
Lead IT Architect and Consultant
Collaboration with msg systems AG
Founder of Dr. Rubin IT Consulting, Frankfurt/Germany
Lead Research Fellow at PAIS Lab (Higher School of Economics, Moscow)
3 Years: msg systems ag, Frankfurt, Munich/Germany
3 Years: Capgemini, Frankfurt/Germany, Bern/Switzerland
3 Years: Netcracker Technologies Corp, Boston/USA
3 Years: PhD in Computer Science, University of Paderborn/Germany, Eindhoven University of Technology/Holland
5 Years: M.Sc. in Computer Science at Moscow State University of Railway Transport
Points of interest:
  Big Enterprise Projects (Java EE) and Methodical SW-Development (Agile, SOA)
  Business Process Modeling (BPM) and Process Mining
  Model-driven Software Development (MDD)

MODERN SOFTWARE PROJECTS
How the customer explained it
How the analyst designed it
How the programmer wrote it
How the project was documented
How the customer was billed
What the customer really needed
* http://www.projectcartoon.com
HOW DOES PROCESS MINING HELP IN DEALING WITH SOFTWARE ENGINEERING CHALLENGES?

ONCE PROCESS MINER, ALWAYS PROCESS MINER
AGENDA
1. Software Process Mining
2. Software Process Mining

MOTIVATION: QUALITY
Software Process Quality, CMM (CMMI), Product Quality
Company, Process Model, Process Engineer
~50% of companies:
  Practitioners are not involved
  Existing processes are not analysed
  Manual way of work: expensive, error-prone...
  Models have discrepancies with the reality
Idea: Automatic support for deriving software development processes
MOTIVATION: SOFTWARE DEVELOPMENT PROCESS

HYPOTHESIS
Document Logs from Software Repositories can be used for discovering Process Models
Mining Approach
MINING APPROACH: PREPROCESSING
Steps: 1. Preprocessing
Example: SCM Commits (CVS, Subversion, ClearCase,...)
SVN log:
  Revision 569362, modified Fri Aug 24 12:09:09 2007 UTC by bayard
  Revision 567258, modified Sat Aug 18 11:14:52 2007 UTC by tetsuya
Different Projects (Plugins), Different Releases
Resulting example traces:
  DES 1 designer; CODE 2 developer; TEST 3 qaengineer; REV 4 manager
  DES 1 designer; TEST 2 qaengineer; CODE 3 developer; REV 4 designer
  DES 1 designer; VER 2 qaengineer; CODE 3 designer; REV 4 manager
Other Examples: Bug Tracking (Bugzilla,...), Issue Tracking (Jira,...)...

MINING APPROACH: CONTROL-FLOW MINING ALGORITHM
Steps: 1. Preprocessing, 2. Process Mining
Input traces:
  DES 1 designer; CODE 2 developer; TEST 3 qaengineer; REV 4 manager
  DES 1 designer; TEST 2 qaengineer; CODE 3 developer; REV 4 designer
  DES 1 designer; VER 2 qaengineer; CODE 3 designer; REV 4 manager
a) Transition System Generation
  Constructing TS
  Modification Strategies for TS
b) Petri Net Synthesis
  apply theory of regions: synthesis algorithms of Cortadella et al.
Properties:
  flexible, supports generalization
  deals with complex constructs
  generates consistent models
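A minimal sketch of step a), transition system generation, using the three example traces from the slide. The state abstraction used here (a state is the set of activities seen so far) is an assumption chosen for illustration; the slide does not say which abstraction or modification strategy the approach applies, and the function name is hypothetical.

```python
# Example traces from the slide: (activity, order, author) per commit.
TRACES = [
    [("DES", 1, "designer"), ("CODE", 2, "developer"),
     ("TEST", 3, "qaengineer"), ("REV", 4, "manager")],
    [("DES", 1, "designer"), ("TEST", 2, "qaengineer"),
     ("CODE", 3, "developer"), ("REV", 4, "designer")],
    [("DES", 1, "designer"), ("VER", 2, "qaengineer"),
     ("CODE", 3, "designer"), ("REV", 4, "manager")],
]


def build_transition_system(traces):
    """Return (states, transitions) for the given traces.

    Assumed abstraction: a state is the frozenset of activities seen so far.
    A transition is a triple (source_state, activity, target_state).
    """
    states = {frozenset()}  # the empty set is the initial state
    transitions = set()
    for trace in traces:
        current = frozenset()
        # Replay the events of one trace in their recorded order.
        for activity, _order, _author in sorted(trace, key=lambda e: e[1]):
            target = current | {activity}
            transitions.add((current, activity, target))
            states.add(target)
            current = target
    return states, transitions


states, transitions = build_transition_system(TRACES)
```

With the set abstraction, the first two traces end in the same state even though CODE and TEST occur in a different order, which is one simple way a transition system can generalize over the log. The resulting transition system would then be handed to a Petri net synthesis step, which is not sketched here.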
MINING APPROACH: OTHER PERSPECTIVES
Steps: 1. Preprocessing, 2. Process Mining, 3. Model Analysis
Performance Perspective
  [figure: process model over DES, CODE, TEST, VER, REV with arc annotations 0.67, 0.33, 0.25, 0.75]
Organizational Perspective
  [figure: network of designer, developer, qaengineer, manager with edge weights 0.111 and 0.222]
Conformance Checking and Views
  DES 1 designer; CODE 2 developer; TEST 3 qaengineer; REV 4 manager
Verification (LTL)
  Always when CODE then eventually TEST
apply different algorithms developed in the IS Group (TU/e)

IMPLEMENTATION
Steps: 1. Preprocessing, 2. Process Mining, 3. Model Analysis
Tools: ProM Import Framework, ProM
Implemented ProM Plugins:
  Transition System Generator
  Export2Petrify (Petrify PN Synthesis)
  Import from Petrify + Remap Filter
(together with C. Günther)
(based on Prolog research prototype)
In cooperation with the IS group (Eindhoven University of Technology)
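The verification property quoted above ("Always when CODE then eventually TEST") can be illustrated with a small check over finite traces. This is only a sketch with a hypothetical helper name; it is not the LTL checker used in ProM, and it assumes each trace is reduced to its ordered list of activity names.

```python
def always_eventually(trace, trigger, response):
    """True if every occurrence of `trigger` is followed later by `response`.

    Finite-trace reading of "always (trigger implies eventually response)".
    """
    for i, activity in enumerate(trace):
        if activity == trigger and response not in trace[i + 1:]:
            return False
    return True


# The three example traces from the slides, activities only.
traces = [
    ["DES", "CODE", "TEST", "REV"],
    ["DES", "TEST", "CODE", "REV"],
    ["DES", "VER", "CODE", "REV"],
]
results = [always_eventually(t, "CODE", "TEST") for t in traces]
```

Under this reading only the first trace satisfies the property: in the second trace TEST occurs before CODE, and the third trace contains no TEST at all.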
EVALUATION
Case Studies and Main Results:
Softwaretechnikpraktikum (software engineering practical course; SCM system CVS and Subversion), FG Softwaretechnik, University of Paderborn
  Discovered plausible process models corresponding to the given specifications
  Identified the discrepancies between the specified and the discovered processes
Open-source Software Project ArgoUML (SCM system Subversion)
  Analysed the performance and identified the critical tasks
Open Development Platform Eclipse (Bug Repositories Bugzilla)
  Discovered organizational models and the social networks
  Verified the models against important properties

CONTRIBUTIONS
A Workflow Mining Approach for Deriving Software Process Models
  mining different perspectives
  incremental mode
Software Process Mining (Research Areas)
Theory of Regions
  configurable
  consistent
Tool Support
Evaluation
Sources of Experimental Data
AGENDA
1. Software Process Mining
2. Software Process Mining

MINING THE USER BEHAVIOUR
MINING THE USER BEHAVIOUR: USE CASES
Mining user activity traces can be used for:
  Understanding the real behaviour of the user
  Improving the GUI
  Implementing Quick Wins
  Redesigning the software system
  Changing the design according to the real world scenarios
  Developing the acceptance tests
  Capture and replay
  Monitoring the system usage (APM: application performance monitoring)
  Visualizing the state, Failure Alerts

EXAMPLE: TOURISTIC BOOKING SYSTEM
TOMA MASK AND MESSAGE
087624 60T1009006001001T001001002000D D BA 1024DER PHXU25307V5023EUR503801SP HAM BGO 3A ST ZHI 2 1 0107135 1-2501802KV599959995431

MINING: INPUT
~ 30 MB Logs per Day per Environment (PROD, TEST, INT, DEV)
Logs are preprocessed and converted to CSV (30 KB per Day)
Input for Disco
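A rough sketch of the preprocessing step, assuming a purely hypothetical log line layout (timestamp, session, user, action, status); the real log format of the booking system is not shown on the slides. The output is a CSV with one event per row and separate case, activity and timestamp columns, which is the general shape an event log import expects. The column names and the function name are illustrative assumptions.

```python
import csv
import io
import re

# Hypothetical log line layout, assumed for illustration only:
# 2014-01-29 10:15:02 session=s1 user=u1 action=Hotel Search status=OK
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"session=(?P<case_id>\S+) user=(?P<user>\S+) "
    r"action=(?P<activity>.+?) status=(?P<status>\w+)$"
)


def logs_to_csv(lines):
    """Convert raw log lines to CSV text, one event per row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["case_id", "activity", "timestamp", "user", "status"])
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # drop lines that do not describe a user action
        writer.writerow([
            match["case_id"], match["activity"], match["timestamp"],
            match["user"], match["status"],
        ])
    return out.getvalue()


sample = [
    "2014-01-29 10:15:02 session=s1 user=u1 action=Hotel Search status=OK",
    "DEBUG heartbeat",
    "2014-01-29 10:15:40 session=s1 user=u1 action=Hotel Quote status=ERROR",
]
csv_text = logs_to_csv(sample)
```

Dropping every line that is not a user action is what makes the large size reduction plausible: most of a raw log is technical noise that carries no case or activity information.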
MODEL FOR ONE SET OF TESTS FOR ONE DAY
95 cases
482 events
50 activities
Mean duration: 6.5 min; Median duration: 26.5 s

FOCUS ON SUCCESSES: FREQUENCY
64 cases (67% of all cases)
228 events
39 activities
Frequent activities:
  Hotel Quote
  Hotel Book
  Flight Search
  Show Reservation
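The kind of figures quoted on these slides (cases, events, activities, mean and median case duration) can be derived from an event log roughly as follows. This is a sketch on made-up toy data, not the booking system log, and it assumes a case's duration is the span between its first and last event; Disco's exact definitions may differ.

```python
from collections import defaultdict
from statistics import mean, median


def log_summary(events):
    """Summarize an event log given as (case_id, activity, timestamp_in_seconds)."""
    events = list(events)
    times_by_case = defaultdict(list)
    for case_id, _activity, timestamp in events:
        times_by_case[case_id].append(timestamp)
    # Assumed definition: case duration = last event time minus first event time.
    durations = [max(ts) - min(ts) for ts in times_by_case.values()]
    return {
        "cases": len(times_by_case),
        "events": len(events),
        "activities": len({activity for _case, activity, _ts in events}),
        "mean_duration": mean(durations),
        "median_duration": median(durations),
    }


# Toy data: two short cases and one long-running case.
toy_log = [
    ("a", "Search", 0), ("a", "Quote", 10),
    ("b", "Search", 0), ("b", "Book", 20),
    ("c", "Search", 0), ("c", "Quote", 5), ("c", "Book", 600),
]
summary = log_summary(toy_log)
```

In the toy data a single long case pulls the mean far above the median. A large gap between mean and median, as in the numbers on the slide, usually hints at a few long-running outlier cases.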
FOCUS ON SUCCESSES: PERFORMANCE
Median duration

FOCUS ON FAILURES
Problems with:
  Hotel Search
  Hotel Quote
  Show Reservation
SOME STATISTICS
Most frequent travelling directions:
Most active users:

WHAT WE HAVE LEARNED
1. We could monitor the acceptance tests of the users (online).
2. We could visualize the user behaviour and discuss it with the end user. Communication!!!
3. The management could easily see current successes and failures.
4. We aligned the failure cases with the exceptions and created issues for further bug fixing.
5. We identified the most critical parts of the software and focused on them first (Pareto principle).
MINING THE SOFTWARE RUNTIME BEHAVIOUR

MINING THE SOFTWARE RUNTIME BEHAVIOUR: USE CASES
Mining software runtime traces:
  Understanding the performance
  Localizing the bottlenecks
  Understanding the architectural deficiencies
  Improving the architecture
  Aligning the exception traces with user behaviour
EXAMPLE: TOURISTIC BOOKING SYSTEM

MINING: INPUT
~ 5 GB of Traces per Day per Environment (PROD, TEST, INT, DEV)
Logs are preprocessed and converted to CSV (20 MB per Day)
Input for Disco
MODEL FOR ONE DAY FOR ONE BUSINESS DOMAIN
758 cases
61844 events
508 activities
Mean duration: 5 s; Median duration: 30 ms
Computation of the whole graph takes more than 30 minutes

FOCUS ON ONE BUSINESS DOMAIN: FREQUENCY
1. Process calls
2. Subsequent service calls
FOCUS ON ONE BUSINESS DOMAIN: PERFORMANCE
Total duration of calls

SOME STATISTICS
Payloads:
Frequency of activities:
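The "frequency of activities" and "total duration of calls" views can be approximated with a simple aggregation over call records, which is one way to shortlist bottleneck candidates. The record layout (activity name, duration in milliseconds) and the process and service names below are hypothetical; the real trace format is not shown on the slides.

```python
from collections import Counter


def rank_by_total_duration(calls):
    """Aggregate (activity, duration_ms) records.

    Returns a list of (activity, call_count, total_ms),
    sorted by total duration in descending order.
    """
    counts = Counter()
    totals = Counter()
    for activity, duration_ms in calls:
        counts[activity] += 1
        totals[activity] += duration_ms
    return [(activity, counts[activity], total)
            for activity, total in totals.most_common()]


# Hypothetical call records for illustration.
calls = [
    ("BookingProcess", 120),
    ("PriceService", 30),
    ("PriceService", 45),
    ("BookingProcess", 200),
    ("CustomerService", 5),
]
ranking = rank_by_total_duration(calls)
```

Ranking by total rather than average duration highlights calls that are either slow or very frequent. Both kinds are worth a closer look before deciding on caching or refactoring.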
WHAT WE HAVE LEARNED
1. We could visualize the system runtime behaviour.
2. We could discuss (drill down, roll up) particular behaviour with technical designers.
3. We identified the performance bottlenecks.
4. We identified the most critical processes and services from the architectural point of view.
5. We improved the performance in many cases using caching or refactoring...

OVERVIEW
FUTURE WORK: RESEARCH DIRECTIONS...
1. Process Mining methods for Software Process Mining
  Filtering (OLAP-similar operations)
  Dealing with Gigabytes of logs
2. Mining different perspectives
  Data perspective (Requests, Responses, Payloads)
3. Integrating Process Mining in Software Development Process
  Agile approaches (Early Feedback using Process Mining)
4. Monitoring and Process Mining Online
  Aligning mined models with logs
  Continuous repair of models
5. Prediction of user behaviour, guiding the user

LET'S PROCEED WITH SOFTWARE PROCESS MINING!!!