Measuring Software Product Quality
Eric Bouwers, June 17, 2014
T +31 20 314 0950, info@sig.eu, www.sig.eu
Software Improvement Group
Who are we? A highly specialized advisory company for the cost, quality, and risks of software. Independent, and therefore able to give objective advice.
What do we do? Fact-based advice supported by our automated toolset for source code analysis. Analysis across technologies by use of technology-independent methods.
Our mission: we give you control over your software.
Services
- Software Risk Assessment: in-depth investigation of software quality and risks; answers specific research questions.
- Software Monitoring: continuous measurement, feedback, and decision support; guards quality from start to finish.
- Software Product Certification: five levels of technical quality; evaluation by SIG, certification by TÜV Informationstechnik.
- Application Portfolio Analysis: inventory of the structure and quality of the application landscape; identification of opportunities for portfolio optimization.
Today's topic: Measuring Software Product Quality
Some definitions
- Project: an activity to create or modify software products (design, build, test, deploy).
- Product: any software artifact produced to support business processes, either built from scratch, assembled from reusable components, or created by customizing standard packages.
- Portfolio: a collection of software products in various phases of their lifecycle (development, acceptance, operation, selected for decommissioning).
Our claim today: measuring the quality of software products is key to successful software projects and healthy software portfolios.
Today's topic: Measuring Software Product Quality
The ISO 25010 standard for software quality defines eight product quality characteristics: Functional Suitability, Performance Efficiency, Compatibility, Usability, Reliability, Security, Maintainability, and Portability.
Why focus on maintainability? Over the lifecycle (initiation, build, acceptance, maintenance), roughly 20% of the costs are incurred before delivery and 80% during maintenance.
The sub-characteristics of maintainability in ISO 25010: Modularity, Reusability, Analysability, Modifiability, and Testability.
Measuring maintainability: some requirements. The metrics should be simple to understand, easy to compute, technology independent, and should allow root-cause analysis. Suggestions? Heitlager et al., A Practical Model for Measuring Maintainability, QUATIC 2007.
Measuring ISO 25010 maintainability using the SIG model: each sub-characteristic (Analysability, Modifiability, Testability, Modularity, Reusability) is mapped to a subset of eight source-code properties: Volume, Duplication, Unit size, Unit complexity, Unit interfacing, Module coupling, Component balance, and Component independence.
Measuring maintainability happens at different levels of measurement: system, component, module, and unit.
Source code measurement: Volume
- Lines of code (LOC): not comparable between technologies.
- Function Point Analysis (FPA): A.J. Albrecht, IBM, 1979. Counted manually; slow, costly, fairly accurate.
- Backfiring: Capers Jones, 1995. Converts LOC to function points based on statistics per technology; fast, but limited accuracy.
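Backfiring fits in a few lines of code. The LOC-per-function-point ratios below are illustrative placeholders in the spirit of Capers Jones' published tables, not authoritative figures:

```python
# Illustrative LOC-per-function-point ratios (placeholder values,
# not the official Capers Jones tables).
LOC_PER_FP = {"java": 53, "cobol": 107, "c": 128}

def backfire(loc: int, language: str) -> float:
    """Estimate function points from lines of code for one technology."""
    return loc / LOC_PER_FP[language.lower()]

# A 53,000-line Java system would be estimated at about 1000 FP.
```

The appeal of backfiring is speed: once a ratio table exists, any codebase can be sized automatically, at the cost of accuracy.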
Source code measurement: Duplication. (Example on the slide: a block of seven lines appearing verbatim at two locations in the code, lines 1-7 and 35-41; such repeated blocks are counted as duplication.)
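A minimal sketch of block-based duplication detection, assuming the common convention of counting duplicates in blocks of at least six identical lines (the block length is a parameter here):

```python
from collections import defaultdict

def duplicated_lines(lines, block=6):
    """Indices of lines that are part of a block of `block` consecutive
    lines occurring verbatim more than once in the code."""
    starts = defaultdict(list)
    for i in range(len(lines) - block + 1):
        starts[tuple(lines[i:i + block])].append(i)
    dup = set()
    for occurrences in starts.values():
        if len(occurrences) > 1:          # block seen more than once
            for s in occurrences:
                dup.update(range(s, s + block))
    return dup

def duplication_percentage(lines, block=6):
    """Share of all lines that are part of some duplicated block."""
    return 100.0 * len(duplicated_lines(lines, block)) / len(lines)
```

A real clone detector also normalizes whitespace and ignores comments; this sketch compares raw lines.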
Source code measurement: Component balance. A measure for the number and relative size of architectural elements: CB = SBO × CSU, where SBO (system breakdown optimality) is computed as the distance from an ideal breakdown, and CSU (component size uniformity) is computed with the Gini coefficient. E. Bouwers, J.P. Correia, A. van Deursen, and J. Visser, A Metric for Assessing Component Balance of Software Architectures, in proceedings of the 9th Working IEEE/IFIP Conference on Software Architecture (WICSA 2011).
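The Gini coefficient behind CSU can be computed directly from component sizes. How the metric turns it into CSU is assumed here (1 minus the coefficient, so that uniformity scores high); treat that transformation as illustrative:

```python
def gini(sizes):
    """Gini coefficient of component sizes: 0 for perfectly uniform
    components, approaching 1 for extreme imbalance."""
    xs = sorted(sizes)
    n, total = len(xs), sum(xs)
    # Rank-weighted formulation of the Gini coefficient.
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, 1)) / (n * total)

def component_size_uniformity(sizes):
    # Assumption: CSU rewards uniformity, so invert the Gini coefficient.
    return 1.0 - gini(sizes)
```

For example, four equally sized components give a Gini coefficient of 0 (perfect uniformity), while one dominant component pushes it toward 1.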
From measurement to rating: a benchmark-based approach. Measure the metric for every system in the benchmark (e.g., 0.96, 0.84, 0.14, 0.09), sort the results, and derive thresholds from the distribution. Example thresholds (note: illustrative only): score ≥ 0.9 → ★★★★★, ≥ 0.8 → ★★★★☆, ≥ 0.5 → ★★★☆☆, ≥ 0.3 → ★★☆☆☆, ≥ 0.1 → ★☆☆☆☆. A system scoring 0.84 thus rates ★★★★☆; one scoring 0.34 rates ★★☆☆☆.
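The threshold-to-star mapping above (explicitly marked as example thresholds) can be expressed as a small lookup:

```python
# Example thresholds from the slide (not calibrated values).
THRESHOLDS = [(0.9, 5), (0.8, 4), (0.5, 3), (0.3, 2), (0.1, 1)]

def stars(score: float) -> int:
    """Map a benchmarked score to a 1-5 star rating."""
    for threshold, rating in THRESHOLDS:
        if score >= threshold:
            return rating
    return 1  # anything below the lowest threshold still gets one star

# As on the slide: 0.84 rates 4 stars, 0.34 rates 2 stars.
```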
But what about the measurements on lower levels? Several properties in the model, such as unit size, unit complexity, and module coupling, are measured per unit or module rather than per system.
Source code measurement: Logical complexity (T. McCabe, IEEE Transactions on Software Engineering, 1976).
- Academic: the number of independent paths per method.
- Intuitive: the number of decisions made in a method.
- Reality: the number of if statements (plus while, for, ...). Example method: McCabe = 4.
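"The number of if statements (plus while, for, ...)" translates to a crude token count. A real tool parses the source; this regex sketch only illustrates the idea and will miscount keywords that appear inside strings or comments:

```python
import re

# Decision points: branching keywords plus short-circuit operators.
DECISIONS = re.compile(r"\b(?:if|while|for|case|catch)\b|&&|\|\|")

def mccabe(method_source: str) -> int:
    """Cyclomatic complexity approximated as 1 + number of decisions."""
    return 1 + len(DECISIONS.findall(method_source))
```

A straight-line method with no decisions scores 1; each branch or short-circuit operator adds one.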
My question: how can we aggregate this?
Option 1: Summing

               Crawljax    GOAL   Checkstyle   Springframework
Total McCabe       1814    6560         4611             22937
Total LOC          6972   25312        15994             79474
Ratio             0.260   0.259        0.288             0.288

Summing hides the differences: four quite different systems end up with nearly identical ratios.
Option 2: Average

                 Crawljax   GOAL   Checkstyle   Springframework
Average McCabe       1.87   2.45         2.46              1.99

(The slide also shows a histogram of McCabe values: the distribution is heavily skewed toward 1-2 with a long tail up to 32, so the average says little about the risky tail.)
Option 3: Quality profile. Assign each unit to a risk category based on its cyclomatic complexity, then sum the lines of code per category.

Cyclomatic complexity   Risk category
1-5                     Low
6-10                    Moderate
11-25                   High
> 25                    Very high

Example profile for Springframework (lines of code per risk category): Low 70%, Moderate 12%, High 13%, Very high 5%. (The slide compares stacked-bar profiles for Springframework, Checkstyle, GOAL, and Crawljax.)
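A quality profile as described above can be computed by binning each unit's LOC under the risk category of its complexity, using the categories from the slide:

```python
# Risk categories for cyclomatic complexity, as on the slide.
def risk_category(mccabe: int) -> str:
    if mccabe <= 5:
        return "low"
    if mccabe <= 10:
        return "moderate"
    if mccabe <= 25:
        return "high"
    return "very high"

def quality_profile(units):
    """units: iterable of (mccabe, loc) pairs, one per unit (method).
    Returns the fraction of total LOC in each risk category."""
    loc_per_cat = {"low": 0, "moderate": 0, "high": 0, "very high": 0}
    for mccabe, loc in units:
        loc_per_cat[risk_category(mccabe)] += loc
    total = sum(loc_per_cat.values())
    return {cat: loc / total for cat, loc in loc_per_cat.items()}
```

Unlike a sum or an average, the profile preserves how much code sits in the risky tail.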
First Level Calibration
The formal six-step process:
- Visualize the calculated metrics.
- Choose a weight metric.
- Calculate for a benchmark of systems; thresholds are drawn at the 70%, 80%, and 90% quantiles.
(Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010.)
SIG Maintainability Model: derivation of metric thresholds.
1. Measure the systems in the benchmark.
2. Summarize all measurements (metric distribution, box plot per risk category).
3. Derive thresholds that bring out the metric's variability.
4. Round the thresholds.
For cyclomatic complexity this yields: 1-5 low risk, 6-10 moderate, 11-25 high, > 25 very high. The results are similar for other module-level metrics, such as module inward coupling (file fan-in) and module interface size (methods per file), with unit interfacing as an exception. Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010.
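The derivation step can be sketched as LOC-weighted quantiles over the benchmark: each threshold is the metric value below which 70%, 80%, or 90% of all benchmark code falls. This is a simplified reading of Alves et al.'s procedure, with the rounding step omitted:

```python
def weighted_quantile(pairs, q):
    """Smallest metric value v such that at least fraction q of the
    total weight (here: LOC) lies at or below v. pairs: (value, weight)."""
    ordered = sorted(pairs)
    total = sum(w for _, w in ordered)
    acc = 0.0
    for value, weight in ordered:
        acc += weight
        if acc / total >= q:
            return value
    return ordered[-1][0]

def derive_thresholds(pairs, quantiles=(0.70, 0.80, 0.90)):
    """Candidate category boundaries from a benchmark of measurements."""
    return [weighted_quantile(pairs, q) for q in quantiles]
```

Weighting by LOC rather than counting units ensures that a few huge, complex methods are not drowned out by thousands of trivial getters.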
The quality profile (recap): cyclomatic complexity 1-5 low, 6-10 moderate, 11-25 high, > 25 very high; sum the lines of code per category. Springframework example: Low 70%, Moderate 12%, High 13%, Very high 5%.
Second Level Calibration
How to rank quality profiles? Given the Unit Complexity profiles of 20 random systems, which deserve ★★★★★ and which ★☆☆☆☆? Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM/Mensura 2011.
Ordering by the highest risk category alone is not enough! Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM/Mensura 2011.
A better ranking algorithm: order the risk categories, define the thresholds given by the benchmark systems' profiles, and find the smallest possible thresholds. Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM/Mensura 2011.
This results in a more natural ordering of systems from ★★★★★ down to ★☆☆☆☆. Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM/Mensura 2011.
Second-level thresholds: unit size example. Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM/Mensura 2011.
SIG Maintainability Model: mapping quality profiles to ratings.
1. Calculate quality profiles for the systems in the benchmark.
2. Sort the quality profiles.
3. Select thresholds based on a 5% / 30% / 30% / 30% / 5% distribution over the star ratings.
Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM/Mensura 2011.
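Once second-level thresholds exist, rating a profile means checking, per star level, the maximum fraction of code allowed in each risk category. The limits below are invented for illustration; they are not SIG's calibrated thresholds:

```python
# Illustrative second-level thresholds: per rating, the maximum
# fraction of LOC allowed in each non-low risk category.
SECOND_LEVEL = [
    (5, {"moderate": 0.25, "high": 0.00, "very high": 0.00}),
    (4, {"moderate": 0.30, "high": 0.05, "very high": 0.00}),
    (3, {"moderate": 0.40, "high": 0.10, "very high": 0.00}),
    (2, {"moderate": 0.50, "high": 0.15, "very high": 0.05}),
]

def rate_profile(profile):
    """Highest star rating whose category limits the profile satisfies."""
    for rating, limits in SECOND_LEVEL:
        if all(profile.get(cat, 0.0) <= limit
               for cat, limit in limits.items()):
            return rating
    return 1
```

Because every limit must hold simultaneously, a profile with even a small amount of very-high-risk code cannot reach the top ratings, which matches the intuition that ordering by the best category alone is not enough.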
SIG measurement model: putting it all together.
1. Property measurements per technology (here: Java and JSP; total volume 21 man-years).
2. Property ratings per technology (some properties, such as unit interfacing, module coupling, and component independence, are not applicable to JSP).
3. Aggregated property ratings per system: Volume ★★★☆☆, Duplication ★★☆☆☆, Unit size ★★☆☆☆, Unit complexity ★★★☆☆, Unit interfacing ★★★★☆, Module coupling ★★★☆☆, Component balance ★★★★★, Component independence ★★★★☆.
4. Sub-characteristic ratings: Analysability ★★☆☆☆, Modifiability ★★★☆☆, Testability ★★★★☆, Modularity ★★★★☆, Reusability ★★★☆☆.
5. Overall maintainability rating: ★★★☆☆.
Your question: does this work?
SIG Maintainability Model: empirical validation. Internal quality (maintainability) versus external quality (issue handling), studied on 16 projects: 2.5 MLOC, 150 versions, 50K issues.
- The Influence of Software Maintainability on Issue Handling, MSc thesis, Delft University of Technology.
- Indicators of Issue Handling Efficiency and their Relation to Software Maintainability, MSc thesis, University of Amsterdam.
- Faster Defect Resolution with Higher Technical Quality of Software, SQM 2010.
Empirical validation: the life-cycle of an issue.
Empirical validation: defect resolution time. Luijten et al., Faster Defect Resolution with Higher Technical Quality of Software, SQM 2010.
Empirical validation: quantification. Luijten et al., Faster Defect Resolution with Higher Technical Quality of Software, SQM 2010.
SIG Quality Model quantification: resolution time for defects and enhancements. Issues are resolved faster in higher-quality systems: between 2 stars and 4 stars, resolution speed increases by a factor of 3.5 (defects) and 4.0 (enhancements).
SIG Quality Model quantification: productivity (resolved issues per developer per month). Productivity is higher in higher-quality systems: between 2 stars and 4 stars, productivity increases by a factor of 10.
Your question: does this work? Yes, theoretically.
Your question: but is it useful?
Software Risk Assessment: kick-off session, strategy session, technical session, technical validation session, risk validation session, and final presentation with final report, supported throughout by analysis in the SIG laboratory.
Example: which system to use? Three candidates rated ★★★☆☆, ★★★★☆, and ★★☆☆☆.
Example: should we accept delay and cost overrun, or cancel the project? (Architecture sketch: user interface, business layer, and data layer, split between a vendor framework and custom code.)
Software Monitoring: a weekly cycle. Every week the source code is automatically analyzed and the results website is updated; the findings are then interpreted, discussed, managed, and acted upon.
Example: very specific questions. Overall rating ★★★★★, with property ratings: Volume ★★★★★, Duplication ★★★★★, Unit size ★★★★★, Unit complexity ★★★★★, Unit interfacing ★★★★☆, Module coupling ★★★★☆, Component balance ★★★★☆, Component independence ★★★☆☆.
Software Product Certification
Summary: a benchmark-based model that maps eight source-code properties (volume, duplication, unit size, unit complexity, unit interfacing, module coupling, component balance, component independence) via quality profiles to ratings for the ISO 25010 maintainability sub-characteristics.
Thank you! Eric Bouwers, e.bouwers@sig.eu