Software Metrics: Roadmap
By Norman E. Fenton and Martin Neil
Presentation by Karim Dhambri
Authors (1/2) Norman Fenton is Professor of Computing at Queen Mary (University of London) and is also Chief Executive Officer of Agena, a company that specialises in risk management for critical systems. He heads the RADAR (Risk Assessment and Decision Analysis) group.
Authors (2/2) Martin Neil is a Reader in "Systems Risk" at the Department of Computer Science, Queen Mary, University of London, where he teaches decision and risk analysis and software engineering. Martin is also a joint founder and Chief Technology Officer of Agena Ltd (UK).
Plan Introduction Brief history of software metrics Weaknesses of traditional approaches Causal models Future work Comments on the article
Introduction (1/9) The car accidents example "Data on car accidents in both the US and the UK reveal that January and February are the months when the fewest fatalities occur."
Introduction (2/9) The car accidents example "Thus, if you collect a database of fatalities organised by months and use this to build a regression model, your model would predict that it is safest to drive when weather is coldest and roads are at their most treacherous."
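The quoted pitfall can be reproduced in a few lines. The numbers below are invented for illustration (they are not the US/UK data the article refers to); the point is only that ordinary least squares, given fatalities and temperature alone, fits a positive slope and so "concludes" that cold months are safer:

```python
# Illustrative sketch of the car-accident example: fewer fatalities are
# recorded in the cold months, so a naive regression of fatalities on
# temperature "learns" that cold weather is safer.

# (month, avg temperature in C, fatalities) -- made-up values for illustration
data = [
    (1, 2, 280), (2, 3, 290), (3, 7, 330), (4, 11, 350),
    (5, 15, 380), (6, 19, 400), (7, 21, 420), (8, 20, 410),
    (9, 16, 390), (10, 11, 360), (11, 6, 320), (12, 3, 300),
]

xs = [t for _, t, _ in data]   # temperature
ys = [f for _, _, f in data]   # fatalities
n = len(data)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares: slope = cov(x, y) / var(x)
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
den = sum((x - mean_x) ** 2 for x in xs)
slope = num / den
intercept = mean_y - slope * mean_x

# Positive slope: the fitted model predicts FEWER fatalities when it is
# colder, because the confounder (less driving, more careful driving in
# bad weather) is missing from the data.
print(f"fatalities = {intercept:.1f} + {slope:.1f} * temperature")
```

The missing causal factor (people drive less, and more carefully, in bad weather) is exactly what a regression on this dataset cannot recover, no matter how well it fits.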
Introduction (3/9) The car accidents example Such a conclusion is perfectly sensible given the data available, but intuitively we know it's wrong. The problem is that you do not have all the relevant data to make a sensible decision about the safest time to drive.
Introduction (4/9) The car accidents example
Introduction (5/9) So what has this got to do with software metrics? Well, software metrics has been dominated by statistical models, such as regression models, when what is really needed are causal models.
Introduction (6/9) Software resource estimation Much software metrics work has been driven by the need for resource prediction models. Usually this work has involved models of the form effort = f(size)
Introduction (7/9) Problems with effort = f(size) Size cannot cause effort. Such models cannot be used for risk assessment because they lack an explanatory framework. Managers can't decide how to improve things from the model's outputs.
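To make the criticism concrete, here is a minimal sketch of a Putnam/Boehm-style model of the form effort = a * size^b, fitted by least squares in log-log space. The project data and resulting constants are invented for illustration, not taken from any real calibration:

```python
import math

# Hypothetical (size in KLOC, effort in person-months) pairs from
# past projects -- illustrative values only.
projects = [(10, 25), (25, 70), (50, 160), (100, 360), (200, 800)]

# Fit effort = a * size^b by linear least squares on the logarithms,
# the usual form of Putnam/Boehm-style regression models.
logs = [(math.log(s), math.log(e)) for s, e in projects]
n = len(logs)
mx = sum(x for x, _ in logs) / n
my = sum(y for _, y in logs) / n
b = (sum((x - mx) * (y - my) for x, y in logs)
     / sum((x - mx) ** 2 for x, _ in logs))
a = math.exp(my - b * mx)

# "Prediction" for a hypothetical 80 KLOC project
predicted = a * 80 ** b
print(f"effort = {a:.2f} * size^{b:.2f}; 80 KLOC -> {predicted:.0f} person-months")
```

Even when such a model interpolates well, its only lever is size: it gives a manager no way to ask what would happen if, say, testing effort or staff experience changed, which is the article's core objection.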
Introduction (8/9) Solution: causal modeling Provide an explanatory structure to explain events that can then be quantified. Provide information to support quantitative managerial decision-making during the software lifecycle. Provide support for risk assessment and reduction.
Introduction (9/9) Software resource estimation
History of metrics (1/13) Def.: Software metrics is a collective term used to describe the very wide range of activities concerned with measurement in software engineering.
History of metrics (2/13) These activities range from producing numbers that characterize properties of software code, through models that help predict software resource requirements and software quality, to quality control and assurance.
History of metrics (3/13) Software metrics have been used since the mid-1960s At that time, Lines of Code (LOC) was used as a measure of productivity and effort
History of metrics (4/13) Problems using metrics: Theory and practice have been out of step Metrics often misunderstood, misused, and even reviled Industry is not convinced of metrics' benefits Metrics programs are often introduced when things go bad, to satisfy some assessment body (e.g., CMM)
History of metrics (5/13) The two components of software metrics: The component concerned with defining the actual measures The component concerned with how we collect, manage and use the measures
History of metrics (6/13)
History of metrics (7/13) Rationale for using metrics The desire to assess or predict effort/cost of development processes The desire to assess or predict quality of software products
History of metrics (8/13) The key in both cases has been the assumption that product size should drive any predictive models.
History of metrics (9/13) LOC/programmer-month as a productivity measure Regression-based resource prediction by Putnam and Boehm: Effort = f(LOC) Program quality measurement (usually defects/KLOC)
History of metrics (10/13) In the mid-1970s, we recognized the drawbacks of using LOC as a measure for different notions of program size. LOC cannot be compared between high- and low-level programming languages
History of metrics (11/13) From the mid-1970s, interest turned to measures of software complexity and functional size (such as function points) The rationale for these metrics is still to assess quality and effort/cost
History of metrics (12/13) The study of software metrics has been dominated by defining specific measures and models. Much recent work has been concerned with collecting, managing, and using metrics in practice.
History of metrics (13/13) Most notable advances Work on the mechanics of implementing metrics programs Grady and Caswell: first company-wide software metrics program Basili, Rombach: GQM The use of metrics in empirical software engineering Benchmarking and evaluating the effectiveness of software engineering methods, tools and technologies (Basili)
Weaknesses of traditional approaches (1/11) The approaches to both quality prediction and resource prediction have remained fundamentally unchanged since the early 1980s.
Weaknesses of traditional approaches (2/11) These approaches have provided some extremely valuable empirical results, but cannot be used effectively for quantitative management and risk analysis, the primary objective of metrics.
Weaknesses of traditional approaches (3/11) Regression-based model for quality prediction: defect density = f(complexity metric) Problems Incapable of predicting defects accurately No explanation of how defect introduction and detection variables affect defect counts
Weaknesses of traditional approaches (4/11) A further empirical study (Fenton) showed: Size metrics (while correlated with the gross number of defects) are poor indicators of defects Static complexity metrics are not significantly better as predictors Counts of pre-release defects are a very bad indicator of quality The lunch story
Weaknesses of traditional approaches (5/11)
Weaknesses of traditional approaches (6/11) These results invalidate models: using pre-release faults as a measure of operational quality using complexity metrics to predict which modules will be fault-prone post-release Complexity metrics were judged valid if correlated with pre-release fault density
Weaknesses of traditional approaches (7/11) Empirical phenomenon observed by Adams (1984): "[…] most operational system failures are caused by a small proportion of the latent faults." The fact that fault density (in terms of pre-release faults) was used as a measure of user-perceived software quality led us to wrong conclusions.
Weaknesses of traditional approaches (8/11) Explanations of the scatter plot Most of the modules that had a high number of pre-release faults and a low number of post-release faults just happened to be very well tested. A module that is never executed will never reveal its latent faults (no matter how many), hence operational usage must be taken into account.
Weaknesses of traditional approaches (9/11) Other problems with regression-based models for resource prediction: Lack causal factors to explain variation Based on limited historical data Resource constraints not modeled Black box models Cannot handle uncertainty Little support for risk assessment and reduction
Weaknesses of traditional approaches (10/11) The classic problem: Is this system sufficiently reliable to ship? Useful information: Measurement data from testing (such as defects found in various testing phases) Empirical data about the process and resources used Subjective information about the process/resources Very specific and important pieces of evidence (proof of correctness)
Weaknesses of traditional approaches (11/11) In practice, we only possess fragments of such information. The question is how to combine such diverse information and then how to use it to help solve a decision problem that involves risk.
Causal models (1/7) We need a model that takes account of the concepts missing from regression-based approaches: Diverse process and product variables Empirical evidence and expert judgement Genuine cause-and-effect relationships Uncertainty Incomplete information
Causal models (2/7) Def.: A Bayesian Belief Network (BBN) is a graphical network together with an associated set of probability tables. The nodes represent uncertain variables and the arcs represent the causal/relevance relationships between the variables.
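As an illustration of the definition, the sketch below hand-codes a three-node BBN (Complexity -> Defects -> Test failures) with made-up probability tables and answers a query by brute-force enumeration. This is a toy in the spirit of the article's defect models, not its actual model; real tools such as Hugin use far more efficient propagation algorithms:

```python
from itertools import product

# Made-up probability tables for a toy software-quality BBN.
p_c = {"low": 0.7, "high": 0.3}                      # P(Complexity)
p_d = {("low", "few"): 0.9, ("low", "many"): 0.1,    # P(Defects | Complexity)
       ("high", "few"): 0.4, ("high", "many"): 0.6}
p_f = {("few", "yes"): 0.2, ("few", "no"): 0.8,      # P(Failures | Defects)
       ("many", "yes"): 0.7, ("many", "no"): 0.3}

def joint(c, d, f):
    # Chain rule along the arcs Complexity -> Defects -> Failures
    return p_c[c] * p_d[(c, d)] * p_f[(d, f)]

# Inference by enumeration: P(Defects = many | no failures seen in test)
evidence_f = "no"
num = sum(joint(c, "many", evidence_f) for c in p_c)
den = sum(joint(c, d, evidence_f) for c, d in product(p_c, ["few", "many"]))
posterior = num / den
print(f"P(many defects | no test failures) = {posterior:.3f}")
```

Note the direction of reasoning: evidence at the "effect" node (no test failures) revises belief about its causes, which regression models cannot do. With these tables the prior P(many defects) is 0.25, and observing no failures lowers it to about 0.11.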
Causal models (3/7)
Causal models (4/7) Building and executing realistic BBN models is now possible because of recent algorithms and software tools. Practical applications: Medical diagnosis Mechanical failure diagnosis Help wizards in Microsoft Office
Causal models (5/7)
Causal models (6/7)
Causal models (7/7) Benefits of using BBNs: Explicit modeling of ignorance and uncertainty Combine diverse types of information Makes assumptions explicit Intuitive graphical format Ability to forecast with missing data Use of what-if analysis Use of subjectively or objectively derived probability distributions Rigorous mathematical semantics Availability of tools like Hugin
Future work Combining causal models such as BBNs with preference models such as those found in MCDA. Extending the emerging discipline of empirical software engineering (cause-and-effect hypotheses). Developing metrics programs for decision support involving company-specific data input. Technology transfer (questionnaires)
Comments on the article Positive Application of simulation to software engineering Causal models can constantly be tuned Negative Would have liked more details concerning BBNs In practice, how can we determine the probabilities for each node?