Chap. Software Quality Management: 1.3 Software Measurement and Metrics. Contents: 1. Software Metrics Overview 2. Inspection Metrics 3. Product Quality Metrics 4. In-Process Quality Metrics
1. Software Metrics Overview
Software Measurement and Metrics
Measurement gives a snapshot of specific parameters, represented by numbers, weights, or binary statements. Taking measurements over time and comparing them with a specific baseline generates metrics.
-Software measurement is concerned with deriving a numeric value for some attribute of a software product or a software process. Comparing these values to each other and to relevant standards allows drawing conclusions about the quality of the software product or the effectiveness of the software processes.
Examples of software metrics: program size, defect count for a given program, head count required to deliver a system component, development effort and time, product cost, etc.
Classes of Software Metrics
-There are 3 classes of entities of interest in software measurement: processes, products, and resources.
Processes: any software-related activities which take place over time.
Products: any artifacts, deliverables, or documents which arise out of the processes.
Resources: the items which are inputs to processes.
-These result in 3 broad classes of software metrics: project (i.e., resource), product, and process metrics.
Process Metrics
Measure the efficacy of processes
Focus on quality achieved as a consequence of a repeatable or managed process
Reuse data (from historical/past projects) for prediction
Examples: defect removal efficiency, productivity
Project Metrics
Main goals of project metrics:
Assess the status of projects
Track risk
Identify problem areas
Adjust workflow
Examples of project measurement targets include:
Effort/time per SE task
Defects detected per review hour
Scheduled vs. actual milestone dates
Changes (number) and their characteristics
Distribution of effort on SE tasks
Product Metrics
Measure predefined product attributes
Focus on the quality of deliverables
Examples of product measures include: code or design complexity, program size, defect count, reliability
Software Quality Metrics
-Software quality metrics are a subset of software metrics that focus on the quality aspects of the product, process, and project. In general, software quality metrics are more closely associated with process and product metrics than with project metrics. Nonetheless, project parameters such as the number of developers and their skill levels, the schedule, the size, and the organization certainly affect product quality.
-Software quality metrics can be further divided into end-product quality metrics and in-process quality metrics. The ultimate goal of software quality engineering is to investigate the relationships among in-process metrics, project characteristics, and end-product quality, and, based on these findings, to engineer improvements in both process and product quality.
2. Inspection Metrics
-The data collected during a software process are used to compute a set of metrics that support evaluation and improvement of the process, as well as planning and tracking quality.
-The metrics computed during such a process should be defined by the requirements of your organization (typically in the quality manual). Collecting data and calculating metrics for no reason is a waste of time.
-Many different metrics can be calculated during an inspection process, including the following:
The number of major and minor defects found
The ratio of major defects found to total defects found (if the proportion of minor to major defects is too large, a moderator may request that the reviewer repeat the review, focusing on major defects, prior to commencing the logging meeting)
The size of the artifact (pages, LOC, ...)
The rate of review: the size of the reviewed artifact divided by the review time in hours (e.g., 5 pages/hour)
The defect detection rate: the number of major defects found per review hour
Total Number of Defects Found and Defect Density
-The total number of defects found is the sum of the number of defects found by each reviewer, minus the number of common defects found. For instance, with 2 reviewers, the metric is computed by
Total Defects Found = A + B - C
where A and B are the numbers of defects found by reviewers A and B respectively, and C is the number found by both A and B.
-Defect density is the ratio of the number of defects found to the size of the artifact. It is given by
Defect Density = Total Defects Found / Size
where the size of the artifact is measured in number of pages, LOC, or another size measure.
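As a minimal sketch, the two formulas can be coded directly; the reviewer counts and artifact size below are made-up values for illustration:

```python
# Inspection totals for two reviewers (hypothetical values).
def total_defects_found(a, b, c):
    """Defects found by A plus defects found by B, minus those found by both."""
    return a + b - c

def defect_density(total_found, size):
    """Defects per unit of size (pages, LOC, ...)."""
    return total_found / size

# Example: A found 12 defects, B found 9, 4 were found by both, 20-page document.
found = total_defects_found(12, 9, 4)   # 17 defects
print(defect_density(found, 20))        # 0.85 defects/page
```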
Example: compute inspection metrics from an inspection log form (log form not reproduced here).
Estimated Total Number of Defects
-The estimated total number of defects is the sum of the total number of defects found and the estimated number of defects remaining. To estimate the number of defects remaining in an artifact immediately after inspection, we use an approach similar to the population sampling approach used by biologists to estimate the population of a particular ecosystem.
Population Sampling Approach
-Suppose we have a fish farm and we want to estimate the total number of fish we have. We could apply the capture-recapture method:
1. Capture a number of fish, tag them, and release them (let this number be S1).
2. Allow time for the first sample population to redistribute.
3. Capture a second number of fish (let this number be S2).
4. Count the number of tagged fish in the second sample (let this number be ST).
5. Calculate the proportion of tagged fish in the second sample: T = ST / S2.
6. We assume that T is representative of the proportion of tagged fish in the total population (POP), so T × POP = S1, or, for our purposes, POP = S1 / T.
-Using the population sampling approach to estimate the number of defects remaining leads to the following steps:
1. Let the number of defects found by the first reviewer be the tagged population (A); assume an even likelihood of finding all defects (even distribution, ...).
2. Count the number of defects found by the second reviewer (B).
3. Count the number of defects found by the second reviewer that were also found by the first (C, the common defects).
4. Calculate the proportion of common defects among the second reviewer's defects: T = C / B.
5. We assume that T is representative of the proportion of common defects in the total number of defects (EstTot), so T × EstTot = A, or, for our purposes, EstTot = A / T = (A × B) / C.
-So, assuming that defects are equally likely to be found in an artifact and each reviewer is equally likely to find every defect:
Estimated Total Defects = (A × B) / C
Note that in practice these assumptions are not always fulfilled, simply because some defects are harder to find than others, and some reviewers are better than others.
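A sketch of the estimate under those assumptions (the counts are illustrative):

```python
# Capture-recapture estimate of the total defect population, assuming an even
# distribution of defects and equally able reviewers.
def estimated_total_defects(a, b, c):
    """a, b: defects found by each reviewer; c: defects found by both."""
    if c == 0:
        raise ValueError("No common defects: the estimate (A*B)/C is undefined.")
    return (a * b) / c

print(estimated_total_defects(12, 9, 4))   # 27.0 estimated total defects
```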
Inspection Yield
-Inspection yield refers to the defect removal efficiency (DRE) of an inspection process.
-The defect removal efficiency of a process is given by calculating the percentage of total defects that were found by that process. So the yield of an inspection is given by:
Yield = (Total Defects Found / Estimated Total Defects) × 100%
Inspection Rate and Defect Detection Rate
-These require computing the total inspection time, which is the sum of each reviewer's review time plus the total person-time spent in each meeting.
-The inspection rate is computed by:
Inspection Rate = Size / Total Inspection Time
where size stands for the size of the artifact in number of pages, LOC, or another size measure, and total inspection time is measured in hours.
-The defect detection rate is computed by:
Defect Detection Rate = Total Defects Found / Total Inspection Time
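A small sketch pulling the yield and rate formulas together (all inputs hypothetical):

```python
# Yield, inspection rate, and defect detection rate for one inspection.
def inspection_yield(total_found, estimated_total):
    """Defect removal efficiency of the inspection, as a percentage."""
    return total_found / estimated_total * 100

def inspection_rate(size, total_hours):
    """Size reviewed per hour of total inspection time (e.g., pages/hour)."""
    return size / total_hours

def defect_detection_rate(total_found, total_hours):
    """Major defects found per hour of total inspection time."""
    return total_found / total_hours

# 17 defects found, 27 estimated in total, 20 pages, 5 person-hours overall:
print(inspection_yield(17, 27))       # ~63%
print(inspection_rate(20, 5))         # 4.0 pages/hour
print(defect_detection_rate(17, 5))   # 3.4 defects/hour
```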
Calculating Inspection Metrics with More than Two Reviewers
-If there are more than 2 reviewers, the same approach can be taken to calculate inspection totals and yield by splitting the reviewers into two groups, A and B, for the calculation:
If there are 3 reviewers, it is often a good idea to choose the person who has the most unique defects as one group and the other two reviewers as the other group.
For each group, if any member of that group has found a defect, then count it for the group.
Example with 3 reviewers (inspection log of major defects found by engineers R1, R2, and R3; group A = R2, group B = R1 and R3):
No. | Defect Description | R1 | R2 | R3
Totals: R1 = 7, R2 = 9, R3 = 7
Unique defects: R1 = 0, R2 = 2, R3 = 0
Inspection Summary
Product size: 10   Size measure: pages
Total defects for A: ___   Total defects for B: ___   Common defects: ___
Est. total defects: ___   Total number found: ___   Number left: ___
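A sketch of the grouping rule with three reviewers, using made-up defect IDs (not the data from the log above):

```python
# Group A is the reviewer with the most unique defects (here R2); group B is
# the union of the other two reviewers' findings.
r1 = {1, 2, 3, 4, 5, 6, 7}
r2 = {2, 3, 5, 8, 9, 10, 11, 12, 13}
r3 = {1, 3, 4, 6, 8, 10, 11}

group_a = r2
group_b = r1 | r3                      # a defect counts for B if any member found it
common = group_a & group_b             # C: defects found by both groups

a, b, c = len(group_a), len(group_b), len(common)
total_found = len(group_a | group_b)   # same as A + B - C
estimated_total = (a * b) / c
print(total_found, estimated_total)    # 13 found, 15.0 estimated in total
```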
3. Product Quality Metrics
-The de facto definition of software quality consists of two levels: intrinsic product quality and customer satisfaction. Intrinsic product quality is usually measured by the number of bugs in the software or by how long the software can run before encountering a crash. In operational definitions, the two metrics used for intrinsic product quality are:
-Defect density (rate)
-Mean Time To Failure (MTTF)
-The two metrics are correlated but differ in both purpose and usage.
MTTF:
-Most often used with special-purpose systems such as safety-critical systems
-Measures the time between failures
Defect density:
-Mostly used in general-purpose systems or commercial-use software
-Measures the defects relative to the software size (e.g., lines of code, function points)
Size may be estimated:
-Directly, using size-oriented metrics: lines of code (LOC)
-Indirectly, using function-oriented metrics: function points (FP)
Size-Oriented Metrics: Lines of Code (LOC)
Normalising quality/productivity metrics by size
Size usually in LOC, KLOC, or pages of documentation:
Defects per KLOC
$ per LOC
Pages of documentation per KLOC
LOC per person-month
$ per page of documentation
-The lines of code metric is anything but simple: the major difficulty comes from the ambiguity of the operational definition, the actual counting. In the early days of Assembler programming, in which one physical line was the same as one instruction, the LOC definition was clear. With high-level languages, the one-to-one correspondence doesn't hold.
Differences between physical lines and instruction statements (or logical lines of code), and differences among languages, contribute to the huge variations in counting LOC. Possible approaches include (a small counting sketch follows this list):
-Count only executable lines
-Count executable lines plus data definitions
-Count executable lines, data definitions, and comments
-Count executable lines, data definitions, comments, and job control language
-Count lines as physical lines on an input screen
-Count lines as terminated by logical delimiters
-In any case, when any data on the size of program products and their quality are presented, the method for LOC counting should be described: whether it is based on physical or logical LOC. When a straight LOC count is used, size and defect rate comparisons across (different) languages are often invalid. Example: the approach used at IBM consists of counting source instructions, including executable lines and data definitions but excluding comments and program prologue.
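To illustrate how much the counting rule matters, here is a toy sketch contrasting naive physical and delimiter-based counts on a small C fragment:

```python
# Physical vs. delimiter-based "logical" LOC for a tiny C fragment. The naive
# semicolon count even over-counts the for-loop header, which is exactly the
# kind of ambiguity the operational definition has to resolve.
c_source = """int totals[3] = {0, 0, 0};  /* data definition */
for (int i = 0; i < 3; i++)
    totals[i] = i * i;
"""

physical = len([line for line in c_source.splitlines() if line.strip()])
logical = c_source.count(";")   # statements terminated by logical delimiters
print(physical, logical)        # 3 physical lines vs. 4 semicolons
```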
Defect Density Metrics
-At IBM, the following LOC count metrics, based on logical LOC, are used:
Shipped Source Instructions (SSI): LOC count for the total product
Changed Source Instructions (CSI): LOC count for the new and changed code of the new release
-The relationship between SSI and CSI is given by:
SSI (current release) = SSI (previous release)
+ CSI (new and changed code instructions for the current release)
- deleted code (usually very small)
- changed code (to avoid double counting in both SSI and CSI)
-Defects can be field defects (found by customers) or internal defects (found internally).
-Postrelease defect rate metrics can be computed per thousand SSI (KSSI) or per thousand CSI (KCSI):
1. Total defects per KSSI: a measure of code quality of the total product.
2. Field defects per KSSI: a measure of the defect rate in the field.
3. Release-origin defects (field and internal) per KCSI: a measure of development quality.
4. Release-origin field defects per KCSI: a measure of development quality per defects found by customers.
Metrics (1) and (3) are the same for the initial release, where the entire product is new; thereafter, metric (1) is affected by aging and by the improvement (or deterioration) of metric (3). Metrics (1) and (3) are process measures; their field counterparts, metrics (2) and (4), represent the customer's perspective. Given an estimated defect rate (per KCSI or KSSI), software developers can minimize the impact to customers by finding and fixing the defects before customers encounter them. From the customer's point of view, the defect rate is not as relevant as the total number of defects that might affect their business. Therefore, a good defect rate target should lead to a release-to-release reduction in the total number of defects, regardless of size.
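A hedged sketch of the SSI bookkeeping and defect-rate helpers (the names and figures below are illustrative, following the definitions above):

```python
# SSI evolution across releases and a generic defect-rate helper.
def next_ssi(prev_ssi, csi, deleted=0.0, changed=0.0):
    """SSI(current) = SSI(previous) + CSI - deleted code - changed code."""
    return prev_ssi + csi - deleted - changed

def defect_rate(defects, kloc):
    """Defects per thousand lines; usable with KSSI or KCSI denominators."""
    return defects / kloc

ssi = next_ssi(prev_ssi=100.0, csi=20.0, changed=4.0)   # 116.0 KSSI
print(defect_rate(232, ssi))                            # 2.0 total defects/KSSI
```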
Example: We are about to launch the third release of an inventory management software system.
-The size of the initial release was 50 KLOC, with a defect density of 2.0 def/KSSI.
-In the second release, 20 KLOC of code was newly added or modified (20% of which are changed lines of code, with no deleted code).
-In the third release, 30 KLOC of code was newly added or modified (20% of which are changed lines of code, with no deleted code).
-Assume that the LOC counts are based on source instructions (SSI/CSI).
a. Considering that the quality goal of the company has been to achieve a 10% improvement in overall defect rates from release to release, calculate the total number of additional defects for the third release.
b. What should be the maximum (overall) defect rate target for the third release to ensure that the number of new defects does not exceed that of the second release?
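One way to set up the arithmetic for this exercise, as a sketch under the stated assumptions (20% of each release's added/modified code is changed code, no deletions, 10% rate improvement per release); reading "additional defects" as the release's total defect count is an assumption:

```python
# Release bookkeeping for the exercise above.
def release_ssi(prev_ssi, csi, changed_fraction=0.2):
    """New SSI: previous SSI plus CSI, minus the changed (double-counted) code."""
    return prev_ssi + csi - changed_fraction * csi

ssi1 = 50.0                      # initial release, KSSI
ssi2 = release_ssi(ssi1, 20.0)   # 66.0 KSSI
ssi3 = release_ssi(ssi2, 30.0)   # 90.0 KSSI

rate1 = 2.0                      # def/KSSI
rate2 = rate1 * 0.9              # 10% release-to-release improvement
rate3 = rate2 * 0.9

defects2 = rate2 * ssi2          # ~118.8 defects in release 2
defects3 = rate3 * ssi3          # ~145.8 defects in release 3 (part a)
max_rate3 = defects2 / ssi3      # ~1.32 def/KSSI to match release 2 (part b)
print(defects3, max_rate3)
```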
Function-Oriented Metrics
-The defect rate metric, ideally, is indexed to the number of functions a software product provides. If the number of defects per unit of function is low, then the software should have better quality, even though the defects per KLOC value could be higher when the functions were implemented with fewer lines of code.
Function points, proposed by Albrecht and colleagues at IBM in the mid-1970s, provide an alternative way of estimating program size, based on the amount of functionality involved. They address some of the issues that result from using LOC counts in size and productivity measures, especially the differences in LOC counts that result when different levels of languages are used.
Function Points
Normalising quality/productivity metrics by functionality
Indirect measurement: the function point is based on other direct attributes
Defects per function point
$ per FP
FP per person-month
A measure of the amount of functionality in a product
Can be estimated early in a project, before a lot is known
Measures a software project by quantifying the processing functionality associated with major external data or control input, output, or file types
It is defined as the weighted total of the counts of five major components that comprise an application: external inputs, external outputs, internal files, external interface files, and external inquiries.
Computing Function Points
-Step 1: Establish counts for the five major information domain values:
Number of External Inputs (e.g., transaction types)
Number of External Outputs (e.g., report types)
Number of External Inquiries (types of online inquiries supported)
Number of Logical Internal Files (files as the user might conceive them, not physical files)
Number of External Interface Files (files accessed by the application but not maintained by it)
-Step 2: Associate a complexity value with each count (simple, average, complex), determined by organisational experience-based criteria. Compute the count total FC (Function Count) by weighting the number of each of the five components by the corresponding complexity level (low, average, high).
Complexity Assessment
The complexity classification of each component is based on a set of standards that define complexity in terms of objective guidelines. The standards are maintained by the International Function Point Users Group (IFPUG), which was established in 1986. The following weighting factors are used:
Component         Low  Average  High
External Input      3     4       6
External Output     4     5       7
Internal File       7    10      15
Interface File      5     7      10
External Inquiry    3     4       6
Example: for the external output component:
-If the number of data element types is 20 or more and the number of file types referenced is 2 or more, then complexity is high.
-If the number of data element types is 5 or fewer and the number of file types referenced is 2 or 3, then complexity is low.
Computing Function Points (Ctd.)
-Step 3: Calculate the complexity adjustment values (Fi, where i ∈ [1..14]):
Each Fi is computed by answering a specific question corresponding to a basic characteristic of the system
Answer each question using a scale of 0 (N/A) to 5 (absolutely essential)
Sum the 14 complexity adjustment values: ΣFi
-Step 4: Calculate
FP = FC × (0.65 + 0.01 × ΣFi), for i = 1..14
where 0.65 and 0.01 are empirically derived constants.
Computing Function Points (Ctd.): Complexity Adjustment Questions
F1: Does the system require reliable backup and recovery?
F2: Are data communications required?
F3: Are there distributed processing functions?
F4: Is performance critical?
F5: Will the system run in an existing, heavily utilised operating environment?
F6: Does the system require on-line data entry?
F7: Does the on-line data entry require the input transaction to be built over multiple screens or operations?
F8: Are the master files updated on-line?
F9: Are the inputs, outputs, files or inquiries complex?
F10: Is the internal processing complex?
F11: Is the code designed to be reusable?
F12: Are conversion and installation included in the design?
F13: Is the system designed for multiple installations in different organisations?
F14: Is the application designed to facilitate change and ease of use by the user?
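A minimal sketch of the whole calculation, using the IFPUG-style weights from the table above; the component counts, complexity levels, and adjustment answers are invented for illustration:

```python
# Function count (FC) and adjusted function points (FP).
WEIGHTS = {                      # (low, average, high) weighting factors
    "external_input":   (3, 4, 6),
    "external_output":  (4, 5, 7),
    "internal_file":    (7, 10, 15),
    "interface_file":   (5, 7, 10),
    "external_inquiry": (3, 4, 6),
}
LEVEL = {"low": 0, "average": 1, "high": 2}

def function_count(components):
    """components: list of (component_type, count, complexity_level) tuples."""
    return sum(count * WEIGHTS[ctype][LEVEL[level]]
               for ctype, count, level in components)

def function_points(fc, adjustment_answers):
    """adjustment_answers: the 14 answers F1..F14, each on a 0..5 scale."""
    assert len(adjustment_answers) == 14
    return fc * (0.65 + 0.01 * sum(adjustment_answers))

fc = function_count([("external_input", 3, "average"),
                     ("external_output", 2, "high"),
                     ("internal_file", 1, "average"),
                     ("external_inquiry", 2, "low")])   # FC = 42
print(function_points(fc, [3] * 14))                    # 42 * 1.07 = 44.94
```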
Reconciling FP and LOC
Most quality estimation models require LOC. Examples:
LOC/FP for C is estimated at 128 LOC
LOC/FP for C++ is estimated at 64 LOC
LOC/FP for Assembly is estimated at 320 LOC
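Converting an FP estimate into a LOC estimate ("backfiring") is then a simple multiplication; a sketch using the ratios quoted above:

```python
# FP-to-LOC backfiring with the language ratios above.
LOC_PER_FP = {"C": 128, "C++": 64, "Assembly": 320}

def estimated_loc(fp, language):
    return fp * LOC_PER_FP[language]

print(estimated_loc(45, "C++"))   # 2880 LOC for a 45-FP application
```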
Example: Function Point Calculation for a Stock Control System
CAPA Ltd. is a company that sells 200 different electrical goods over the phone. To do this they want you to create a computerised stock control system. This system should have the following functionality:
1. Allow the operator to enter an existing customer's number or, for new customers, their details (up to 100 customers)
2. Check the credit rating of customers and reject those with poor ratings
3. Allow the operator to enter the goods being ordered
4. Check the availability of the goods being ordered:
o Where there are sufficient goods in stock, supply all the goods
o Where there are not sufficient goods, supply the number available and create a back order to be supplied when goods become available
5. Update the stock levels and customer account details
6. Produce a dispatch note and invoice
7. Update stock levels based on delivery of goods
8. Update customer account details based on payment by a customer
Customer Problems Metric
-Measures the problems customers encounter when using the product. For the defect rate metric, the numerator is the number of valid defects. However, from the customer's standpoint, all problems they encounter while using the product, not just the valid defects, are problems with the software. These include:
-Usability problems
-Unclear documentation or information
-User errors, etc.
-The problems metric is usually expressed in terms of problems per user month (PUM):
PUM = Total problems that customers reported (true defects and non-defect-oriented problems) for a time period / Total number of license-months of the software during the period
where: Number of license-months = Number of installed licenses of the software × Number of months in the calculation period
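A sketch of the PUM calculation with hypothetical figures:

```python
# Problems per user-month (PUM).
def problems_per_user_month(total_problems, licenses, months):
    """Reported problems (defects and non-defects) per license-month."""
    return total_problems / (licenses * months)

# 120 reported problems over 3 months across 500 installed licenses:
print(problems_per_user_month(total_problems=120, licenses=500, months=3))
# 0.08 problems per user-month
```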
Customer Satisfaction Metrics
-Customer satisfaction is often measured by customer survey data via the five-point scale:
Very satisfied
Satisfied
Neutral
Dissatisfied
Very dissatisfied
-The parameters used, for instance, by IBM to monitor customer satisfaction include the CUPRIMDSO categories: capability (functionality), usability, performance, reliability, installability, maintainability, documentation/information, service, and overall.
-Based on the five-point scale, various metrics may be computed. For example:
(1) Percent of completely satisfied customers
(2) Percent of satisfied customers: satisfied and completely satisfied
(3) Percent of dissatisfied customers: dissatisfied and completely dissatisfied
(4) Percent of nonsatisfied customers: neutral, dissatisfied, and completely dissatisfied
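A sketch computing the four percentages from survey counts (the counts are invented):

```python
# Satisfaction percentages from five-point-scale survey responses.
from collections import Counter

responses = Counter({"very satisfied": 40, "satisfied": 30, "neutral": 15,
                     "dissatisfied": 10, "very dissatisfied": 5})
n = sum(responses.values())

def pct(*levels):
    return 100 * sum(responses[level] for level in levels) / n

print(pct("very satisfied"))                                  # (1) 40.0%
print(pct("very satisfied", "satisfied"))                     # (2) 70.0%
print(pct("dissatisfied", "very dissatisfied"))               # (3) 15.0%
print(pct("neutral", "dissatisfied", "very dissatisfied"))    # (4) 30.0%
```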
4. In-Process Quality Metrics
Defect Arrival Pattern During Machine Testing
-The defect rate during formal machine testing (after code is integrated into the system library) is actually positively correlated with the defect rate in the field. A higher defect rate found during testing is often an indicator that the software has experienced higher error injection during its development process. The reason is simply that software defect density never follows a uniform distribution: if a piece of code has a higher testing defect rate, it is either a result of more effective testing or because of latent defects in the code.
-Hence, the simple metric of defects per KLOC or function point is a good indicator of quality while the software is still being tested.
-Overall, defect density during testing is a summary indicator; more information is actually given by the pattern of defect arrivals (which is based on the times between failures). Even with the same overall defect rate during testing, different patterns of defect arrivals indicate different quality levels in the field.
Examples: defect arrival patterns for two different projects (figures omitted: weekly test defect arrival rates and cumulative test defect rates for test scenarios 1 and 2).
The objective is always to look for defect arrivals that stabilize at a very low level, or times between failures that are far apart, before stopping the testing effort and releasing the software to the field.
-The time unit for observing the arrival pattern is usually weeks and occasionally months.
-For models that require execution-time data, the time intervals are in units of CPU time.
Defect Removal Effectiveness (DRE)
-Measures the overall defect removal ability of the development process. It can be calculated for the entire development process, for the front end (before code integration), or for each phase:
-When used for the front end, it is referred to as early defect removal
-When used for a specific phase, it is referred to as phase effectiveness
The higher the value of the metric, the more effective the development process and the fewer defects escape to the next phase or to the field.
-Defined as follows:
DRE = (Defects removed during a development phase / Defects latent in the product) × 100%
-The number of defects latent in the product at any given phase is usually obtained through estimation:
Defects latent = Defects removed during the phase + defects found later
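A sketch of the phase effectiveness calculation with illustrative numbers:

```python
# Defect removal effectiveness for one phase.
def dre(removed_in_phase, found_later):
    """Latent defects at the phase = removed during the phase + found later."""
    latent = removed_in_phase + found_later
    return removed_in_phase / latent * 100

# 80 defects removed in code inspection, 20 of its defects found later:
print(dre(removed_in_phase=80, found_later=20))   # 80.0% effectiveness
```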
Example: Phase Effectiveness of a Software Project (chart omitted: effectiveness (%) per development phase: high-level design review (DR), low-level design review (LR), code inspection (CI), unit test (UT), component test (CT), system test (ST)). The weakest phases are UT, CI, and CT: based on the phase effectiveness metrics, action plans to improve the effectiveness of these phases would be established and deployed.