Measuring the effectiveness of testing using DDP

Similar documents

Pre-Algebra Lecture 6

Unit 1 Number Sense. In this unit, students will study repeating decimals, percents, fractions, decimals, and proportions.

2.2 Derivative as a Function

Relative and Absolute Change Percentages

Unit 7 The Number System: Multiplying and Dividing Integers

Aligning Correct and Realistic Performance Testing with the Agile Development Process

Selling Agile to the CFO: A Guide for Development Teams

Multiplying Fractions

SP500 September 2011 Outlook

Fractions. If the top and bottom numbers of a fraction are the same then you have a whole one.

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers

NF5-12 Flexibility with Equivalent Fractions and Pages

Scan Physical Inventory

NPV Versus IRR. W.L. Silber We know that if the cost of capital is 18 percent we reject the project because the NPV

Equations, Lenses and Fractions

Social network downtime in 2008 February 2009

Session 7 Fractions and Decimals

Practical Metrics for Managing and Improving Software Testing

Agile QA Process. Anand Bagmar Version 1.

A Simple Guide to Churn Analysis

Statistical Process Control (SPC) Training Guide

Decimals and other fractions

3 Steps to an Effective Retrospective December 2012

Integrating gsix Sigma THINKING into Scrum-Based. Darian Rashid Agile Trainer and Coach

Linear Programming Notes VII Sensitivity Analysis

Quality Meets the CEO

Mutual Fund Expense Information on Quarterly Shareholder Statements

case study Coverity Maintains Software Integrity of Sun Microsystems Award-Winning Storage Products

Adopting Agile Testing

Elaboration of Scrum Burndown Charts.

Process Streamlining. Whitepapers. Written by A Hall Operations Director. Origins

The 5 P s in Problem Solving *prob lem: a source of perplexity, distress, or vexation. *solve: to find a solution, explanation, or answer for

Applied Software Project Management

Years after US Student to Teacher Ratio

A positive exponent means repeated multiplication. A negative exponent means the opposite of repeated multiplication, which is repeated

This guide has been written to support reviewers in writing SMART objectives within the SRDS framework. These guidelines cover the following.

Partial Fractions. (x 1)(x 2 + 1)

Testing, What is it Good For? Absolutely Everything!

A PARENT S GUIDE TO CPS and the COURTS. How it works and how you can put things back on track

Price Theory Lecture 4: Production & Cost

3. Mathematical Induction

Design a Line Maze Solving Robot

Managing Agile Projects in TestTrack GUIDE

MOBILE SUCCESS STRATEGIES

EXIN Agile Scrum Foundation. Sample Exam

7 Gaussian Elimination and LU Factorization

OA3-10 Patterns in Addition Tables

The aerodynamic center

Interview with David Bouthiette [at AMHI 3 times] September 4, Interviewer: Karen Evans

Agile processes. Extreme Programming, an agile software development process

Introduction to Fractions

Elementary circuits. Resources and methods for learning about these subjects (list a few here, in preparation for your research):

When To Work Scheduling Program

0.8 Rational Expressions and Equations

ph. Weak acids. A. Introduction

Managing Your Class. Managing Users

Club Accounts Question 6.

Testing in Agile methodologies easier or more difficult?

Price cutting. What you need to know to keep your business healthy. Distributor Economics Series

OA4-13 Rounding on a Number Line Pages 80 81

Agile version control with multiple teams

Test Plan Template (IEEE Format)

Updates to Graphing with Excel

Chapter Two. THE TIME VALUE OF MONEY Conventions & Definitions

The Top FIVE Metrics. For Revenue Generation Marketers

FORECASTING. Operations Management

Brillig Systems Making Projects Successful

APPENDIX. Interest Concepts of Future and Present Value. Concept of Interest TIME VALUE OF MONEY BASIC INTEREST CONCEPTS

Sunny Hills Math Club Decimal Numbers Lesson 4

QUALITY TOOLBOX. Understanding Processes with Hierarchical Process Mapping. Robert B. Pojasek. Why Process Mapping?

Excel Formatting: Best Practices in Financial Models

Adjust MS Project default settings to meet the needs of your project. Loading Analysis is an iterative process. Where are you overloaded?

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection

Using Formulas, Functions, and Data Analysis Tools Excel 2010 Tutorial

Deep Agile Blending Scrum and Extreme Programming. Jeff Sutherland Ron Jeffries

GMAT SYLLABI. Types of Assignments - 1 -

IB Maths SL Sequence and Series Practice Problems Mr. W Name

What is Organizational Communication?

The Dangers of Use Cases Employed as Test Cases

ScrumMasters Considered Harmful

Client Marketing: Sets

Understanding Options: Calls and Puts

Pristine s Day Trading Journal...with Strategy Tester and Curve Generator

Agile Methods for Analysis

CALCULATIONS & STATISTICS

Euler s Method and Functions

TeachingEnglish Lesson plans

Everything you wanted to know about using Hexadecimal and Octal Numbers in Visual Basic 6

Before beginning your journey there are a number of things you will need to consider, with the most important being finance.

Understanding Credit Reports and Scores and How to Improve it!

To give it a definition, an implicit function of x and y is simply any relationship that takes the form:

How to create PDF maps, pdf layer maps and pdf maps with attributes using ArcGIS. Lynne W Fielding, GISP Town of Westwood

LESSON PLANS FOR PERCENTAGES, FRACTIONS, DECIMALS, AND ORDERING Lesson Purpose: The students will be able to:

36 TOUGH INTERVIEW QUESTIONS And ways to structure the responses

Teacher Answer Key: Measured Turns Introduction to Mobile Robotics > Measured Turns Investigation

There is a simple equation for calculating dilutions. It is also easy to present the logic of the equation.

Transcription:

Measuring the effectiveness of testing using Prepared and presented by Dorothy Graham email: 1 Contents introduction: some questions for you what is and how to calculate it case studies uses, abuses, common concerns and advice 2

Questions you may be asked How good is the testing anyway? Can you prove you are doing a good job? Your testing can still be just as good in less time, can t it? (That deadline pressure really didn t matter, did it?) How many bugs have we missed? Is the testing any better for this release? (Have we learned anything?) (Have we really improved our testing?) Are we better or worse in our testing compared compared to how to other we were groups/organizations? last time / last year? 3 : what you need to have do you keep track of defects? defects found in testing different test stages, e.g. system test, user acceptance test different releases e.g. testing for an incremental release in RAD defects found in live running reported by users / customers can you find these numbers from a previous project and your current project? do you have a reasonable number of defects found? if so, you can use to measure your test effectiveness 4

Useful measures a useful measure: supports effective analysis and decision making, and that can be obtained relatively easily. Bill Hetzel, Making Software Measurement Work, QED, 1993. easy measures may be more useful even though less accurate (e.g. car fuel economy) useful depends on objectives, i.e. what you want to know 5 Contents introduction: some questions for you what is and how to calculate it case studies uses, abuses, common concerns and advice 6

How effective are we at finding defects? defects found in testing or defects found testing defects found in testing defects found wards not found -yet start release benchmark point 7 Defect Detection Percentage () defects found by this testing total defects including those found wards "this" testing could be a test stage, e.g. component, integration, acceptance, regression, etc. testing for a function, subsystem or defect type all testing for a system testing of a sprint or increment Note: Capers Jones Defect Removal Efficiency?? not removal but detection - not efficiency but effectiveness 8

Effectiveness at finding defects defects found 50 45 40 35 30 25 20 15 10 5 0 release time 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% defects found in testing: 42 50 10 defects found testing: total defects 12 27 35 40 62 77 85 90 found: 50 = 50 90 85 50 62 77 = % 9 example testing live running live 150 50 75% 150 150 = = = 150 + 50 200 75% 10

Exercise 1 Exercise 1 The following data has been recorded for a project. Calculate the of each testing stage based on all the defect information. Fault Information Testing stage Official testing module and integration Tool testing & development Number of faults 299 = ----------- = -------- = % 40 = ----------- = -------- = % Release testing 19 = ----------- = -------- = % User Acceptance test 10 = ----------- = -------- = % Pilot 9 = ----------- = -------- = % Running ( one month) 20 Hint: you don t really need a calculator just round the numbers to the nearest 10 and you will be close enough! Defects found in this stage of testing = ------------------------------------------------------------------------ Defects found in this and all subsequent stages of testing Dorothy Graham Page 1 of 2

Exercise 1 Exercise 1 Solution: Calculation Testing stage No. faults 1) Official testing module and integration 299 75% 300 / 400 2) Tool testing & development 40 40% 40 / 100 3) Release testing 19 33% 20 / 60 4) User Acceptance test 10 25% 10 / 40 5) Pilot 9 33% (or 31%) 10 / 30 ( 9 / 29) 6) Running ( one month) 20 n/a How did we get these figures? Remember = Defects found in testing / all subsequent defects Stage 1 Official testing Test stage 1 found approximately 300 defects this is the numerator (top) The sum of all the subsequent stages is = 40 + 20 (rounded up) + 10 + 10 (rounded up) + 20 = 100 So the denominator (bottom of the equation) is 300 + 100 = 400 for Stage 1 is therefore 300/400 or 75% Stage 2 Tool testing Test stage 2 found 40 defects this is the numerator (top) The sum of all the subsequent stages is = 20 (rounded up) + 10 + 10 (rounded up) + 20 = 60 So the denominator (bottom of the equation) is 40 + 60 = 100 for Stage 2 is therefore 40/100 = 40% Stage 3 Release testing Test stage 3 found 19 defects (round up to 20) this is the numerator (top) The sum of all the subsequent stages is = 10 + 10 (rounded up) + 20 = 40 So the denominator (bottom of the equation) is 20 + 40 = 60 for Stage 3 is therefore 20/60 = 33% Stage 4 User Acceptance test Test stage 4 found 10 defects this is the numerator (top) The sum of all the subsequent stages is = 10 (rounded up) + 20 = 30 (29 to be exact) So the denominator (bottom of the equation) is 10 + 30 = 40 for Stage 4 is therefore 10/40 = 25% Stage 5 Pilot Test stage 5 found 9 defects (round up to 10) this is the numerator (top) The sum of all the subsequent stages is = 20 (the only remaining stage is live running) So the denominator (bottom of the equation) is 10 + 20 = 30 for Stage 4 is therefore 10/30 = 33% (31% if you calculate 9/29) There is no for live running, since the live running total goes into the calculation of all the previous s. Dorothy Graham Page 2 of 2

example: ST UAT ST UAT ST UAT ST UAT All test 100 50 67% = 100 = 100 = 100 + 50 150 67% 13 13 example: ST ST UAT ST UAT ST UAT All test 100 50 100 67% 40% = 100 = 100 = 100 + 50 + 100 250 40% 14 14

example: UAT ST UAT ST UAT ST UAT All test 100 50 100 67% 40% 33% = 50 = 50 = 50 + 100 150 33% 15 15 example: all test ST UAT ST UAT ST UAT All test 100 50 100 67% 40% 33% 60% = 100 + 50 = 150 = 100 + 50 + 100 250 60% 16 16

Exercise 2 Exercise 2 The following data has been recorded for a project. Calculate the in the columns on the right. The first one has been done as an example. (A calculator may be useful for some of these your mobile phone has one!) Fault Information Release System Test User Acceptance Test running (1 month) ST UAT ST LR UAT LR All test LR Release 1 100 50 100 67% 40% 33% 60% Release 2 150 50 10 Release 3 200 50 50 Release 4 50 25 125 How did we get these figures? Release 1 ST UAT: 100 / (100 + 50) = 100 / 150 = 67% ST LR: 100 / (100 + 50 + 100) = 100 / 250 = 40% UAT LR: 50 / (50 + 100) = 50 / 150 = 33% (Remember not to include the ST defects here) All test LR: (100 + 50) / (100 + 50 + 100) = 150 / 250 = 60% Dorothy Graham www. DorothyGraham.co.uk Page 1 of 2

Exercise 2 Exercise 2 Solution: Calculations Release System Test User Acceptance Test running (1 month) ST UAT ST LR UAT LR All test LR Release 1 100 50 100 67% 40% 33% 60% Release 2 150 50 10 75% 71% 83% 95% Release 3 200 50 50 80% 67% 50% 83% Release 4 50 25 125 67% 25% 17% 38% How did we get these figures? Release 2 ST UAT: 150 / (150 + 50) = 150 / 200 = 75% ST LR: 150 / (150 + 50 + 10) = 150 / 210 = 71% UAT LR: 50 / (50 + 10) = 50 / 60 = 83% All test LR: (150 + 50) / (150 + 50 + 10) = 200 / 210 = 95% Release 3 ST UAT: 200 / 200 + 50) = 200 / 250 = 80% ST LR: 200 / (200 + 50 + 50) = 200 / 300 = 67% UAT LR: 50 / (50 + 50) = 50 / 100 = 50% All test LR: (200 + 50) / (200 + 50 + 50) = 250 / 300 = 83% Release 4 ST UAT: 50 / (50 + 25) = 50 / 75 = 67% ST LR: 50 / (50 + 25 + 125) = 50 / 200 = 25% UAT LR: 25 / (25 + 125) = 25 / 150 = 17% All test LR: (50 + 25) / (50 + 25 + 125) = 75 / 200 = 38% Dorothy Graham www. DorothyGraham.co.uk Page 2 of 2

in iterative development 40 new sprint/release of Sprint 1 S2 10 35 5 of Sprint 2 S3 25 50 = 40 = 80 % 50 of Sprint 1 S3 = 40 55 = 73 % = 35 60 = 58 % 17 Contents introduction: some questions for you what is and how to calculate it case studies uses, abuses, common concerns and advice 18

Case studies from clients 1 mo year 1 70% year 2 92% 10 mo 50% est Finance (insurance) System Test Group = 38% (before performance testing) Priority 1 & 2 only: = 31% Operating system 23% to 87% by application Defects: 1 / 4 160 / 200 Scientific software (chemical analysis) Not useful for low numbers of defects 19 Summary for AP Europe 20

Information Technology Conclusions UAT more variable than ST mainly personnel Target zone for ST : 75-90% Factors behind the figures size, complexity, tester experience, time, documentation whether UAT started before ST was finished where on the S-curve when stopped Figures don t tell you cost, severity of those you missed cost of finding 30 stopped here? 25 20 15 10 or here? 5 0 07/12/01 08/12/01 09/12/01 10/12/01 11/12/01 12/12/01 13/12/01 14/12/01 21 21 Summary for AP Europe Project or App. Months Status Comments Before New Testing Process S4 50% ESTIMATED After New Testing Process R1 3 81% FINAL Major re-engineering LBS 4 91% FINAL CP 7 100% FINAL Reporting System DS 3 95% FINAL APC 4 93% FINAL ELCS 4 95% FINAL Eur impl. of US system SMS 3 96% FINAL Enhancement Release C 4 96% FINAL E7 (US) 5 83% FINAL Global Enhancements E7 (Eur) 1 97% Global Enhancements Source: Stuart Compton, Air Products plc 22

Rolling Software Testing Defect Detection Percentage Measure (rolling quarterly produced values looking back four quarters) Period under review # Projects Analysed Target Defects in Testing Total Defects Prod'n Bugs Historical Estimate n/a 50 Rolling 1 Qtrs to Q1 Y1 2 n/a 1111 1400 289 79 Rolling 2 Qtrs to Q2 Y1 1 n/a 1171 1466 295 80 Rolling 3 Qtrs to Q3 Y1 1 n/a 1211 1508 297 80 Rolling 4 Qtrs to Q4 Y1 2 n/a 1492 1807 315 83 Rolling 4 Qtrs to Q1 Y2 3 80 2034 2129 95 96 Rolling 4 Qtrs to Q2 Y2 0 80 1974 2063 89 96 Rolling 4 Qtrs to Q3 Y2 3 80 2086 2204 118 95 Rolling 4 Qtrs to Q4Y2 2 80 1976 2087 111 95 Rolling 4 Qtrs to Q1 Y3 90? Rolling 4 Qtrs to Q2 Y3 90? Rolling 4 Qtrs to Q3 Y3 90? Rolling 4 Qtrs to Q4 Y3 90? Source: Stuart Compton, Air Products plc 23 Anonymous client all systems history of this ST UAT ST UAT ST LR UAT LR All LR Extreme 704 1452 125 33% 31% 92% 95% Ultimate 24 47 8 34% 30% 85% 90% Professional 31 42 5 42% 40% 89% 94% Realistic 30 14 68% Idiotic 253 138 6 65% 64% 96% 98% Total 1042 1693 144 38% 36% 92% 95% 24

Anonymous client (telecoms) 100.00% 98.00% not yet stable decline why? Test Rel defects running 1 1452 125 92.07% 96.00% 2 668 13 98.09% 94.00% 3 341 30 91.91% 4 715 12 98.35% 92.00% 5 516 28 94.85% 90.00% test process improvement 6 535 42 92.72% 88.00% 7 602 49 92.47% 1 2 3 4 5 6 7 8 9 8 542 48 91.86% Release 9 621 36 94.52% 25 Contents introduction: some questions for you what is and how to calculate it case studies uses, abuses, common concerns and advice 26

benefits can highlight test process improvements the effect of severe deadline pressure the impact of overlapping test phases can raise the profile of testing can help predict future defect levels is applicable over different projects reflects testing process in general can give on-going monitoring of testing 27 Uses of when you know test & production defects calculate to monitor the effectiveness of testing in finding defects for different test stages (e.g. ST, UAT) in different releases when you know your prediction: once you know your typical predict the number of production defects (e.g. when software is released) NOTE: not an exact science, but useful to set expectations! 28

Prediction of production defects Defects found so far 66% 20 50% 20 80% 20 Predicted defects not found yet 10 20 5 29 29 Abuses of monitoring individuals or too small a group or timeframe using only other metrics are also important blame the testers if is low prediction only an indication take it with a grain of salt distorted reporting e.g. testers report the same defect 10 times don t report that, it ll make me look bad 30

When NOT to use when you don t have many defects in test or in production (i.e. very high quality software) your defect tracking is immature, purely subjective, untrustworthy, or non-existent the software products you produce are never used by anyone (no live running) it doesn t matter how many defects are in them it is impossible to get data on defects found in live running (difficult is OK!) you re not interested in improving 31 What does it mean? is very high ( > 95%) testing is very good? system not been used much yet? next stage of testing was very poor? e.g. ST looks good but UAT was poor, ST UAT is high but live running will find many defects! is low (< 60%) testing is poor? requirements were very poor, affecting tests? poor quality software (too many to find in the time)? deadline pressure testing was squeezed? 32

Options for measuring what to measure simplest: all test defects / all defects so far by severity level how "deep" to go? deeper levels give more detailed information deeper levels more complex to measure advice: start simple simple information is much better than none learn from what information you have 33 Technical aspects what time frame for defects found in live? this is arbitrary / whatever makes sense for you many people use 1 month, some use 3 or 6 months can I measure of different test stages? any stage where you have defects that came wards but don t measure individual people!! can I use in agile development? yes: choices accumulate, or measure until next release what if different defect tracking systems? ok to combine if consistently recorded 34

Accuracy of defect data most common stumbling block what about duplicates? what about enhancement requests? what if some aren t really defects? the same answer always applies it doesn t matter how you do it as long as you do it the same way each time! most useful aspect of trends, changes over time and why 35 The technical person s trap we re testers we can see all the problems! you may think of lots of problems with this metric yes, (as any measure) can be mis-used but that doesn t mean it can t be useful take the high level view, warts and all, computed simply and consistently, can help you monitor your testing processes and show the effects of both good and bad things 36

How to start using suggested first step calculate for a release that is now live what to measure first? most people start with System Test consider looking at highest severity only to start or two s, one for high severity, one for all defects getting data from live running if you don t normally have live defect data, ask for it data collection & calculation should be easy / automatic get your test management tool or defect tracking tool to calculate it for you automatically 37 Summary: key points requires counts of defects but does not need great accuracy is a useful measure easy to calculate based on defect data you probably already have can tell you how effective your testing efforts are and how other things affect it http://dorothygraham.blogspot.com - discussion about 38