Sequence effects in software development effort estimation
Stein Grimstad (steingr@simula.no), Magne Jørgensen (magnej@simula.no)
Accurate estimates of work-effort are essential to enable successful software development projects
Most software development effort estimates are judgment-based
Expert estimation is by far the most used estimation method in the software industry (70-80%) [*].
Available evidence does not suggest that expert estimation should be replaced by formal estimation models [*].
However, it is well known that human judgment is inconsistent and biased.
[*] Jørgensen (2007), Estimation of Software Development Work Effort: Evidence on Expert Judgment and Formal Models, Int. J. of Forecasting.
Research question: How does the sequence in which software development projects are estimated affect judgment-based estimates of the most likely software development effort?
EXPERIMENT 1
Subjects: 56 software professionals from the same company
Material: Three specifications, each describing a software development task: a small task, a medium task, and a large task
Procedure:
1. We handed out a booklet with two estimation tasks (random allocation to treatment).
2. The subjects estimated the requirement specifications (unaided expert judgment).
3. The subjects self-assessed their technical skills (Very Good, Good, OK, Poor).
4. We collected their responses when the time was up (after about 20 minutes).
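The random allocation in step 1 could be sketched as follows. This is only an illustration: the slides do not say how the booklets were randomized or whether the groups were balanced, so the `allocate` helper, the round-robin balancing, and the subject IDs are all assumptions.

```python
import random


def allocate(subject_ids, groups=("A", "B"), seed=None):
    """Randomly assign subjects to treatment groups (hypothetical helper).

    Assumption: group sizes are balanced, differing by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects round-robin across the groups.
    return {sid: groups[i % len(groups)] for i, sid in enumerate(shuffled)}


# 56 subjects, numbered 1..56 purely for illustration.
allocation = allocate(range(1, 57), seed=1)
group_sizes = {g: list(allocation.values()).count(g) for g in ("A", "B")}
print(group_sizes)
```

Fixing the seed makes the allocation reproducible, which helps when the assignment has to be documented afterwards.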
Treatment:
Group A: Estimate 1 = small task, Estimate 2 = medium task
Group B: Estimate 1 = large task, Estimate 2 = medium task
Hypothesis: Estimate 2 (Group A) < Estimate 2 (Group B)
The results demonstrate that estimation sequence can impact effort estimates
[Figure: most likely estimate for Group A (Small-Medium) and Group B (Large-Medium)]
Median of 95 vs. 190 work-hours; N=56, p = 0.01, effect size = 0.68 (Cohen's d)
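For readers unfamiliar with the effect size reported here, a minimal sketch of Cohen's d follows. The slides do not state which variant of d or which significance test was used, so the pooled-standard-deviation form below is an assumption, and the two lists of work-hours are hypothetical values, not the experiment's data.

```python
from math import sqrt
from statistics import mean, median, variance


def cohens_d(a, b):
    """Cohen's d using a pooled sample standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var)


# Hypothetical most-likely estimates in work-hours, NOT the study's data.
group_a = [40, 60, 80, 100, 120]
group_b = [80, 120, 160, 200, 240]

print("medians:", median(group_a), median(group_b))
print("Cohen's d:", round(cohens_d(group_a, group_b), 2))
```

Effort estimates are often right-skewed, which may be why the slides report medians alongside d. An analysis of real data might therefore transform the estimates or use a rank-based test before relying on a mean-based effect size.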
and expertise does not seem to remove the effect
[Figure: most likely estimate for Group A and Group B]
Median of 60 vs. 150 work-hours (N=20)
We replicated the experiment to test the robustness of the results on different subjects
[Figure: most likely estimate per group; axis labels Group_A_(Small-Medium) and Group_B_(Small-Medium)]
N=17, p = 0.3, effect size = 0.60 (Cohen's d)
Threats to validity include issues related to: time pressure, commitment, estimation method, estimation accuracy, estimation tasks, and experimental context
EXPERIMENT 2
Subjects: 46 software development companies from typical off-shoring countries (Eastern Europe and Asia)
Material: Five real-world requirement specifications, each describes a complete software system.
Procedure:
1. We sent the requirement specifications to the vendors (all five specifications were sent in the same mail).
2. The vendors completed the estimation work (no requirements regarding the order of the estimation work).
3. We evaluated the quality of their deliverables (in most cases, there was at least one round of updates).
4. Payments were made when the work was approved.
Each company delivered these artifacts:
Functional analysis of the requirement specifications
Description of technology and architecture choices
Work-breakdown structure
Estimates of most likely effort
Description of the estimation method
Uncertainty assessment
Hypothesis: Companies that first estimate a small system will, on average, submit lower estimates than companies that first estimate a large system
The results suggest that sequence effects can also be large in real-world estimation situations
[Figure: estimates for Small_first vs. Large_first companies]
N=41, p = 0.15, effect size = 0.37 (Cohen's d)
Some observations
The companies that submitted the most detailed functional analyses and architecture/technology discussions may be less impacted by sequence effects.
There may be cultural differences (the Asian companies were the most impacted by sequence).
NB! These results are highly uncertain.
Summary
Optimizing the sequence of the estimation work may improve estimation accuracy:
If estimates are usually too optimistic, estimate the most complex tasks first.
If estimates are usually realistic, estimate medium-complex tasks first.
Our current understanding of the phenomenon is quite incomplete, and more studies are needed. For example, there are open issues related to time, task characteristics, and estimation accuracy.
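The two sequencing rules in the summary can be written as a small ordering function. This is a sketch under several assumptions: the `estimation_order` helper, the numeric complexity scores, and the task names are hypothetical. The slides only say which tasks to estimate first, so ordering the remaining tasks by distance from the median complexity is an illustrative choice, not the authors' recommendation.

```python
from statistics import median


def estimation_order(tasks, usually_optimistic):
    """Order (name, complexity) pairs for estimation (hypothetical helper).

    usually_optimistic=True  -> most complex tasks first.
    usually_optimistic=False -> medium-complexity tasks first
                                (estimates assumed to be realistic).
    """
    if usually_optimistic:
        return sorted(tasks, key=lambda t: t[1], reverse=True)
    mid = median(c for _, c in tasks)
    # Assumption: "medium" means closest to the median complexity score.
    return sorted(tasks, key=lambda t: abs(t[1] - mid))


# Hypothetical tasks with made-up complexity scores.
tasks = [("report", 2), ("login", 1), ("billing", 5), ("search", 3), ("sync", 8)]

print([name for name, _ in estimation_order(tasks, usually_optimistic=True)])
print([name for name, _ in estimation_order(tasks, usually_optimistic=False)])
```

The slides give no rule for teams whose estimates are usually too pessimistic, so the sketch does not cover that case.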
[Figure: total of all five estimates (outsourcing); y-axis: Total]