Mean-Variance Combination (MVC): A New Method for Evaluating Effort Estimation Models
Lang Xie (ISCAS), Ye Yang (ISCAS), Da Yang (ISCAS), Qing Wang (ISCAS), Mingshu Li (ISCAS)
April 27, 2011
Agenda
1. Introduction to cost estimation research at ISCAS
2. MVC method for evaluating cost estimation models
3. Ongoing work
2011-5-3
Cost Estimation Research Framework: ISCAS Perspective
[Diagram: research framework centered on Software Cost Estimation. Labels recovered from the figure:]
- Local Government: cost estimation for contract pricing
- Industry: cost estimation tool
- Basic Research: literature review; COCOMO family models; COCOMO-U; budgeting under uncertainty; coping with the cone of uncertainty; combining estimations; estimation based on use cases; cost estimation & process management integration; cost drivers auto-rating
- Related areas: Software Quality (defects prediction, simulation); Software Process; Software Measurement
- Other labels: USC, JSP, WikiWinWin, SoftPM, COCOMO
Uncertainty ranges of cost estimations decrease as the software development lifecycle proceeds
[Figure: cone of uncertainty. Vertical axis: relative size range (4x, 2x, 1.5x, 1.25x, x, 0.8x, 0.67x, 0.5x, 0.25x). Phases: Feasibility; Plans and Rqts.; Product Design; Detail Design; Devel. and Test. Milestones: Concept of Operation; Rqts. Spec.; Product Design Spec.; Detail Design Spec.; Accepted Software. Models annotated on the figure: Applications Composition (3 parameters); Early Design (13 parameters); Post-Architecture (23 parameters).]
COCOMO-U
Input: COCOMO-U takes the probability distributions of the estimated project size and the other 22 cost factors.
Output: the probability distribution of software development effort.
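One way to picture this input/output relationship is a Monte Carlo simulation over the COCOMO II Post-Architecture equation. This is a substitute technique chosen for illustration, not the COCOMO-U implementation; the slide does not say how COCOMO-U propagates the distributions. The triangular distributions, their ranges, and all function names below are hypothetical. A = 2.94 and B = 0.91 are the COCOMO II.2000 constants, and the 22 cost factors are taken to be the 5 scale factors plus 17 effort multipliers.

```python
import math
import random

A = 2.94  # COCOMO II.2000 multiplicative constant
B = 0.91  # COCOMO II.2000 base exponent


def sample_effort(rng):
    """Draw one effort value (person-months) from hypothetical input distributions."""
    # Size in KSLOC, triangular(low, high, mode); the numbers are made up.
    size = rng.triangular(8.0, 20.0, 12.0)
    # 5 scale factors, each drawn from a made-up range.
    scale_factors = [rng.triangular(1.0, 6.0, 3.0) for _ in range(5)]
    # 17 effort multipliers, each drawn around the nominal value 1.0.
    multipliers = [rng.triangular(0.85, 1.25, 1.0) for _ in range(17)]
    exponent = B + 0.01 * sum(scale_factors)
    return A * size ** exponent * math.prod(multipliers)


def effort_distribution(n_samples=5000, seed=0):
    """Return sorted effort samples approximating the output distribution."""
    rng = random.Random(seed)
    return sorted(sample_effort(rng) for _ in range(n_samples))


def percentile(sorted_values, p):
    """Nearest-rank style percentile on an already sorted list."""
    index = min(len(sorted_values) - 1, int(p * len(sorted_values)))
    return sorted_values[index]


samples = effort_distribution()
p10, p50, p90 = (percentile(samples, p) for p in (0.1, 0.5, 0.9))
print(f"P10={p10:.1f}  P50={p50:.1f}  P90={p90:.1f} person-months")
```

The point of the sketch is only that the estimate becomes a distribution, so a range such as P10 to P90 can be reported instead of a single number.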
The InCoME Process
[Flowchart: Cost Drivers Analysis & Data Collection; Build Cost Models; Evaluate Cost Models; decision "Require Further Improvement?" (Yes / No); Risk Assessment; Cost Estimation; Decision Support.]
Estimation Process Based on COGOMO (constructive government contract pricing model)
[Diagram labels: Government history projects; Industry history projects; Human capital in China; Industry benchmark; Project size; Customized model input; Calibrated parameters; Establishment of government knowledge base; Effort Estimation; Effort distribution; Wage-rate in China; Estimated effort; Cost Analysis; Total cost.]
Localization of COCOMO
Data: 7 versions of Qone
Result: A = 1.32, B = 0.94
Qone: a commercial software process management tool released by a Chinese software enterprise
Cost Estimation Based on Use Cases
Data: 7 versions of Qone
Estimation model:
  Effort = A * (UCadjusted)^B
  UCadjusted = newuc + Wmod * moduc + Wreu * reuuc + Wdel * deluc
Qone case:
  UCadjusted = newuc + 0.2 * moduc + 0.05 * reuuc

version  adduc  moduc  reuuc  adjusted UC  effort
v1       3      10     216    15.8         2284.5
v2       7      22     207    21.75        3941
v3       86     22     19     111.5        30945
v4       57     61     236    73.65        10340.1
v5       12     31     308    33.25        7477.5
v6       37     30     318    58.55        14903.6
v7       15     8      373    34.9         7166

A        B       R^2       P-value
96.9396  1.1927  0.928219  0.000481

The method provides guidance for organizations conducting maintenance effort estimation based on use cases. It uses use cases as the size metric. Added, modified, reused, and deleted use cases are identified and included in the use case metric for estimating software maintenance effort.
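The two formulas can be applied as in the minimal sketch below. The function and parameter names are hypothetical. The weights 0.2 and 0.05 and the fitted A and B are the Qone values from the slide; the 0.05 weight is treated here as applying to reused use cases, which reproduces the v1 row, and the weight for deleted use cases is assumed to be 0 because the Qone formula has no such term. The slide does not state the unit of effort.

```python
def adjusted_use_cases(new_uc, mod_uc, reu_uc, del_uc=0,
                       w_mod=0.2, w_reu=0.05, w_del=0.0):
    """Weighted use case count; w_del=0.0 is an assumption (no deleted term for Qone)."""
    return new_uc + w_mod * mod_uc + w_reu * reu_uc + w_del * del_uc


def estimate_effort(uc_adjusted, a=96.9396, b=1.1927):
    """Effort = A * UCadjusted^B with the fitted Qone parameters from the slide."""
    return a * uc_adjusted ** b


# Row v1 of the table: 3 added, 10 modified, 216 reused use cases.
uc_v1 = adjusted_use_cases(new_uc=3, mod_uc=10, reu_uc=216)
print(round(uc_v1, 2))  # 15.8, matching the "adjusted UC" column for v1
print(estimate_effort(uc_v1))  # model estimate, to compare with the recorded 2284.5
```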
Propheta: a cost estimation tool
Three cost estimation methods:
- Analogy estimation based on databases from CSBSG and ISBSG
- COCOMO
- Integrated estimation for software products with multiple modules
CSBSG: The China Benchmarking Standards Group
ISBSG: The International Software Benchmarking Standards Group
MVC method
- Background and motivation
- MVC (mean-variance combination) method
- Experiment results
Background (1/3)
- A wealth of estimation methods exists.
- Evaluation methods are important: they indicate the problems of estimation models and drive their improvement.
- Statistical view of a model's characteristics: bias and variance.
Background (2/3): Bias and Variance
[Figure: y plotted against x, comparing two models with the ideal model. Model 1: structure y = a*x + b, trained on part of the data. Model 2: structure is a horizontal line, trained on the whole data set.]
Background (3/3)
- The true bias cannot be captured directly.
- The distance between the observed value and the estimated value contains both bias and variance.
- Accuracy indicators: MMRE gives the bias information, while stdmre gives the variation information and part of the bias.
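For reference, the standard decomposition of expected squared error shows why an observed error mixes the two terms. The notation is an assumption introduced here and does not appear in the slides: y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and f̂ is the model fitted on a random training sample. The slides work with relative error, not squared error, so this is an analogy and not the authors' derivation.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2
```

Only the left-hand side is observable from data, which is why the bias term cannot be read off directly.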
Motivation
- Indicators are based on RE or MRE: MMRE, stdmre, PRED(N), MdMRE, etc.
- Evaluation uses cross-validation (CV): the average value of the indicators above, or an interval of them.
- The traditional mean values of indicators in CV are challenged: they do not combine bias and variance, and the comparison result varies.
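The indicators named above can be computed as in the following sketch, using their usual definitions: MRE = |actual - estimated| / actual, MMRE is the mean of MRE, MdMRE the median, and PRED(N) the fraction of projects with MRE at most N%. The function names and the four sample projects are made up for illustration, and the use of the sample standard deviation for stdmre is an assumption, since the slides do not say which variant is used.

```python
import statistics


def mre(actual, estimated):
    """Magnitude of relative error for one project."""
    return abs(actual - estimated) / actual


def indicators(actuals, estimates, n=25):
    """Return MMRE, stdmre, MdMRE and PRED(n) for paired actual/estimated efforts."""
    mres = [mre(a, e) for a, e in zip(actuals, estimates)]
    return {
        "MMRE": statistics.mean(mres),
        "stdmre": statistics.stdev(mres),  # sample standard deviation (assumption)
        "MdMRE": statistics.median(mres),
        "PRED": sum(m <= n / 100 for m in mres) / len(mres),
    }


# Made-up efforts for four projects.
actuals = [100.0, 200.0, 400.0, 800.0]
estimates = [110.0, 150.0, 400.0, 1200.0]
result = indicators(actuals, estimates)
print(result)
```

For these four projects the MRE values are 0.1, 0.25, 0 and 0.5, so MMRE is 0.2125, MdMRE is 0.175 and PRED(25) is 0.75.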
MVC method: the whole process
[Diagram: History data and Model structure are the inputs to the MVC process. The MVC process consists of Resampling (split, train and test) and Generate indicators. Parameters: split ratio, re-sampling times.]
MVC Method: Re-sampling process
Input: data, model structure
Output: pairs of (MMRE, stdmre)
1. Fix the ratio of the test set and the number of sampling times N.
2. Randomly split the whole data set N times to get N pairs of (train set, test set).
3. Train and test the current model structure N times using the N pairs.
4. Calculate N pairs of (MMRE, stdmre).
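The re-sampling steps can be sketched as follows. The model structure used here, a log-linear fit of effort against size, is only a stand-in chosen to make the example self-contained; the slides evaluate COCOMO and analogy models. The synthetic data, the 0.3 test ratio, N = 100, and all function names are hypothetical.

```python
import math
import random
import statistics


def fit_loglinear(train):
    """Stand-in model structure: least-squares fit of ln(effort) = ln(a) + b*ln(size)."""
    xs = [math.log(size) for size, _ in train]
    ys = [math.log(effort) for _, effort in train]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = math.exp(my - b * mx)
    return lambda size: a * size ** b


def resample(data, fit, test_ratio=0.3, n_times=100, seed=0):
    """Return n_times pairs of (MMRE, stdmre), one per random train/test split."""
    rng = random.Random(seed)
    n_test = max(2, round(len(data) * test_ratio))  # stdev needs at least 2 values
    pairs = []
    for _ in range(n_times):
        shuffled = rng.sample(data, len(data))  # random permutation of the data
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit(train)
        mres = [abs(effort - model(size)) / effort for size, effort in test]
        pairs.append((statistics.mean(mres), statistics.stdev(mres)))
    return pairs


# Synthetic (size, effort) history data, made up for the example.
data_rng = random.Random(1)
sizes = [data_rng.uniform(5.0, 200.0) for _ in range(30)]
data = [(s, 3.0 * s ** 1.1 * data_rng.lognormvariate(0.0, 0.3)) for s in sizes]

pairs = resample(data, fit_loglinear)
print(len(pairs), pairs[0])
```

Each element of `pairs` is one point for the later scatter plot of MMRE against stdmre.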
MVC method: why re-sampling
- The history data is limited and small in size.
- The independent and identically distributed (i.i.d.) assumption may not be satisfied.
- Re-sampling simulates two situations: train set vs. test set, and history data vs. new data.
- The number of possible combinations, C(n, m), is large.
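To get a feel for the size of C(n, m), the count can be computed directly. The values n = 63 and m = 21 below are hypothetical, picked only to represent a small history data set with one third held out for testing.

```python
import math

n_projects = 63  # hypothetical size of the history data set
n_test = 21      # hypothetical test set size (one third)

# Number of distinct ways to choose the test set.
n_splits = math.comb(n_projects, n_test)
print(n_splits)
```

Even for a data set this small the count exceeds 10^15, so N random splits cover only a tiny sample of the possible train/test partitions.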
MVC Method: Generate Indicator paradigm
[Figure: scatter plot of the (MMRE, stdmre) pairs with its convex hull and three areas: AUC_L (AUC Lower), AUC_M (AUC Middle), AUC_U (AUC Upper).]
AUC: Area Under Curve
MVC Method: Generate Indicator Algorithm
Input: N pairs of (MMRE, stdmre)
Output: three types of area
1. Draw the scatter plot of MMRE and stdmre.
2. Compute the convex hull and split it into an upper part and a lower part.
3. Extend the two parts of the convex hull into three types of area.
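A minimal sketch of the hull-splitting step, under stated assumptions: the hull is computed with Andrew's monotone chain algorithm (the slides do not name a hull algorithm), AUC_L is taken to be the area under the lower hull boundary and AUC_U the area under the upper boundary, both measured down to the horizontal axis. The slides do not define AUC_M precisely, so it is left out here. Which indicator goes on which axis is also an assumption; the code simply treats each pair as (x, y). The sample points are made up.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_chains(points):
    """Return the (lower, upper) convex hull boundaries, each ordered left to right."""
    pts = sorted(set(points))
    if len(pts) < 3:
        raise ValueError("need at least three distinct points")
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower, upper[::-1]


def area_under(chain):
    """Trapezoidal area between a left-to-right polyline and the x axis."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(chain, chain[1:]))


# Made-up points: the corners of a unit square plus one interior point.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
lower, upper = hull_chains(points)
auc_l = area_under(lower)  # assumed meaning of AUC_L
auc_u = area_under(upper)  # assumed meaning of AUC_U
print(auc_l, auc_u)  # 0.0 1.0
```

Under these assumptions the difference AUC_U - AUC_L equals the area of the convex hull itself, so a wider spread of (MMRE, stdmre) points shows up as a larger gap between the two indicators.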
[Figure: convex hull of the scatter plot with the areas AUC_U, AUC_M, and AUC_L marked.]
Result: Performance of traditional indicators
[Figure: two panels, MMRE and std_mre, each plotted against CV times (10, 30, 50, 70, 90, 110).]
Results: the scatter plot of MMRE and stdmre
Results: MVC's indicators on two models

          COCOMO81 dataset           NASA93 dataset
model     Auc_U   Auc_M   Auc_L      Auc_U   Auc_M   Auc_L
COCOMO    0.6134  0.4197  0.3103     3.2708  2.1228  1.7653
Analogy   0.6403  0.4901  0.3414     0.7075  0.4522  0.3394
Results: variance of indicators divided by mean value
Discussion
Benefits:
- More stable results, because a distribution replaces a single point.
- The combination of mean and variance.
Ongoing work
Improve MVC: splitting into four areas
[Figure: plot of MMRE against stdmre, divided by Mean_threshold and Std_threshold into four regions.]
- Left-up: high bias and low variance
- Left-bottom: low bias and low variance
- Right-up: high bias and high variance
- Right-bottom: low bias and high variance
How should the thresholds for dividing the four regions be determined?
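Once thresholds are chosen, assigning each (MMRE, stdmre) pair to a region is straightforward, as in the sketch below. The threshold values 0.5 and 0.5, the sample pairs, and the function name are hypothetical; how to choose the thresholds is exactly the open question on this slide.

```python
from collections import Counter


def classify(mmre, stdmre, mean_threshold, std_threshold):
    """Label one (MMRE, stdmre) pair with its bias/variance region."""
    bias = "high bias" if mmre > mean_threshold else "low bias"
    variance = "high variance" if stdmre > std_threshold else "low variance"
    return f"{bias} and {variance}"


# Made-up (MMRE, stdmre) pairs and placeholder thresholds.
sample_pairs = [(0.30, 0.20), (0.55, 0.25), (0.35, 0.60), (0.70, 0.80)]
labels = [classify(m, s, mean_threshold=0.5, std_threshold=0.5)
          for m, s in sample_pairs]
region_counts = Counter(labels)
print(region_counts)
```

Counting how many re-sampling runs fall in each region would give a per-model profile, though the slides do not say how the regions are to be used.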
Ongoing work
- Deal with cross-company data
- Definition and measurement of local bias (Ye et al., ESEM 2011)
- Build new models to deal with organization ID
- Measure uncertainty more accurately
- Reduce the bias under the guidance of the MVC method and express the variance more accurately
Thank you!