A Comparison of Calibrated Equations for Software Development Effort Estimation



Cuauhtemoc Lopez Martin, Edgardo Felipe Riveron, Agustin Gutierrez Tornes

Center for Computing Research, National Polytechnic Institute, Mexico. Av. Juan de Dios Batiz s/n esquina Miguel Othon de Mendizabal, Unidad Profesional "Adolfo Lopez Mateos", Edificio CIC, Colonia Nueva Industrial Vallejo, Delegación Gustavo A. Madero, P.O. 07738, Mexico D.F.

cuauhtemoc@sagitario.cic.ipn.mx; edgardo@cic.ipn.mx; atornes@cic.ipn.mx

Abstract. In this paper, equations for software development effort estimation are calibrated for a local environment using actual data from four projects. Lines of code and function points are used as independent variables in linear and non-linear regression equations. The Mean Magnitude of Relative Error (MMRE) is used as the evaluation criterion to compare these calibrated equations with equations obtained by other researchers. Results show that the calibrated linear regression model is the most accurate for the local environment of this case study.

Keywords: Software effort estimation; Lines of code; Function points; Correlation; Linear and non-linear regression.

1. Introduction

Three main problems are related to a project: delivery time, effort, and quality. It is difficult to know when the software will be finished and how much it will cost. Software estimation has been identified as one of the three great challenges for half-century-old computer science [1]. No estimation method or model should be preferred over all others. The key is to use a variety of methods and tools and then to investigate why the estimates may differ significantly from one another [2]. In this paper, effort estimation equations are calibrated for the local environment of the University of Guadalajara, using actual data from four projects developed there.
According to Heemstra and Kusters [3], expert judgment and estimation by analogy are the most frequently applied estimation methods in practice, while algorithmic (or parametric) estimation methods seem to be rarely used. This paper encourages the use of algorithmic estimation methods. In algorithmic models, the development effort is estimated as a function of variables representing the most important cost drivers in the project. Usually, the variables are identified by correlation analysis of data on completed projects [4].

To measure the accuracy of software estimations, several studies have evaluated estimation models using the Mean Magnitude of Relative Error (MMRE), defined as

\[ \mathrm{MMRE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| \mathrm{estimate}_i - \mathrm{actual}_i \right|}{\mathrm{actual}_i} \]

where $\mathrm{estimate}_i$ is the effort estimated by the model, $\mathrm{actual}_i$ is the actual effort, and $n$ is the number of projects. When models have not been calibrated, the MMRE has ranged from 57% to 800%, whereas calibrated models have shown a lower MMRE [5].

1.1. Correlation (r) and Coefficient of Determination (r²)

The correlation is the degree to which two sets of data (e.g. lines of code and effort) are related [6]. The correlation value r varies from -1.0 to +1.0. To be useful for estimating, the value of r² (the coefficient of determination) should be greater than 0.5. The correlation coefficient can be calculated as follows:

\[ r = \frac{n \sum (\mathrm{LOC} \cdot E) - \sum \mathrm{LOC} \sum E}{\sqrt{\left[ n \sum \mathrm{LOC}^2 - \left( \sum \mathrm{LOC} \right)^2 \right] \left[ n \sum E^2 - \left( \sum E \right)^2 \right]}} \qquad (1) \]

where n is the number of observation pairs, LOC is the number of lines of code, and E is the development effort.

1.2. Linear Regression

When two sets of data are strongly related, a linear regression procedure can be used to model the relationship. Regression analysis expresses the relationship between two variables and estimates the dependent variable (here, effort) from the independent variable (here, LOC). It yields the equation of a line, which is then used to make predictions. The linear regression equation using least squares is the following [7]:

\[ E = a + b\,(\mathrm{LOC}) \qquad (2) \]

where

\[ b = \frac{n \sum (\mathrm{LOC} \cdot E) - \sum \mathrm{LOC} \sum E}{n \sum \mathrm{LOC}^2 - \left( \sum \mathrm{LOC} \right)^2} \qquad (3) \]

\[ a = \frac{\sum E}{n} - b\,\frac{\sum \mathrm{LOC}}{n} \qquad (4) \]

1.3. Non-Linear Regression

If the number of projects is less than ten, the constant a of the COCOMO equation can be calibrated using Equation 5 [4].
The COCOMO equation is $E = a\,(\mathrm{KLOC})^{b} \times \mathrm{EAF}$, where E is the effort in man-months (one man-month is equivalent to 152 hours); EAF is the effort adjustment factor; KLOC is the number of lines of code in thousands; and a and b are constants that depend on the development mode: Organic, a = 2.4, b = 1.05; Semi-detached, a = 3.0, b = 1.12; Embedded, a = 3.6, b = 1.20. The EAF is used to tailor the estimate to the conditions of the development environment.
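As a minimal sketch of the COCOMO equation just described, the following Python function evaluates $E = a\,(\mathrm{KLOC})^{b} \times \mathrm{EAF}$ for the three modes. The names `COCOMO_MODES` and `cocomo_effort` and the 10 KLOC example size are illustrative assumptions, not part of the original study.

```python
# Mode constants (a, b) for the COCOMO equation, as given in the text.
COCOMO_MODES = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def cocomo_effort(kloc: float, mode: str = "organic", eaf: float = 1.0) -> float:
    """Return the estimated effort in man-months: E = a * KLOC**b * EAF."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    a, b = COCOMO_MODES[mode]
    return a * kloc ** b * eaf


if __name__ == "__main__":
    # Hypothetical 10 KLOC project with a nominal EAF of 1.0.
    for mode in COCOMO_MODES:
        print(f"{mode}: {cocomo_effort(10.0, mode):.1f} man-months")
```

Leaving `eaf` at its default of 1.0 gives an unadjusted estimate; any other value scales the estimate linearly.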

For the basic COCOMO model, the EAF is not used and is simply set to 1. For the intermediate COCOMO model, there are 15 different cost drivers whose ratings are multiplied together to obtain the EAF [4].

\[ a = \frac{\sum_{i=1}^{n} AE_i \, Q_i}{\sum_{i=1}^{n} Q_i^{2}} \qquad (5) \]

where n is the number of developed projects, $AE_i$ is the actual effort of project i, and i indexes the individual projects. To calculate Q for the organic mode (each mode has its own equation), the equation $Q_i = (\mathrm{KLOC}_i)^{1.05} \times \mathrm{EAF}_i$ must be used. If the number of projects is more than nine, both the constant a and the exponent b of the COCOMO equation can be calibrated using the following equations [4]:

\[ \log a = \frac{a_2 d_0 - a_1 d_1}{a_0 a_2 - a_1^{2}}, \qquad b = \frac{a_0 d_1 - a_1 d_0}{a_0 a_2 - a_1^{2}} \]

where, with all sums taken over the projects and using actual effort and size values:

\[ a_0 = n \ \text{(number of projects)}, \quad a_1 = \sum \log(\mathrm{KLOC}_i), \quad a_2 = \sum \left[ \log(\mathrm{KLOC}_i) \right]^{2} \]

\[ d_0 = \sum \log\!\left( \frac{\mathrm{Effort}_i}{\mathrm{EAF}_i} \right), \quad d_1 = \sum \log\!\left( \frac{\mathrm{Effort}_i}{\mathrm{EAF}_i} \right) \log(\mathrm{KLOC}_i) \]

1.4. Evaluation Criterion

A common criterion for the evaluation of cost estimation models is the Magnitude of Relative Error (MRE) [8]. The MRE value is calculated for each observation i whose effort is predicted. The MRE values of multiple observations (N) are aggregated through the Mean Magnitude of Relative Error (MMRE). MRE and MMRE are defined as follows:

\[ \mathrm{MRE}_i = \frac{\left| \text{Actual Effort}_i - \text{Predicted Effort}_i \right|}{\text{Actual Effort}_i}, \qquad \mathrm{MMRE} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{MRE}_i \]

In general, the lower the MMRE, the more accurate the estimation technique.

2. State of the art

Most algorithmic models can be calibrated to a specific software environment to improve the estimation. The equations are based on research and historical data, and use inputs such as source Lines of Code (LOC) (either physical or logical [9], based on a coding standard [6]) or Function Points. Several equations have been proposed by previous researchers; some of them are the following [10]:

Effort equation                                  Author(s)
$E = 5.2\,(\mathrm{KLOC})^{0.91}$                Walston-Felix
$E = 0.7\,(\mathrm{KLOC})^{1.50}$                Halstead
$E = 5.5 + 0.73\,(\mathrm{KLOC})^{1.16}$         Bailey-Basili
$E = 4.86\,(\mathrm{KLOC})^{0.976}$              RADC
$E = 5.28\,(\mathrm{KLOC})^{1.047}$              Doty
$E = 2.43\,(\mathrm{KLOC})^{0.962}$              JPL
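The published equations and the MRE/MMRE criterion can be combined in a few lines of Python. This is only a sketch: the constants are taken from the table above as far as it can be read, the names `PUBLISHED`, `mre` and `mmre` are hypothetical helpers, and the three sample projects are invented for illustration.

```python
from typing import Callable, Dict, Sequence

# Published size-based effort equations (KLOC -> man-months).
# Constants follow the table in the text; treat them as assumptions.
PUBLISHED: Dict[str, Callable[[float], float]] = {
    "Walston-Felix": lambda k: 5.2 * k ** 0.91,
    "Halstead": lambda k: 0.7 * k ** 1.50,
    "Bailey-Basili": lambda k: 5.5 + 0.73 * k ** 1.16,
    "RADC": lambda k: 4.86 * k ** 0.976,
    "Doty": lambda k: 5.28 * k ** 1.047,
    "JPL": lambda k: 2.43 * k ** 0.962,
}


def mre(actual: float, predicted: float) -> float:
    """Magnitude of Relative Error for a single project."""
    if actual <= 0:
        raise ValueError("actual effort must be positive")
    return abs(actual - predicted) / actual


def mmre(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean of the MRE values over all projects."""
    if len(actuals) != len(predictions) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(mre(a, p) for a, p in zip(actuals, predictions)) / len(actuals)


if __name__ == "__main__":
    # Hypothetical projects: (size in KLOC, actual effort in man-months).
    projects = [(2.0, 6.0), (5.0, 20.0), (8.0, 30.0)]
    sizes = [k for k, _ in projects]
    actuals = [e for _, e in projects]
    scores = {
        name: mmre(actuals, [equation(k) for k in sizes])
        for name, equation in PUBLISHED.items()
    }
    # Print the equations from most to least accurate on this sample.
    for name, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name:14s} MMRE = {score:.2f}")
```

Sorting by MMRE gives the same kind of ranking the paper uses to compare equations, although the ranking on invented data says nothing about any real environment.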

3. Methodology used

1. The number of physical lines of code (LOC) of each project was counted; then, using linear regression supported by the correlation and the coefficient of determination, the development effort was calculated.
2. The COCOMO non-linear effort equation was calibrated and applied, again supported by the correlation and the coefficient of determination.
3. The number of Unadjusted Function Points (UFP) of each project was calculated; then, using linear regression (considering the correlation and the coefficient of determination), the development effort was calculated.
4. The non-linear equations of the algorithmic models proposed by Boehm (COCOMO), Walston-Felix, Halstead, Bailey-Basili, RADC, Doty, and JPL were applied.

The results of these equations were compared with the results generated in steps 1, 2 and 3 of this section. MMRE was used as the evaluation criterion.

4. Experimental Results

4.1. Data Gathering

According to the Mexican National Program for Software Industry Development, 98% of the software produced by Mexican enterprises is developed without formal processes to record, track and control measurable issues during the development process [11]. This makes actual data difficult to obtain. Data were collected from four projects of the Information Systems Department of the University of Guadalajara: 1: Emission and Tracking of Students Pay Orders; 2: Extensions and Demands System; 3: Regional System for Fruit and Vegetable Planning; and 4: Virtual Payment. Their metrics are shown in Table 1 and are used to calibrate the regression equations. A detailed description of the COCOMO EAF and of Unadjusted Function Points (UFP) can be found in [12].

Project   LOC    Effort (man-months)   Unadjusted Function Points (UFP)   COCOMO EAF
1         3944   8                     123                                1.028
2         3006   2                     110                                1.280
3         2500   2.5                   174                                0.800
4         6200   35                    409                                1.180

Table 1. Projects Actual Data

4.2. Calibrating Linear and Non-Linear Regression Equations

Once the number of LOC has been counted, it is possible to generate effort equations.
The first step is to calculate the coefficients of correlation and determination. According to Equation 1, the results obtained are r = 0.9869 and r² = 0.9740. Both values are high. In accordance with Equations 3 and 4, the values of a and b are calculated. The final effort equation using linear regression, according to Equation 2, is the following:

\[ E = -27.6996 + 9.924\,(\mathrm{KLOC}) \qquad (6) \]

According to Equation 5, the value of the constant a for a non-linear equation is calculated as follows:

Project   KLOC    EAF     Effort   Q       (Effort)(Q)   Q²
1         3.944   1.028   8        4.342   34.739        18.856
2         3.006   1.280   2        4.065   8.131         16.527
3         2.8     0.800   2.5      2.358   5.896         5.562
4         6.2     1.180   35       8.015   280.518       64.237
Sum                                        329.284       105.182

a = 329.284 / 105.182 = 3.13

Then, in accordance with the COCOMO equation, the non-linear equation for estimating the effort (organic mode) is the following:

\[ E = 3.13\,(\mathrm{KLOC})^{1.05} \times \mathrm{EAF} \qquad (7) \]

With Function Points as the independent variable, the results are r = 0.9566 and r² = 0.9151. Both values are again high. In accordance with Equations 3 and 4, the values of a and b are calculated. The final effort equation using linear regression, according to Equation 2, is the following (a paper on LOC-FP equivalence can be consulted in [13]):

\[ E = -10.0354 + 0.1074\,(\mathrm{FP}) \qquad (8) \]

4.3. Comparing MRE_i and MMRE Results with both Calibrated and Original Equations

The unit of effort is the man-month. The first four rows give the MRE of each equation for each project.

Project   Eq. 6   Eq. 7   Eq. 8   COCOMO   Walston-Felix   Halstead   Bailey-Basili   RADC    Doty    JPL
1         0.43    0.70    0.60    0.30     1.27            0.31       0.14            1.32    1.78    0.14
2         0.07    5.36    0.11    3.88     6.08            0.82       3.06            6.11    7.36    2.50
3         0.96    1.95    2.46    1.26     4.31            0.31       2.16            4.31    5.21    1.62
4         0.03    0.28    0.03    0.45     0.22            0.69       0.67            0.18    0.02    0.60
Sum       1.49    8.30    3.21    5.90     11.87           2.14       6.03            11.92   14.36   4.86
MMRE      0.37    2.07    0.80    1.47     2.97            0.54       1.51            2.98    3.59    1.21

The table shows that MMRE values vary from 0.37 to 3.59. The calibrated linear regression equation using LOC has the best accuracy, with MMRE = 0.37, while the calibrated linear regression equation using Function Points is in third place with MMRE = 0.80.
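The calibration steps of Section 4.2 (Equations 1, 3, 4 and 5) can be sketched in Python as follows. The function names are hypothetical, and the sample size, EAF and effort values follow the calibration table of Section 4.2 as far as it can be read, so they should be treated as assumptions rather than as authoritative data.

```python
import math
from typing import Sequence, Tuple


def _sums(x: Sequence[float], y: Sequence[float]):
    """Return n and the sums used by Equations 1, 3 and 4."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return n, sx, sy, sxy, sxx, syy


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation coefficient r (Equation 1)."""
    n, sx, sy, sxy, sxx, syy = _sums(x, y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx * sx) * (n * syy - sy * sy)
    )


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept a and slope b (Equations 4 and 3)."""
    n, sx, sy, sxy, sxx, _ = _sums(x, y)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = sy / n - b * sx / n
    return a, b


def calibrate_cocomo_a(
    kloc: Sequence[float],
    eaf: Sequence[float],
    effort: Sequence[float],
    exponent: float = 1.05,  # organic-mode exponent
) -> float:
    """Calibrate the COCOMO constant a (Equation 5) with b held fixed."""
    q = [k ** exponent * f for k, f in zip(kloc, eaf)]
    return sum(e * qi for e, qi in zip(effort, q)) / sum(qi * qi for qi in q)


if __name__ == "__main__":
    # Sample inputs (assumed): size in KLOC, EAF, effort in man-months.
    kloc = [3.944, 3.006, 2.8, 6.2]
    eaf = [1.028, 1.280, 0.800, 1.180]
    effort = [8.0, 2.0, 2.5, 35.0]

    r = correlation(kloc, effort)
    a, b = linear_fit(kloc, effort)
    print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
    print(f"linear:  E = {a:.4f} + {b:.4f} * KLOC")
    print(f"COCOMO:  a = {calibrate_cocomo_a(kloc, eaf, effort):.3f}")
```

Keeping the exponent fixed and fitting only the constant a is the small-sample variant; with four projects there is too little data to estimate both parameters reliably.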

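When ten or more projects are available, the log-linear calibration of both a and b described in Section 1.3 could be sketched as below. The function name `calibrate_a_b` is hypothetical, base-10 logarithms are an assumption (any base gives the same a and b), and the ten projects are synthetic, generated from known constants so that the fit can be checked.

```python
import math
from typing import Sequence, Tuple


def calibrate_a_b(
    kloc: Sequence[float], effort: Sequence[float], eaf: Sequence[float]
) -> Tuple[float, float]:
    """Fit a and b of E = a * KLOC**b * EAF by least squares in log space."""
    n = len(kloc)
    if not (n == len(effort) == len(eaf)) or n < 2:
        raise ValueError("need three sequences of equal length >= 2")
    x = [math.log10(k) for k in kloc]
    y = [math.log10(e / f) for e, f in zip(effort, eaf)]

    a0 = n
    a1 = sum(x)
    a2 = sum(v * v for v in x)
    d0 = sum(y)
    d1 = sum(xv * yv for xv, yv in zip(x, y))

    det = a0 * a2 - a1 * a1
    if det == 0:
        raise ValueError("all projects have the same size; cannot fit b")
    log_a = (a2 * d0 - a1 * d1) / det
    b = (a0 * d1 - a1 * d0) / det
    return 10 ** log_a, b


if __name__ == "__main__":
    # Synthetic data (assumed): ten projects that follow E = 3.0 * KLOC**1.1 * EAF.
    sizes = [float(k) for k in range(1, 11)]
    factors = [0.9, 1.0, 1.1, 1.2, 1.0, 0.8, 1.3, 1.0, 0.95, 1.05]
    efforts = [3.0 * k ** 1.1 * f for k, f in zip(sizes, factors)]
    a, b = calibrate_a_b(sizes, efforts, factors)
    print(f"a = {a:.3f}, b = {b:.3f}")
```

Because the synthetic data contain no noise, the fit recovers the generating constants; real project data would give only approximate values.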
5. Conclusions and Directions for Future Research

In this paper, linear and non-linear regression equations for software development effort estimation were calibrated for a local environment using actual data from four projects. These calibrated equations were compared with equations obtained by other researchers. The comparison was based on the Mean Magnitude of Relative Error (MMRE). Results showed that the calibrated linear estimation model was the most accurate for this local environment.

98% of the software produced by Mexican enterprises is developed without formal processes to record, track and control measurable issues during the development process; this reduces the effectiveness of any software estimation technique, since all techniques require historical data. This situation is reflected in the small data set used in this paper and represents its main weakness. However, the calibration activities described here can be applied when more data become available. Future research will involve the application of other estimation alternatives such as Fuzzy Logic and Neural Networks.

References

[1] Brooks, Frederick P. Jr., Three Great Challenges for Half-Century-Old Computer Science. Journal of the ACM, Vol. 50, No. 1, pp. 25-26, January 2003.
[2] Boehm B., Abts Ch., Chulani S., Software Development Cost Estimation Approaches: A Survey. Chulani Ph.D. Report, 1998.
[3] Heemstra F., Kusters R., Software cost estimation in the Netherlands: 10 years later, Proceedings of the European Software Control and Metrics Symposium (ESCOM-SCOPE), 1999.
[4] Boehm B., Software Engineering Economics, Englewood Cliffs, 1981.
[5] Leung Hareton, Fan Zhang, Software Cost Estimation, The Hong Kong Polytechnic University, Hong Kong, 2000.
[6] Humphrey W., A Discipline for Software Engineering, Addison Wesley.
[7] Johnson Richard A., Probabilidad y Estadística para Ingenieros, Prentice Hall, 1997.
[8] Briand Lionel C., El Emam Khaled, Surmann Dagmar, Wieczorek Isabella, An Assessment and Comparison of Common Software Cost Estimation Modeling Techniques. ISERN-98-27.
[9] Park R. E., Software Size Measurement: A Framework for Counting Source Statements. SEI, Carnegie Mellon University, September 1992.
[10] Pressman R., Software Engineering: A Practitioner's Approach, McGraw Hill.
[11] Secretaría de Economía, Programa para el Desarrollo de la Industria del Software. Available: http://www.economia.gob.mx/?p=8
[12] Lopez-Martin Cuauhtemoc, Gutierrez-Tornes Agustin, Software Effort Estimation: A Designed Process for Structured and Object Oriented Software Engineering Approaches, Proceedings of the International Congress on Computer Science Research, CIICC 04, September 29-30 and October 1, 2004, Tlalnepantla, México.
[13] Lopez-Martin Cuauhtémoc, Lines of Code as a Source for Function Point Estimation Using Linear Regression and Correlation, XVI Congreso Nacional y II Internacional de Informática y Computación 2003, October 2003.