TJFS: Turkish Journal of Fuzzy Systems (eISSN: 1309-1190)
An Official Journal of Turkish Fuzzy Systems Association
Vol.4, No.2, pp. 68-76, 2013

A Fuzzy Model of Software Project Effort Estimation

Oumout Chouseinoglou *
Hacettepe University, Faculty of Engineering, Department of Industrial Engineering, 06800 Ankara, Turkey
E-mail: uhus@hacettepe.edu.tr
*Corresponding author

Özlem Müge Aydın
Hacettepe University, Faculty of Engineering, Department of Industrial Engineering, 06800 Ankara, Turkey
E-mail: ozlemaydi@hacettepe.edu.tr

Abstract

Software project effort estimation is regarded as one of the most difficult tasks in software engineering, as it embodies extensive uncertainty and vagueness in the estimation parameters. In order to deal with this uncertainty, it has been suggested that the fuzzy approach can be employed in the effort estimation process. This work-in-progress paper is a proof of concept that the estimation approach can be fuzzified. The proposed fuzzy estimation method is applied to a clustered ISBSG dataset and it is concluded that the obtained results are acceptable.

Keywords: Software effort estimation, Fuzzy estimation, Software projects, COCOMO models.

1. Introduction

Due to their special nature and complex processes, estimating the work effort, cost and schedule of software development projects is one of the most difficult tasks in software project management. Software engineering, which can be defined as the application of systematic, disciplined, and quantifiable approaches to the areas related to software, is trying to address these estimation difficulties. In that respect, several cost estimation models have been developed with the aim of constructively modeling the development processes and accurately predicting the cost of developing software. An exhaustive list of the different software effort and cost estimation studies is given by Jorgensen and Shepperd (2007). Among the proposed effort estimation models, the most common ones are the algorithmic models, such as COCOMO and function points.
As stated by Shepperd and Schofield (1997), the general structure of the model is in the form of
$$e = a_0 S^{a_1} \qquad (1)$$

where $e$ is effort, $S$ is size, typically measured as lines of code (LOC) or estimated as function points (fp), $a_0$ is a productivity parameter and $a_1$ is an economies or diseconomies of scale parameter. Function points are a unit of measurement used in software engineering to express the amount of business functionality a software program is intended to provide to a user. They can be estimated prior to the development of the software program with the use of different counting methods, the best known being COSMIC, NESMA and IFPUG.

Nevertheless, as stated by Xu and Khoshgoftaar (2004), no model has proven to be consistently successful in providing accurate effort and cost estimations, and despite all the efforts undertaken by the software engineering discipline, the estimation problems still exist today, resulting in delayed and over-budget software. The main reason for this is that the inputs of any effort estimation formula, like the one given in Eq (1), are vague and are the result of informed guessing rather than exact measurements (Musilek et al., 2000). Moreover, the information about the effort that is required to complete the software is often uncertain, imprecise or incomplete (Xu and Khoshgoftaar, 2004). Therefore, the vagueness and uncertainty in the effort estimation inputs necessitate the utilization of alternative approaches, such as fuzzy models. The sources of uncertainty in software cost and effort estimation models, and how a software project can be described as a fuzzy set, are given by Musilek et al. (2000).

This paper is the initial part of a larger study aiming to develop a complete fuzzy model to estimate software project effort by utilizing the fuzzy approach in all processes and parameters of the estimation. With respect to the overall aim, this study is a proof of concept, showing that the fuzzy approach, even in a simple effort estimation model, not only generates acceptable results but also provides project management with a fuzzy number that can be used as the range of the expected effort, instead of a single value.
The proposed approach is empirically validated on the International Software Benchmarking Standards Group (ISBSG) dataset and the results are presented in detail.

The present paper is organized as follows: Section 2 presents a literature review regarding the use of fuzzy logic approaches in software effort and cost estimation models. Section 3 describes the fuzzy arithmetic and the model evaluation and quality measures that have been used in this paper. Section 4 details the proposed fuzzy effort estimation and lists the results obtained from the empirical evaluation. The last section presents the conclusion and directions for future work.

2. Related work

There are several examples of the use of fuzzy approaches and fuzzy logic in the software effort and cost estimation literature. Xu and Khoshgoftaar (2004) propose a fuzzy identification cost estimation model to deal with linguistic data and to automatically generate fuzzy membership functions and rules. By using the COCOMO81 project database, the authors cluster the project data with the use of fuzzy c-means and use as
inputs to the proposed model the cost driver attributes of the COCOMO model. By using a set of rules they have generated with the use of Takagi-Sugeno models, they calculate the values of the categories and the subsequent membership functions they have devised. Finally, they extract a crisp value from the fuzzy sets, similar to the defuzzification process, and they use the Centroid of Areas method to calculate the defuzzification values. The authors investigate 63 project data from the COCOMO81 database and they empirically cluster them in 5 clusters. They conclude that the cost estimation accuracy of the fuzzy models generated by the proposed approach is significantly better than that of the COCOMO models.

Musilek et al. (2000) propose the f-COCOMO model, a fuzzy set-based generalization of the COCOMO model. In f-COCOMO, instead of using a single number as the software size, the authors propose the use of a fuzzy number, which in turn yields another fuzzy number as the cost estimate. The authors conduct an analysis of their model with different membership functions, such as the triangular and the parabolic fuzzy sets, and they further implement the use of fuzzy numbers as parameters of f-COCOMO. They experiment with 63 projects and propose that the fuzzy logic approach needs to be applied in other software cost and effort studies, such as fp.

Aroba et al. (2008) have proposed the use of fuzzy clustering for the development of segmented software cost estimation models, where a software project may belong to more than one segment. Their approach is tested on the ISBSG dataset, where they report their findings for clusters in the size of, 5 and 20.

Idri et al. (2001) propose fuzzy analogy for estimating the cost of software, providing a solution to the vagueness and imprecision of the software attributes.
Estimation by analogy is a four-step process: first the similar cases are retrieved, then the information gathered from them is reused, then the proposed solution is revised, and finally some parts of this experience are retained to be used in future projects. The authors use fuzzy logic and linguistic quantifiers in reasoning by analogy to estimate the effort of software projects and validate their approach by conducting an experiment on the COCOMO dataset.

Similarly, Azzeh et al. (2011) propose an analogy-based software effort estimation using fuzzy numbers, namely Generalized Fuzzy Number Software Estimation. They compute the similarity between two generalized fuzzy numbers based on their geometric distances, centers of gravity and heights, and use fuzzy c-means to cluster the existing software project data. The estimations are conducted with the use of generalized fuzzy number operations, and the effort of a project is estimated as a fuzzy number which is defuzzified with the center of gravity method. The authors conduct empirical evaluations with the use of jack-knifing on benchmark software data from the ISBSG, Desharnais, Kemerer, Albrecht and COCOMO datasets.

Azzeh et al. (2010) have also proposed another estimation by analogy model that incorporates fuzzy set theory and grey relational analysis. In this model, fuzzy sets are employed to reduce the uncertainty in the distance measure between two tuples, whereas grey relational analysis is utilized as a problem solving method to assess the similarity between tuples with a number of features. The authors compare their results with case-based reasoning, multiple linear regression and artificial neural networks, using the datasets given in their work (Azzeh et al. 2011).
Lopez-Martin et al. (2008) compare three personal fuzzy logic models for estimating the effort of small software programs, based on triangular, trapezoidal and Gaussian membership functions respectively, with a linear regression model. They develop the fuzzy logic and linear regression models using the data gathered from 105 small programs, and the estimations generated by these models are then compared with each other using 20 small programs.

3. Fuzzy arithmetic and model validation measures

For a triangular fuzzy number $M = (m, \alpha, \beta)$, let $m$ be the mean value and $\alpha$, $\beta$ be the left and right spreads, respectively. The membership function of $M$ can be written as

$$\mu_M(x) = \begin{cases} L\left(\dfrac{m - x}{\alpha}\right), & x \le m \\ R\left(\dfrac{x - m}{\beta}\right), & x \ge m \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

where $\alpha, \beta > 0$. Bansal (2010) defines the multiplication of two fuzzy numbers $M = (m, \alpha, \beta)$ and $N = (n, \gamma, \delta)$ as

$$M \otimes N \cong (mn,\; m\gamma + n\alpha - \alpha\gamma,\; m\delta + n\beta + \beta\delta) \qquad (3)$$

and the exponentiation of two fuzzy numbers as

$$M^N = \left(m^n,\; m^n - (m - \alpha)^{(n - \gamma)},\; (m + \beta)^{(n + \delta)} - m^n\right) \qquad (4)$$

The indicators to be used by software practitioners when comparing the results of different models are given by Kitchenham et al. (2001), and are used in a variety of the studies investigated in this paper (Aroba et al. 2008, Azzeh et al. 2011, Azzeh et al. 2010, Idri et al. 2002). Magnitude Relative Error (MRE) computes the absolute percentage of error between actual effort ($ea$) and estimated effort ($ee$) for each investigated project, as shown in Eq (5).

$$MRE_i = \frac{|ea_i - ee_i|}{ea_i} \qquad (5)$$

On the other hand, Magnitude Error Relative (MER) is given in Eq (6).

$$MER_i = \frac{|ea_i - ee_i|}{ee_i} \qquad (6)$$
Mean Magnitude Relative Error (MMRE) calculates the average of MRE over all investigated items ($n$), as shown in Eq (7). Similarly, the Mean Magnitude of Error Relative (MMER) is given in Eq (8).

$$MMRE = \frac{1}{n}\sum_{i=1}^{n} MRE_i \qquad (7)$$

$$MMER = \frac{1}{n}\sum_{i=1}^{n} MER_i \qquad (8)$$

Finally, PRED(q) is used to count the percentage of estimates whose error is less than or equal to $q$ of the actual values.

$$PRED(q) = \frac{\lambda}{N} \qquad (9)$$

where $\lambda$ is the number of projects where $MRE_i \le q$ and $N$ is the number of all estimates. Aroba et al. (2008), referencing Conte et al. (1986), state that a model with MMRE ≤ 0.25 and PRED(0.25) ≥ 0.75 is considered to be a good one. In general, an estimation model with lower MMRE and higher PRED(q) can be interpreted as producing more accurate estimates than other models.

4. Fuzzy numbers for software project effort estimation

The crisp effort estimation function given in Eq (1) is calculated as a function of fp, assuming that the fp value is exactly known. However, fp mostly depend on imprecise software attributes (Xu and Khoshgoftaar, 2004). In order to overcome this uncertainty, fuzzy logic is inserted into the model. As the main vagueness comes from the fp values, they are defined as triangular fuzzy numbers and denoted as $\widetilde{fp}$. Hence the estimated effort is obtained as a fuzzy number and denoted as $\tilde{e}$. For fuzzy effort estimation, Eq (1) is rewritten as

$$\tilde{e} = a_0\, \widetilde{fp}^{\,a_1} \qquad (10)$$

In this study, in addition to the fp values, the $a_0$ and $a_1$ parameters are also fuzzified using their confidence intervals, which yield symmetric triangular fuzzy numbers. The final fuzzy effort estimation model is defined in the form of

$$\tilde{e} = \tilde{a}_0\, \widetilde{fp}^{\,\tilde{a}_1} \qquad (11)$$
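As an illustration of Eq (11), the following Python sketch applies the multiplication and exponentiation rules of Eqs (3) and (4) to triangular fuzzy numbers. It is not the implementation used in this study: the class name `TFN`, the function names and all numeric inputs are hypothetical and chosen only for the example. Eq (4) is applied exactly as written; the endpoints it produces are the true bounds of the result only when the base is at least 1 and the exponent is non-negative, which the example inputs are chosen to satisfy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (m, alpha, beta); a hypothetical helper type."""
    m: float      # mean value
    alpha: float  # left spread
    beta: float   # right spread

    def support(self):
        """Interval [m - alpha, m + beta] where membership is non-zero."""
        return (self.m - self.alpha, self.m + self.beta)


def multiply(M, N):
    """Eq (3): approximate product of two triangular fuzzy numbers."""
    return TFN(
        M.m * N.m,
        M.m * N.alpha + N.m * M.alpha - M.alpha * N.alpha,
        M.m * N.beta + N.m * M.beta + M.beta * N.beta,
    )


def power(M, N):
    """Eq (4): M raised to the fuzzy exponent N."""
    mean = M.m ** N.m
    low = (M.m - M.alpha) ** (N.m - N.alpha)
    high = (M.m + M.beta) ** (N.m + N.beta)
    return TFN(mean, mean - low, high - mean)


def fuzzy_effort(a0, a1, fp):
    """Eq (11): fuzzy effort = a0 * fp ** a1 with all three inputs fuzzy."""
    return multiply(a0, power(fp, a1))


# Hypothetical inputs, NOT taken from Table 1.
a0 = TFN(100.0, 10.0, 10.0)
a1 = TFN(0.5, 0.1, 0.1)
fp = TFN(400.0, 20.0, 20.0)  # crisp 400 fp with a 5% spread on each side

effort = fuzzy_effort(a0, a1, fp)
low, high = effort.support()
print(f"mean effort {effort.m:.1f}, range [{low:.1f}, {high:.1f}]")
```

The support of the resulting fuzzy number is the effort range that project management can read off directly, while the mean value plays the role of the crisp estimate.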
5. Model adequacy checking

The described fuzzy methodology was applied to the ISBSG dataset in order to empirically validate its applicability. The ISBSG dataset contains 5052 software projects of various types, with 8 attributes per project. As the ISBSG projects vary widely with respect to these attributes, in order to obtain a uniform dataset only the projects whose fp count is IFPUG and whose unadjusted fp rating is classified as A were selected, resulting in 2257 software projects. Moreover, as obtaining a single estimation function would not be appropriate for that final set of projects, the data was clustered according to both fp and effort values prior to estimating the effort function. SPSS 7 was used to cluster the data set by the k-means methodology. The number of clusters was empirically defined as 20, and 50 iterations were conducted at the initiation. Within the determined 20 clusters, the ones including at least 20 observations were taken into consideration (2207 projects in total), and thus effort estimates depending on six clusters were calculated by non-linear regression model estimation in SPSS. The crisp parameter estimations $a_0$ and $a_1$ are presented in the third and fifth columns of Table 1.

As stated previously, due to the aforementioned uncertainty, fp values were considered as fuzzy numbers. In fuzzifying the fp in the data set, the spreads were determined by extending the mean (crisp) fp by ±0,05 * fp. Thus, crisp fp were converted to symmetric triangular fuzzy numbers by using the extension principle of fuzzy logic.

Table 1. Parameter estimations for clusters

Cluster   Number of       a_0 mean     a_0 spread   a_1 mean   a_1 spread
          observations
1         79              680,877      3867,68      0,020      0,035
2         139             0276,03      934,095      0,08       0,029
3         599             2508,763     39,284       0,026      0,023
4         1051            32,2         52,848       0,97       0,032
5         318             6448,994     040,687      0,000      0,027
6         21              25349,009    6328,847     0,0        0,035

The spreads of each fuzzy number according to clusters are also given in Table 1. For each cluster, both crisp and fuzzy effort estimations were calculated using each cluster's parameter estimates.
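The fuzzification of fp described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure that was run in SPSS: the function names and the example value of 200 fp are hypothetical, and the membership function assumes the linear shape functions L(u) = R(u) = max(0, 1 - u) of a triangular fuzzy number in Eq (2).

```python
def fuzzify_fp(fp, rel_spread=0.05):
    """Turn a crisp fp count into a symmetric triangular fuzzy number.

    Returns (mean, left spread, right spread), with both spreads equal to
    rel_spread * fp (0,05 * fp in this study).
    """
    if fp <= 0 or rel_spread <= 0:
        raise ValueError("fp and rel_spread must be positive")
    spread = rel_spread * fp
    return (fp, spread, spread)


def membership(x, tfn):
    """Degree to which x belongs to the triangular fuzzy number tfn, Eq (2)."""
    m, alpha, beta = tfn
    if m - alpha <= x <= m:
        return 1.0 - (m - x) / alpha   # rising left side
    if m < x <= m + beta:
        return 1.0 - (x - m) / beta    # falling right side
    return 0.0                         # outside the support


# Hypothetical project with a crisp size of 200 fp.
fuzzy_fp = fuzzify_fp(200.0)
for x in (190.0, 195.0, 200.0, 205.0, 211.0):
    print(x, round(membership(x, fuzzy_fp), 3))
```

A symmetric spread keeps the crisp fp count as the most plausible value while still admitting every size within 5% of it.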
The quality of the estimators was measured by the MMRE, MMER and PRED(0,25) values, and the results for the crisp and fuzzy estimation models are given in Table 2. In the last step of this study, the six clusters were merged and the accuracy errors, calculated from the estimations of each project's own cluster estimator, were examined. Quality measures for the overall data are presented in Table 3. Table 3 shows that the fuzzy model is somewhat worse than the crisp one; however, it is not so inadequate that it should be avoided.
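The quality measures of Eqs (5)-(9) are straightforward to compute, as the following Python sketch shows. It is illustrative only: the function names and the three sample projects are hypothetical and do not come from the ISBSG data. How a fuzzy estimate is reduced to a single value before the errors are computed is not specified here, so the functions simply take crisp estimates.

```python
def mre(actual, estimated):
    """Eq (5): magnitude of relative error, relative to the actual effort."""
    return abs(actual - estimated) / actual


def mer(actual, estimated):
    """Eq (6): magnitude of error relative to the estimated effort."""
    return abs(actual - estimated) / estimated


def mmre(actuals, estimates):
    """Eq (7): mean MRE over all projects."""
    return sum(mre(a, e) for a, e in zip(actuals, estimates)) / len(actuals)


def mmer(actuals, estimates):
    """Eq (8): mean MER over all projects."""
    return sum(mer(a, e) for a, e in zip(actuals, estimates)) / len(actuals)


def pred(actuals, estimates, q=0.25):
    """Eq (9): share of projects whose MRE is less than or equal to q."""
    hits = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= q)
    return hits / len(actuals)


# Hypothetical actual and estimated efforts for three projects.
actuals = [100.0, 200.0, 400.0]
estimates = [110.0, 150.0, 500.0]

print("MMRE      ", round(mmre(actuals, estimates), 4))
print("MMER      ", round(mmer(actuals, estimates), 4))
print("PRED(0.25)", round(pred(actuals, estimates, 0.25), 4))
```

MRE and MER differ only in the denominator, so MRE penalizes overestimates of small projects more heavily while MER penalizes underestimates; reporting both, as in Tables 2 and 3, gives a more balanced picture.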
In particular, in cases where data is lacking or ill-defined, the fuzzy estimation model can be used to obtain satisfactory results.

Table 2. Estimation quality for crisp and fuzzy estimation models

Cluster   Measure      Crisp Model   Fuzzy Model
1         MMRE         0,058         0,244
          MMER         0,056         0,6
          PRED(0,25)   1,0000        0,9494
2         MMRE         0,308         0,420
          MMER         0,293         0,3
          PRED(0,25)   0,9424        0,8633
3         MMRE         0,2405        0,2464
          MMER         0,2264        0,2245
          PRED(0,25)   0,5492        0,5525
4         MMRE         1,4277        1,4637
          MMER         0,4936        0,4833
          PRED(0,25)   0,2683        0,272
5         MMRE         0,77          0,858
          MMER         0,724         0,78
          PRED(0,25)   0,760         0,7673
6         MMRE         0,0500        0,0739
          MMER         0,0498        0,0667
          PRED(0,25)   1,0000        1,0000

Table 3. Estimation quality of overall data

Measure      Crisp Model   Fuzzy Model
MMRE         0,7832        0,8048
MMER         0,3337        0,3288
PRED(0,25)   0,492         0,4875

Tables 2 and 3 show that the calculated accuracy measures for both models are similar to each other. Although the proposed fuzzy model could not deliver better results in all clusters, it should be kept in mind that the fuzzy model achieved this similar accuracy while using extended (interval-valued) inputs. This extension makes it easier for decision makers to determine the fp and the other parameters. Moreover, within the 2207 examined software projects, 56,54% of the
actual efforts fall within the estimated fuzzy effort interval. With all these taken into account, the fuzzified model is concluded to be sufficient.

6. Conclusions and future work

Effort estimation is one of the most significant fields in software project management, as it involves extensive uncertainty and vagueness across various types of software projects, which makes generalizations impossible. fp, which are used as an input for the estimation model, consist of some imprecise attributes. In order to handle the vagueness related to effort estimation, fuzzy logic can be inserted into the model, and even though it is not guaranteed to give the best result, acceptable results can be achieved.

This work-in-progress study, being an initial step in the development of a larger fuzzy effort estimation approach, aimed to find an acceptable fuzzy estimation model based on fuzzy fp, by using fuzzy parameter estimates for clustered ISBSG data. Even though it is notable that the fuzzy effort estimation model can substitute for the crisp estimation model, this was achieved only with the clustered data in this study. Therefore, prior to estimation, one has to decide on the most appropriate cluster. Based on these findings, the current research is on developing a fuzzy effort estimation model where the fuzzy effort function parameters and the fuzzy fp number are calculated based on the project attributes. Moreover, a more general fuzzy estimation model can be investigated for the whole data, independent of clusters. Additionally, the results of this study hold for crisply clustered data, whereas a project could belong to a number of clusters at the same time. In such cases, fuzzy clustering methods can be used before estimating the project effort.

References

Jorgensen, M., Shepperd, M., A systematic review of software development cost estimation studies, IEEE Transactions on Software Engineering, vol. 33 (1), pp. 33-53, 2007.
Shepperd, M., Schofield, C., Estimating software project effort using analogies, IEEE Transactions on Software Engineering, vol. 23 (12), pp. 736-743, 1997.

Xu, Z., Khoshgoftaar, T.M., Identification of fuzzy models of software cost estimation, Fuzzy Sets and Systems, vol. 145 (1), pp. 141-163, 2004.

Musilek, P., Pedrycz, W., Succi, G., Reformat, M., Software cost estimation with fuzzy models, ACM SIGAPP Applied Computing Review, vol. 8 (2), pp. 24-29, 2000.

Idri, A., Alain, A., Khoshgoftaar, T.M., Fuzzy analogy: a new approach for software
cost estimation, Proceedings of the International Workshop on Software Measurement, 2001.

Idri, A., Khoshgoftaar, T.M., Abran, A., Investigating soft computing in case-based reasoning for software cost estimation, Engineering Intelligent Systems for Electrical Engineering and Communications, vol. 10 (3), pp. 47-58, 2002.

Azzeh, M., Neagu, D., Cowling, P.I., Analogy-based software effort estimation using Fuzzy numbers, Journal of Systems and Software, vol. 84 (2), pp. 270-284, 2011.

Azzeh, M., Neagu, D., Cowling, P.I., Fuzzy grey relational analysis for software effort estimation, Empirical Software Engineering, vol. 15 (1), pp. 60-90, 2010.

Lopez-Martin, C., Yanez-Marquez, C., Gutierrez-Tornes, A., Predictive accuracy comparison of fuzzy models for software development effort of small programs, Journal of Systems and Software, vol. 81 (6), pp. 949-960, 2008.

Bansal, A., Some Non Linear Arithmetic Operations on Triangular Fuzzy Numbers, Advances in Fuzzy Mathematics, vol. 5 (2), pp. 47-56, 2010.

Aroba, J., Cuadrado-Gallego, J.J., Sicilia, M.A., Ramos, I., Garcia-Barriocanal, E., Segmented software cost estimation models based on fuzzy clustering, Journal of Systems and Software, vol. 81 (11), pp. 1944-1950, 2008.

Kitchenham, B.A., Pickard, L.M., MacDonell, S.G., Shepperd, M.J., What accuracy statistics really measure [software estimation], Software, IEE Proceedings, vol. 148 (3), pp. 81-85, 2001.