A HYBRID FUZZY-ANN APPROACH FOR SOFTWARE EFFORT ESTIMATION

Sheenu Rizvi 1, Dr. S.Q. Abbas 2 and Dr. Rizwan Beg 3
1 Department of Computer Science, Amity University, Lucknow, India
2 A.I.M.T., Lucknow, India
3 Integral University, Lucknow, India

ABSTRACT

Software development effort estimation is one of the major activities in software project management. At the project proposal stage, estimates are highly likely to be inaccurate, but this inaccuracy decreases as the project progresses. In software development, effort estimates are based on certain metrics. Many methods have been proposed for software effort estimation to date, of which the non-algorithmic methods, such as artificial intelligence techniques, have been very successful. A hybrid Fuzzy-ANN model, known as the Adaptive Neuro Fuzzy Inference System (ANFIS), is well suited to such situations. This paper develops a software effort estimation model based on ANFIS and evaluates its efficiency on the COCOMO81 dataset. The results are compared with those of an Artificial Neural Network (ANN) and the Intermediate COCOMO model developed by Boehm, using the Magnitude of Relative Error (MRE) and the Root Mean Square Error (RMSE). ANFIS is observed to provide better results than the ANN and COCOMO models.

KEYWORDS

Software Effort Estimation, RMSE, ANFIS, ANN, COCOMO, MRE.

1. INTRODUCTION

One of the key challenges in the software industry is the accurate estimation of development effort, which is particularly important for risk evaluation, resource scheduling and progress monitoring. Inaccurate estimates lead to problematic results: overestimation wastes resources, whereas underestimation leads to the approval of projects that will exceed their planned budgets. Many estimation models have therefore been developed to keep projects cost-effective.
These models can be grouped by the methodology used: expert-based, analogy-based and regression-based. Expert-based models rely on expert knowledge and past experience of software projects. According to a comprehensive review, expert-based estimation is one of the most frequently applied estimation strategies. Regression-based methods, in contrast, use statistical techniques such as least squares regression, in which a set of independent variables explains the dependent variable with minimum error. Mathematical models such as Barry Boehm's COCOMO [1] and COCOMO II [2] are widely investigated regression-based methods. The parameters of these models are calibrated to the projects of a particular company, so they have the drawback of requiring local calibration. To address these problems, this paper uses a hybrid Fuzzy-ANN model known as the Adaptive Neuro Fuzzy Inference System (ANFIS).

DOI:1.5121/ijfcst.214.455
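
As an illustration of the regression-based form discussed in the introduction, the following minimal Python sketch evaluates the Intermediate COCOMO effort equation, effort = a * KLOC^b * (product of effort multipliers). The (a, b) pairs are the values commonly quoted from Boehm's 1981 book for the three development modes. The function and variable names are illustrative assumptions and are not taken from the original study.

```python
from math import prod

# Commonly quoted Intermediate COCOMO coefficients (a, b) per development mode.
COEFFICIENTS = {
    "organic": (3.2, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}


def cocomo_effort(kloc, mode, effort_multipliers=()):
    """Return the estimated effort in man-months.

    kloc: project size in thousands of delivered lines of code.
    mode: one of the keys of COEFFICIENTS.
    effort_multipliers: ratings of the cost drivers (RELY, DATA, ...);
        an empty sequence means every driver is nominal (product = 1).
    """
    a, b = COEFFICIENTS[mode]
    adjustment = prod(effort_multipliers)  # prod(()) evaluates to 1
    return a * kloc ** b * adjustment


if __name__ == "__main__":
    # Hypothetical 10 KLOC organic project with two non-nominal drivers.
    print(cocomo_effort(10, "organic", [1.15, 0.91]))
```

In this form, local calibration amounts to re-fitting a and b (and possibly the multiplier values) on an organisation's own project history, which is the drawback noted above.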

2. DATA USED

The data used are from the COCOMO 81 dataset. The input and output variables used for ANFIS model development are given in Table 1. A total of sixteen input variables are used: fifteen effort multipliers and the size measured in thousands of delivered lines of code (KLOC). Development Effort (DE), measured in man-months, is the output of the model. The data were collected from the analysis of sixty-three (63) software projects, as published by Barry Boehm in 1981 [3] [16].

Table 1. Input and Output variables for ANFIS model.

Input Variables:
RELY - Required software reliability
DATA - Data base size
CPLX - Product complexity
TIME - Execution time
STOR - Main storage constraint
VIRT - Virtual machine volatility
TURN - Computer turnaround time
ACAP - Analyst capability
AEXP - Applications experience
PCAP - Programmer capability
VEXP - Virtual machine experience
LEXP - Language experience
MODP - Modern programming
TOOL - Use of software tools
SCED - Required development schedule
SIZE - in KLOC

Output Variable:
Development Effort (DE)

Source: COCOMO81 Dataset (PROMISE Software Engineering Repository data [16])

3. ANFIS MODEL DEVELOPMENT

3.1. Parameter Selection

ANFIS [9],[1] is a judicious integration of a fuzzy inference system (FIS) and an ANN. It is capable of learning, high-level thinking and reasoning, and it combines the benefits of the two techniques in a single model [4]. The success of an FIS depends on finding the rule base, because there are no specific techniques for converting human knowledge into a rule base. In addition, the membership functions need further fine tuning to maximise model performance and minimise the output error. Thus, when generating an FIS using ANFIS, it is important to select proper parameters, including the number of membership functions (MFs) for each antecedent variable.
It is also vital to select appropriate parameters for the learning and refining process, including the initial step size (ss). In the present work, subtractive clustering is the rule extraction method applied for FIS identification and refinement. The MATLAB Fuzzy Logic Toolbox [7] has been used for ANFIS model development. The initial parameters of the ANFIS are identified using the subtractive clustering method [5]. It is vital to define the subtractive clustering parameters properly, of which the clustering radius is the most important. It is determined through a trial and error approach: the clustering radius r_a is varied with varying step size, and the optimal parameters are those that minimize the root mean squared error on the validation datasets. The clustering radius r_b is selected as 1.5 r_a.

Gaussian membership functions are used for each fuzzy set in the fuzzy system. The number of membership functions and fuzzy rules required for a particular ANFIS is determined by the subtractive clustering algorithm. The parameters of the Gaussian membership functions are optimally determined using the hybrid learning algorithm. Each ANFIS is trained for 1 epochs. Gaussian membership functions are used for the inputs and a linear membership function is used for the output. Separate sets of input and output data are passed as input arguments.

In MATLAB, genfis2 generates a Sugeno-type FIS structure using subtractive clustering. genfis2 is generally used where there is only one output; hence it has been used here to generate the initial FIS for ANFIS training. genfis2 does this by extracting a set of rules that models the behaviour of the data. The rule extraction method first uses the subclust function to determine the number of rules and antecedent membership functions, and then uses linear least squares estimation to determine each rule's consequent equations.

The parameters used in the model for training ANFIS are given in Table 2 and the rule extraction method used is given in Table 3. Table 4 summarizes the types and values of the model parameters used for training ANFIS.

Table 2. Parameters used in all the models for training ANFIS

Parameter                      Value
Rule extraction method used    Subtractive clustering
Input MF type                  Gaussian membership (gaussmf)
Input partitioning             variable
Output MF type                 Linear
Number of output MFs           one
Training algorithm             Hybrid learning
Training epoch number          1
Initial step size              .1

Table 3. Rule extraction method used for training ANFIS

Rule Extraction Method    Type
And method                prod
Or method                 probor
Defuzzy method            wtaver
Implication method        prod
Aggregation method        max

Table 4. Values of parameters used for training ANFIS

Parameter                       Value
No. of nodes                    1311
No. of linear parameters        646
No. of non-linear parameters    1216
Total no. of parameters         1862
No. of training data pairs      40
No. of testing data pairs       23
No. of fuzzy rules              38
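
To make the structure summarised in Tables 2 to 4 more concrete, the following Python sketch evaluates a first-order Sugeno system with Gaussian input membership functions, the prod AND method and weighted-average defuzzification. It also counts the trainable parameters under the assumption that every rule carries its own Gaussian MF (two parameters) for each input and one linear consequent. This is a simplified stand-in written for illustration, not the MATLAB implementation used in the study, and all function names are hypothetical.

```python
import math


def gaussmf(x, sigma, c):
    """Gaussian membership of x in a fuzzy set with centre c and spread sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def sugeno_predict(x, rules):
    """Evaluate a first-order Sugeno FIS for one input vector x.

    Each rule is a pair (mfs, coeffs): mfs holds one (sigma, c) pair per
    input, and coeffs holds one linear coefficient per input followed by
    a constant term.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for mfs, coeffs in rules:
        # AND method "prod": firing strength is the product of memberships.
        w = 1.0
        for xi, (sigma, c) in zip(x, mfs):
            w *= gaussmf(xi, sigma, c)
        # Linear consequent: z = p1*x1 + ... + pn*xn + r.
        z = sum(p * xi for p, xi in zip(coeffs, x)) + coeffs[-1]
        weighted_sum += w * z
        weight_total += w
    # Weighted-average defuzzification; assumes at least one rule fires
    # with a non-zero strength.
    return weighted_sum / weight_total


def parameter_counts(n_inputs, n_rules):
    """Return (linear, non-linear, total) trainable parameter counts."""
    linear = n_rules * (n_inputs + 1)   # consequent coefficients per rule
    nonlinear = n_rules * n_inputs * 2  # (sigma, c) per input per rule
    return linear, nonlinear, linear + nonlinear


if __name__ == "__main__":
    print(parameter_counts(16, 38))
```

Under these assumptions, sixteen inputs and 38 rules give 646 linear, 1216 non-linear and 1862 total parameters, which agrees with the counts reported in Table 4. In hybrid learning, the linear consequent coefficients are the part fitted by least squares, while the Gaussian parameters are adjusted by gradient descent.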

4. RESULT AND DISCUSSION

The ANFIS model has been trained and tested, and its performance as the best prediction model is evaluated and compared separately for the training and testing datasets. The RMSE of the ANFIS model for the training and testing datasets is plotted in Fig. 1 and Fig. 2 respectively, and the corresponding ranges of values (minimum and maximum) are summarized in Table 5.

Figure 1. Graphical plot of RMSE value variation during training

Figure 2. Graphical plot of RMSE value variation during testing

Table 5. Range of RMSE during training and testing phase

RMSE Value           Minimum    Maximum
Training datasets    .4824      2.896
Testing datasets     186.41     188.41
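
The RMSE criterion used for Table 5 can be sketched as follows. This is a minimal Python illustration with hypothetical function names: rmse computes the root mean square error between actual and predicted efforts, and best_epoch picks the epoch with the lowest error from a per-epoch history such as the ones plotted in Fig. 1 and Fig. 2. The numbers in the usage lines are made up for illustration and are not the study's data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def best_epoch(rmse_per_epoch):
    """Return the 1-based epoch number with the lowest RMSE."""
    index = min(range(len(rmse_per_epoch)), key=lambda i: rmse_per_epoch[i])
    return index + 1


if __name__ == "__main__":
    # Illustrative values only.
    print(rmse([100.0, 50.0, 20.0], [90.0, 55.0, 20.0]))
    print(best_epoch([3.0, 2.5, 2.0, 2.2]))
```

Because RMSE is expressed in the units of the output (man-months here), a few very large projects can dominate it, which is one reason a relative measure such as MRE is reported alongside it.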

Further, Table 6 gives the RMSE values obtained using the COCOMO, ANN and ANFIS techniques.

Table 6. Performance evaluation using RMSE criteria

Technique    RMSE Val.
COCOMO       532.2147
ANN          353.1977
ANFIS        112.638

From Fig. 1 and Fig. 2 and the data in Table 5, it is inferred that during the training phase (Fig. 1) the RMSE varies in a zigzag manner, with a minimum value of .4824 (at epoch 8) and a maximum value of 2.896 (at epoch 3). Hence, during the training phase the RMSE initially rises, falls to its minimum at epoch 8, and then increases slightly again. During the testing phase (Fig. 2), the RMSE initially decreases up to epoch 4, where it reaches a minimum of 186.41, and then rises steeply up to 1 epochs, where the maximum value of 188.41 is reached. Table 5 shows that ANFIS performed better during the training phase than during the testing phase, but its overall RMSE is 112.638. This is a marked improvement over the values obtained for the ANN and COCOMO models, i.e. 353.1977 and 532.2147 respectively (Table 6).

Next, consider the absolute values of the Magnitude of Relative Error (MRE) calculated for the COCOMO and ANFIS models (Table 7) and their comparative plots for the training and testing datasets (Fig. 3 and Fig. 4). Both the data and the plots show that, during the training as well as the testing phase of ANFIS model development, the absolute MRE values are much lower than those of the COCOMO model, especially during the training phase. Since the absolute MRE is the absolute percentage error between the actual and predicted effort for each project, it follows that the percentage error between actual and predicted effort is far lower with the ANFIS technique than with the COCOMO model.
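
The absolute MRE described above can be written as |actual - predicted| / |actual| * 100. The short Python sketch below shows this calculation. The function name is a hypothetical choice made for illustration and does not come from the original study.

```python
def abs_mre(actual, predicted):
    """Absolute magnitude of relative error, expressed as a percentage."""
    if actual == 0:
        raise ValueError("actual effort must be non-zero")
    return abs(actual - predicted) / abs(actual) * 100.0


if __name__ == "__main__":
    # Actual effort 6 and COCOMO estimate 5.21 (row 41 of Table 8).
    print(abs_mre(6, 5.21))
```

For example, an actual effort of 6 man-months with a COCOMO estimate of 5.21 (the row 41 values in Table 8) gives an absolute MRE of about 13.17, in line with the COCOMO entry for that row in Table 7. Unlike RMSE, this measure is scale-free, so small and large projects contribute on the same footing.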
Thus, it is clear that proper selection of the influential radius, which directly affects the clustering results in ANFIS with the subtractive clustering rule extraction method, has reduced both the RMSE and the MRE for the training and testing datasets. Hence, for a small training dataset, ANFIS has outperformed the ANN and COCOMO models.

Table 7. Comparative chart of Absolute values of MRE for COCOMO and ANFIS Model

S.No.    ABS MRE COCOMO    ABS MRE ANFIS
1        8.651813725       .13189
2        73.911625         .3832219
3        1.377489712       .195532
4        2.825             .158388
5        16.93939394       .22853
6        4.51162791        1.22696E-5
7        22.125            .142747
8        41.41395349       1.94362E-5
9        21.4728132        1.1152E-5
10       14.177579         5.4767E-5
11       42.2218349        .783969
12       .646766169        9.3241E-5
13       43.7848113        .854332
14       16.41666667       6.9513E-7
15       28.4754984        4.7574E-6
16       45.575            1.81974E-5
17       181.7777778       .19538
18       18.5412281        .9939471
19       45.78439394       .41568784
20       1.5675            .7541921
21       24.5334623        .663228
22       12.6767956        2.95788E-5
23       15.71799629       .118637
24       31.3885297        .124277
25       49.22179732       .2224
26       26.12428941       7.7421E-6
27       19.43181818       .151894
28       35.6326536        2.81222E-5
29       5.342465753       .362236
30       8.66116949        .64311
31       14.314258         2.2618E-5
32       94.69857          .2576867
33       8.978512397       5.71114E-5
34       26.782687         1.92174E-5
35       51.8177317        7.19225E-6
36       27.74545455       5.829E-6
37       86.59574468       .16447
38       64.25             1.23164E-5
39       22.5              .42334
40       22.25             1.1181E-6
41       13.16666667       34.111937
42       142.8666667       33.128475
43       24.9759361        17.5124589
44       52.72413793       49.5818218
45       3.18867925        96.8757342
46       69.76984127       12.325458
47       8.972222222       6.6176694
48       73.31996855       41.92811776
49       9.288461538       114.787153
50       7.693181818       7.139281263
51       32.1832787        23.1517377
52       11.731773         24.48625124
53       6.7142857         4.28145
54       41.1              73.28148424
55       58.27777778       7.1534294
56       59.479812         59.7718117
57       17.2531646        25.23833685
58       11.68461538       11..721121
59       18.25714286       22.62693271
60       12.877193         1.9231245
61       5.48              18.81248
62       8.36842153        27.459325
63       14.2              31.298885

Figure 3. Absolute MRE plot for COCOMO and ANFIS Output for training datasets (x-axis: No. of Projects; y-axis: Absolute MRE; series: COCOMO MRE, ANFIS MRE)

Figure 4. Absolute MRE plot for COCOMO and ANFIS Output for testing datasets (x-axis: No. of Projects; y-axis: Absolute MRE; series: MRE COCOMO, MRE ANFIS)

To show how well ANFIS performs relative to the ANN and COCOMO models, Fig. 5 plots the actual effort against the effort predicted by the COCOMO, ANN and ANFIS techniques, using the data given in Table 8. The graph shows that the ANFIS line follows the actual effort line more closely than the COCOMO line does. This again indicates the superiority of the ANFIS technique over the ANN and COCOMO models for effort estimation.

Table 8. Comparative chart of Actual Effort Versus Estimated Effort using COCOMO, ANN and ANFIS

S. No    Actual Effort    Estimated Effort using
                          COCOMO      ANN         ANFIS
1        24               1863.53     24.22       24.2
2        16               2782.577    3168.456    1599.57
3        243              246.3473    242.8827    242.9952
4        24               235.182     24.167      24.4
5        33               38.59       39.88948    32.99993
6        43               25.58       11.68468    42.99999
7        8                9.77        6.16686     7.999989
8        175              629.8       175.621     175
9        423              333.97      197.3923    423
10       321              275.49      13.33255    32.9998
11       218              31.4        217.8293    218.17
12       21               199.7       2.765       2.9998
13       79               113.59      82.28573    78.99933
14       6                5.15        59.5612     6
15       61               43.63       56.88275    61
16       4                58.23       41.55418    39.99999
17       9                25.36       41.71533    9.1
18       114              929.53      11384.8     11398.87
19       66               9621.77     6599.16     662.744
20       64               5723.68     718.591     6399.517
21       2455             1852.78     2454.785    2454.851
22       724              811.37      136.327     724.2
23       539              454.28      538.881     539.6
24       453              31.81       1.7177      452.9994
25       523              265.57      1214.319    522.9988
26       387              285.899     387.3988    387
27       88               7.9         88.77245    87.99987
28       98               132.92      96.47764    98.3
29       7.3              7.69        15.74339    7.299736
30       5.9              6.411       2.11236     5.9379
31       163              1215.16     163.154     163
32       72               1362.37     1129.184    71.9819
33       65               55.68       64.7895     65.3
34       23               17.2        73.82972    23
35       82               124.49      3.58422     82.1
36       55               39.74       7.26457     55
37       47               87.7        29.24169    46.99995
38       12               19.71       7.28678     12
39       8                6.2         66.4877     8.34
40       8                9.78        8.41984     8
41       6                5.21        6.21124     8.46612
42       45               19.29       234.8325    195.2396
43       83               13.73       11.74       228.257
44       87               132.87      1.6351      13.721
45       16               19.2        157.2179    3.31
46       126              213.91      122.6887    343.28
47       36               32.77       7.26629     57.82236
48       1272             224.63      6.364794    738.6743
49       156              141.51      155.7227    335.579
50       176              162.46      491.2995    188.5651
51       122              82.74       254.6255    93.75488
52       41               36.46       48.5263     51.3936
53       14               22.41       38.53126    14.7524
54       2                11.78       6.37142     34.6563
55       18               7.51        8.634863    16.71238
56       958              388.88      957.3443    385.3861
57       237              277.35      238.535     177.1851
58       13               145.19      154.691     282.375
59       7                82.78       6.243794    85.83885
60       57               5.11        132.3261    119.6359
61       5                47.26       6.3985      4.99599
62       38               41.18       38.24981    14.7745
63       15               17.13       6.164915    19.69363

Finally, Figures 6, 7 and 8 show scatter plots of actual effort versus estimated effort for the ANFIS, ANN and COCOMO models. The figures show that predictions are generally most accurate for ANFIS, whose data points follow a linear trend line, and that the ANFIS model is better than the ANN and COCOMO models.

Figure 5. Comparative plot of Actual Effort, COCOMO, ANN and ANFIS Output (series: Actual Effort, Estimated Effort using COCOMO, Estimated Effort using ANN, Estimated Effort using ANFIS)

Figure 6. Scatter Plot of Actual vs. Estimated Effort using ANFIS (x-axis: Actual Effort; y-axis: Estimated Effort)

Figure 7. Scatter Plot of Actual vs. Estimated Effort using ANN (x-axis: Actual Effort; y-axis: Estimated Effort)

Figure 8. Scatter Plot of Actual vs. Estimated Effort using COCOMO (x-axis: Actual Effort; y-axis: Estimated Effort)

5. CONCLUSION

This paper has investigated the applicability and capability of ANFIS techniques for effort estimation. ANFIS models are seen to be very robust, fast to compute, and capable of handling noisy and approximate data of the kind used in the present study. Because the data are non-linear, ANFIS is an efficient quantitative tool for predicting effort. The studies have been carried out in the MATLAB simulation environment. In all, sixteen input variables were used, consisting of fifteen Effort Adjustment Factors and the size of the project, with Effort as the single output variable. The initial parameters of the ANFIS are identified using the subtractive clustering method. Gaussian membership functions (described in the earlier section) are used for each fuzzy set in the fuzzy system. The subtractive clustering algorithm has been used to determine the number of membership functions and fuzzy rules required for ANFIS development, and the hybrid learning algorithm has been used to determine the parameters of the Gaussian membership functions. Each ANFIS has been trained for 1 epochs.

From the analysis given under Result and Discussion, it is seen that the effort estimation model developed using the ANFIS technique has performed better than the ANN and COCOMO models. This can be concluded from the results given in Tables 5, 6, 7 and 8. The RMSE value obtained from the ANFIS model (112.638) is lower than those from the ANN (353.1977) and COCOMO (532.2147) models. Further, Fig. 6, 7 and 8 and Table 8 show that the ANFIS estimates follow the actual effort more closely than those of ANN and COCOMO. This again indicates the superiority of the ANFIS technique over the ANN and COCOMO models for effort estimation.

REFERENCES

[1]. Alpaydın, E. 24. Introduction to machine learning. Cambridge: MIT Press.
[2]. Boehm, B., Abts, C., Chulani, S. 2. Software development cost estimation approaches: A survey.
[3]. Annals of Software Engineering (1): 177 25.
[4]. Boehm, B.W. 1981. Software Engineering Economics. Upper Saddle River, NJ, USA: Prentice Hall PTR.
[5]. Chen, D.W. and Zhang, J.P., (25), Time series prediction based on ensemble ANFIS, Proceedings of the fourth International Conference on Machine Learning and Cybernetics, IEEE, pp 3552-3556.1
[6]. Chiu, S., (1994), Fuzzy Model Identification based on cluster estimation, Journal of Intelligent and Fuzzy Systems, 2 (3), pp 267-278.11
[7]. Fuller, R., (1995), Neural Fuzzy Systems, ISBN 951-65-624-, ISSN 358-5654.17
[8]. Fuzzy Logic Toolbox, MATLAB version R213a.
[9]. Hammouda, K. A., Comparative Study of Data Clustering Techniques.
[1]. Jang, J-S.R., (1992), Neuro-Fuzzy Modelling: Architecture, Analyses and Applications, Ph.D. Thesis.
[11]. Jang, J-S.R., (1993), ANFIS-Adaptive-Network Based Fuzzy Inference System, IEEE Transactions on Systems, Man and Cybernetics, 23(3), pp 665-685.
[12]. Jang, J-S.R., Sun, C.-T., (1995), Neuro-fuzzy modelling and control, Proceedings IEEE, 83 (3), pp 378 46.
[13]. Jantzen, J., (1998), Neurofuzzy Modelling. Technical Report no. 98-H-874(nfmod), Department of Automation, Technical University of Denmark. 1-28.
[14]. Pendharkar, Parag C., et al., (25), A Probabilistic Model for Predicting Software Development Effort, IEEE Transactions on Software Engineering, Vol. 31, No. 7.
[15]. Priyono, A., Ridwan, M., et al., (25), Generation of fuzzy rules with subtractive clustering, Journal Teknologi, 43(D), pp 143-153.
[16]. Sayyad Shirabad, J. and Menzies, T.J., (25), The PROMISE Repository of Software Engineering Databases. School of Information Technology and Engineering, University of Ottawa, Canada. Available: http://promise.site.uottawa.ca/serepository
[17]. Takagi, T. and Sugeno, M., (1983), Derivation of fuzzy control rules from human operators control actions, Proc. IFAC Symp. Fuzzy Inform, Knowledge Representation and Decision Analysis, pp 55-6.
[18]. Vaidehi, V., Monica, S., Mohammad Sheikh Safeer, S., Deepika, M. and Sangeetha, S., (28), A Prediction System Based on Fuzzy Logic, Proceedings of World Congress on Engineering and Computer Science. 38
[19]. Zadeh, L.A., (1965), Fuzzy sets, Information and Control, 8, pp 338-353.36.

Authors

Sheenu Rizvi is an Assistant Professor at Amity School of Engineering and Technology, Lucknow, India. He received his M.Tech degree in Information Technology in 25 and is pursuing a Ph.D in Computer Application from Integral University.

Syed Qamar Abbas completed his Master of Science (MS) from BITS Pilani. His PhD was on a computer-oriented study of queueing models. He has more than 2 years of teaching and research experience in the field of Computer Science and Information Technology. Currently, he is Director of Ambalika Institute of Management and Technology, Lucknow.

Prof. Dr. M. Rizwan Beg holds an M.Tech and a Ph.D in Computer Sc. & Engg. He is presently working as Controller of Examination at Integral University, Lucknow, Uttar Pradesh, India. He has more than 16 years of experience, which includes around 14 years of teaching experience. His areas of expertise are Software Engg., Requirement Engineering, Software Quality, and Software Project Management. He has published more than 4 research papers in international journals and conferences. Presently 8 research scholars are pursuing their Ph.D under his supervision.