The Journal of Systems and Software




The Journal of Systems and Software 82 (2009) 241-252

Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/jss

A study of project selection and feature weighting for analogy based software cost estimation

Y.F. Li *, M. Xie, T.N. Goh
Department of Industrial and Systems Engineering, National University of Singapore, Singapore 119 260, Singapore

* Corresponding author. Tel.: +65 83442816. E-mail address: liyanfu@nus.edu.sg (Y.F. Li).

Article history: Received 14 May 2007. Received in revised form 4 June 2008. Accepted 4 June 2008. Available online 17 June 2008.

Keywords: Software cost estimation; Analogy based estimation; Feature weighting; Project selection; Genetic algorithm; Artificial datasets

Abstract

A number of software cost estimation methods have been presented in the literature over the past decades. Analogy based estimation (ABE), which is essentially a case based reasoning (CBR) approach, is one of the most popular techniques. In order to improve the performance of ABE, many previous studies proposed effective approaches to optimize the weights of the project features (feature weighting) in its similarity function. However, ABE is still criticized for its low prediction accuracy, large memory requirement, and expensive computation cost. To alleviate these drawbacks, in this paper we propose the project selection technique for ABE (PSABE), which reduces the whole project base into a small subset that consists only of representative projects. Moreover, PSABE is combined with feature weighting to form FWPSABE for a further improvement of ABE. The proposed methods are validated on four datasets (two real-world sets and two artificial sets) and compared with conventional ABE, feature weighted ABE (FWABE), and machine learning methods. The promising results indicate that the project selection technique could significantly improve analogy based models for software cost estimation.

© 2008 Elsevier Inc. All rights reserved.

1. Introduction
Software cost estimation is critical for the success of software project management. It affects almost all management activities, including resource allocation, project bidding, and project planning (Pendharkar et al., 2005; Auer et al., 2006; Jorgensen and Shepperd, 2007). The importance of accurate estimation has led to extensive research on software cost estimation methods. Following a comprehensive review (Boehm et al., 2000), these methods can be classified into six categories: parametric models, including COCOMO (Boehm, 1981; Huang et al., 2007), SLIM (Putnam and Myers, 1992), and SEER-SEM (Jensen, 1983); expert judgment, including the Delphi technique (Helmer, 1966) and work breakdown structure based methods (Tausworthe, 1980; Jorgensen, 2004); learning oriented techniques, including machine learning methods (Heiat, 2002; Shin and Goel, 2000; Oliveira, 2006) and analogy based estimation (Shepperd and Schofield, 1997; Auer et al., 2006; Huang and Chiu, 2006); regression based methods, including ordinary least square regression (Mendes et al., 2005; Costagliola et al., 2005) and robust regression (Miyazaki et al., 1994); dynamics based models (Madachy, 1994); and composite methods (Chulani et al., 1999; MacDonell and Shepperd, 2003).

Analogy based estimation (ABE), which is essentially a case-based reasoning (CBR) approach (Shepperd and Schofield, 1997), was first proposed by Sternberg (1977). Due to its conceptual simplicity and empirical competitiveness, ABE has been extensively studied and applied (Shepperd and Schofield, 1997; Walkerden and Jeffery, 1999; Angelis and Stamelos, 2000; Mendes et al., 2003; Auer et al., 2006; Huang and Chiu, 2006; Chiu and Huang, 2007). The basic idea of ABE is simple: when a new project is provided for estimation, compare it with historical projects to retrieve the most similar projects, which are then used to predict the cost of the new project. Generally, ABE (or CBR) consists of four parts: a historical project dataset, a similarity function, a solution function and the associated retrieval rules (Kolodner, 1993).
One of the central parts of ABE is the similarity function, which measures the level of similarity between two different projects. Since each project feature (or cost driver) has one position in the similarity function and therefore largely determines which historical projects are retrieved for the final prediction, several approaches focus on searching for the appropriate weight of each feature, such as Shepperd and Schofield (1997), Walkerden and Jeffery (1999), Angelis and Stamelos (2000), Mendes et al. (2003), Auer et al. (2006), and Huang and Chiu (2006). However, ABE methods still face some difficulties, such as the non-normal characteristics (including skewness, heteroscedasticity and excessive outliers) of software engineering datasets (Pickard et al., 2001) and the increasing sizes of the datasets (Shepperd and Kadoda, 2001). Large and non-normal datasets often lead ABE methods to low prediction accuracy and high computational expense (Huang et al., 2002). To alleviate these drawbacks, many research works in the CBR literature (Lipowezky, 1998; Babu and Murty, 2001; Huang et al., 2002) have been devoted to the case selection technique. The objective of case selection (CS) is to identify and remove redundant and noisy projects. By reducing the whole project base into a smaller subset that consists only of representative projects, CS can save the computing time spent searching for the most similar projects and produce quality prediction results. Moreover, the simultaneous optimization of feature weighting and case selection in CBR has been investigated in several studies (Kuncheva and Jain, 1999; Rozsypal and Kubat, 2003; Ahn et al., 2006), and significant improvements are reported in these studies.

0164-1212/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.jss.2008.06.001

From the discussion above, it is worthwhile to investigate the case selection technique in the context of analogy based software cost estimation. In this study, we propose a genetic algorithm for project selection for ABE (PSABE) and the simultaneous optimization of feature weights and project selection for ABE (FWPSABE). The two proposed techniques are compared against feature weighting ABE (FWABE), conventional ABE and other popular cost estimation methods including ANN, RBF, SVM and CART. For consistency of terminology, in the rest of this paper we refer to case selection as project selection for ABE.

Empirical validation is crucial when comparing different estimation methods. This has led many studies to use various real datasets to compare different cost estimation methods. However, most published real datasets are relatively small (Mair et al., 2005), and a small real dataset can be problematic if we would like to show significant differences between the estimation methods. Another drawback of real-world datasets is that their true properties may not be fully known. Artificially generated datasets (Pickard et al., 2001; Shepperd and Kadoda, 2001; Foss et al., 2003; Myrtveit et al., 2005) with known characteristics provide a feasible way to address the above problems.
Thus, we generate two artificial datasets and select two well known real-world datasets for controlled experiments.

The rest of this paper is organized as follows: Section 2 presents a brief overview of the conventional ABE method. In Section 3, the general framework of the feature weighting and project selection system for ABE is described. Section 4 presents the real-world datasets and the experiment design. In Section 5, the results on the two real-world datasets are summarized and analyzed. In Section 6, two artificial datasets are generated, experiments are conducted on these two datasets, and the results are summarized and analyzed. The final section presents the conclusion and future work.

2. Overview on analogy based cost estimation

The analogy based method is a pure form of case based reasoning (CBR) with no expert used. Generally, an ABE model comprises four components: a historical dataset, a similarity function, a solution function and the associated retrieval rules (Kolodner, 1993). The ABE system process also consists of four stages:

1. Collect the information on past projects and prepare the historical dataset.
2. Select the new project's relevant features, such as function points (FP) and lines of source code (LOC), which are also collected for the past projects.
3. Retrieve the past projects, estimate the similarities between the new project and the past projects, and find the most similar past projects. The commonly used similarities are functions of the weighted Euclidean distance and the weighted Manhattan distance.
4. Predict the cost of the new project from the chosen analogues by the solution function. Generally, the un-weighted average is used as the solution function.

The historical dataset, which keeps all information on past projects, is a key component of the ABE system. However, it often contains noisy or redundant projects. By reducing the whole historical dataset into a smaller but more representative subset, the project selection technique positively affects conventional ABE systems.
First, it reduces the search space, so that computing resources spent searching for the most similar projects are saved. Secondly, it also produces quality predictions because it may eliminate noise in the historical dataset. In the following sections, the other components of the ABE system, including the similarity function, the number of most similar projects, and the solution function, are presented.

2.1. Similarity function

The similarity function measures the level of similarity between projects. Among the different types of similarity functions, Euclidean similarity (ES) and Manhattan similarity (MS) are widely accepted (ES: Shepperd and Schofield, 1997; MS: Chiu and Huang, 2007). The Euclidean similarity is based on the Euclidean distance between two projects:

\[
\mathrm{Sim}(p, p') = \frac{1}{\sqrt{\sum_{i=1}^{n} w_i \,\mathrm{Dis}(f_i, f'_i)} + \delta}, \qquad \delta = 0.0001,
\]
\[
\mathrm{Dis}(f_i, f'_i) =
\begin{cases}
(f_i - f'_i)^2 & \text{if } f_i \text{ and } f'_i \text{ are numeric or ordinal} \\
0 & \text{if } f_i \text{ and } f'_i \text{ are nominal and } f_i = f'_i \\
1 & \text{if } f_i \text{ and } f'_i \text{ are nominal and } f_i \neq f'_i
\end{cases}
\tag{1}
\]

where $p$ and $p'$ denote the projects, $f_i$ and $f'_i$ denote the $i$th feature values of the corresponding projects, $w_i \in [0, 1]$ is the weight of the $i$th feature, $\delta = 0.0001$ is a small constant that prevents the denominator from being 0, and $n$ is the total number of features.

The Manhattan similarity is based on the Manhattan distance, which is the sum of the absolute distances for each pair of features:

\[
\mathrm{Sim}(p, p') = \frac{1}{\sum_{i=1}^{n} w_i \,\mathrm{Dis}(f_i, f'_i) + \delta}, \qquad \delta = 0.0001,
\]
\[
\mathrm{Dis}(f_i, f'_i) =
\begin{cases}
|f_i - f'_i| & \text{if } f_i \text{ and } f'_i \text{ are numeric or ordinal} \\
0 & \text{if } f_i \text{ and } f'_i \text{ are nominal and } f_i = f'_i \\
1 & \text{if } f_i \text{ and } f'_i \text{ are nominal and } f_i \neq f'_i
\end{cases}
\tag{2}
\]

An important issue in the similarity functions is how to assign an appropriate weight $w_i$ to each feature pair, because each feature may have a different relevance to the project cost. In the literature, several approaches focus on this topic: Shepperd and Schofield (1997) set each weight to be either 1 or 0 and then apply a brute-force approach to choose the optimal weights; Auer et al. (2006) extend Shepperd and Schofield's approach to the flexible extensive search method.
Walkerden and Jeffery (1999) use human judgment to determine the feature weights; Angelis and Stamelos (2000) choose a value generated from statistical analysis as the feature weights. More recently, Huang and Chiu (2006) propose a genetic algorithm to optimize the feature weights.

2.2. K number of similar projects

This parameter refers to the number K of most similar projects, that is, the projects closest to the project being estimated. Some studies suggested K = 1 (Walkerden and Jeffery, 1999; Auer et al., 2006; Chiu and Huang, 2007). However, we set K = {1, 2, 3, 4, 5}, since many studies recommend K equal to two or three (Shepperd and Schofield, 1997; Mendes et al., 2003; Jorgensen et al., 2003; Huang and Chiu, 2006) and K = {1, 2, 3, 4, 5} covers most of the suggested numbers.

2.3. Solution functions

After the K most similar projects are selected, the final prediction for the new project is determined by computing a certain statistic based on the selected projects. The solution functions used in this study are: the closest analogy (most similar project) (Walkerden and Jeffery, 1999), the mean of the most similar projects (Shepperd and Schofield, 1997), the median of the most similar projects (Angelis and Stamelos, 2000) and the inverse distance weighted mean (Kadoda et al., 2000).

The mean is the average of the costs of the K most similar projects, where K > 1. It is a classical measure of central tendency and treats all most similar projects as being equally influential on the cost estimates.

The median is the median of the costs of the K most similar projects, where K > 2. It is another measure of central tendency and a more robust statistic when the number of most similar projects increases (Angelis and Stamelos, 2000).

The inverse distance weighted mean (Kadoda et al., 2000) allows more similar projects to have more influence than less similar ones. The formula for the weighted mean is shown in (3):

\[
\hat{C}_p = \sum_{k=1}^{K} \frac{\mathrm{Sim}(p, p_k)}{\sum_{i=1}^{K} \mathrm{Sim}(p, p_i)} \, C_{p_k}
\tag{3}
\]

where $p$ denotes the new project being estimated, $p_k$ represents the $k$th most similar project, $\mathrm{Sim}(p, p_k)$ is the similarity between projects $p_k$ and $p$, $C_{p_k}$ is the cost value of the $k$th most similar project $p_k$, and $K$ is the total number of most similar projects.

3. Project selection and feature weighting

In this section, we construct the FWPSABE system (feature weighting and project selection analogy based estimation), which can perform feature weighting analogy based estimation (FWABE) alone, project selection analogy based estimation (PSABE) alone, and the simultaneous optimization of feature weights and project selection (FWPSABE).
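
As a rough illustration of the ABE components of Section 2 that these three variants configure, the following Python sketch implements the similarities in (1) and (2), the four solution functions, and a predictor that takes a feature weight vector and a project selection mask. The function names (`similarity`, `abe_predict`), the 0/1 `mask` argument and the argument layout are assumptions made for this illustration and are not taken from the paper. The sketch also assumes that nominal features contribute a distance of 0 when equal and 1 otherwise, and that the inverse distance weighted mean normalizes over the K retrieved projects.

```python
import math
from statistics import mean, median

DELTA = 1e-4  # small constant delta from Eqs. (1) and (2)


def dis(a, b, is_nominal, squared):
    """Per-feature distance (assumption: 0 for equal nominal values, 1 otherwise)."""
    if is_nominal:
        return 0.0 if a == b else 1.0
    return (a - b) ** 2 if squared else abs(a - b)


def similarity(p, q, weights, nominal, kind="euclidean"):
    """Euclidean similarity as in Eq. (1) or Manhattan similarity as in Eq. (2)."""
    squared = kind == "euclidean"
    total = sum(w * dis(a, b, nom, squared)
                for a, b, w, nom in zip(p, q, weights, nominal))
    if squared:
        total = math.sqrt(total)
    return 1.0 / (total + DELTA)


def abe_predict(new, projects, costs, weights, mask, nominal,
                k=3, solution="mean", kind="euclidean"):
    """Predict the cost of `new` from the k most similar selected projects."""
    # Project selection: only projects whose mask bit is 1 can be retrieved.
    scored = [(similarity(new, p, weights, nominal, kind), c)
              for p, c, keep in zip(projects, costs, mask) if keep]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    top_costs = [c for _, c in top]
    if solution == "closest":
        return top_costs[0]
    if solution == "mean":
        return mean(top_costs)
    if solution == "median":
        return median(top_costs)
    if solution == "idw":  # inverse distance weighted mean, Eq. (3)
        total_sim = sum(s for s, _ in top)
        return sum(s / total_sim * c for s, c in top)
    raise ValueError(f"unknown solution function: {solution}")
```

In this sketch, setting every weight and every mask bit to 1 corresponds to conventional ABE; varying only `weights` corresponds to FWABE, varying only `mask` to PSABE, and varying both to FWPSABE.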
Genetic algorithm (Holland, 1975) is selected as the optimization tool for the FWPSABE system, since it is a robust global optimization technique and has been applied to optimize model parameters in several cost estimation papers (Dolado, 2000; Shukla, 2000; Dolado, 2001; Huang and Chiu, 2006). The framework and a detailed description of the FWPSABE system are presented in Section 3.2. In order to introduce the fitness function used in the GA operation, performance metrics for model accuracy are first presented in Section 3.1.

3.1. Performance metrics

To measure the accuracy of cost estimation methods, three widely used performance metrics are considered: mean magnitude of relative error (MMRE), median magnitude of relative error (MdMRE) and PRED(0.25). The MMRE is defined as

\[
\mathrm{MMRE} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{MRE}_i, \qquad
\mathrm{MRE}_i = \frac{|C_i - \hat{C}_i|}{C_i}
\tag{4}
\]

where $n$ denotes the number of projects, $C_i$ denotes the actual cost of the $i$th project, and $\hat{C}_i$ denotes the estimated cost of the $i$th project. A small MMRE value indicates a low level of estimation error. However, this metric is unbalanced and penalizes overestimation more than underestimation.

The MdMRE is the median of all the MREs:

\[
\mathrm{MdMRE} = \mathrm{median}(\mathrm{MRE}_i)
\tag{5}
\]

It exhibits a similar pattern to MMRE, but it is more likely to select the true model, especially in the underestimation cases, since it is less sensitive to extreme outliers (Foss et al., 2003).

The PRED(0.25) is the percentage of predictions that fall within 25% of the actual cost:

\[
\mathrm{PRED}(q) = \frac{k}{n}
\tag{6}
\]

where $n$ denotes the total number of projects and $k$ represents the number of projects whose MRE is less than or equal to $q$. Normally, $q$ is set to 0.25.

The PRED(0.25) identifies the cost estimations that are generally accurate, while MMRE is biased and not always reliable as a performance metric. However, MMRE has been the de facto standard in the software cost estimation literature. Thus, MMRE is selected for the fitness function in the GA. More specifically, for each chromosome generated by the GA, MMRE is computed across the training dataset. The GA then searches through the parameter space to minimize MMRE.

3.2. GA for project selection and feature weighting

The procedure of project selection and feature weighting via genetic algorithm is presented in this section. The system consists of two stages: the first is the training stage (as shown in Fig. 2) and the second is the testing stage (as shown in Fig. 3). In the training stage, a set of training projects is presented to the system, the ABE model is configured by the candidate parameters (feature weights and selection codes) to produce the cost predictions, and the GA explores the parameter space to minimize the error (in terms of MMRE) of ABE on the training projects by the following steps:

i. Encoding. To apply the GA for optimization, the candidate parameters are coded as a binary code chromosome. As shown in Fig. 1, each individual chromosome consists of two parts. The first part holds the codes for the feature weights and has a length of $14 \times n$, where $n$ is the number of features. Since the feature weights in the ABE model are decimal numbers, the binary codes have to be transformed into decimal values before entering the ABE model. As many authors (Michalewicz, 1996; Ahn et al., 2006) suggested, the feature weights are set with a precision of 1/10,000. Thus, 14 binary bits are required to express this precision level, because $8192 = 2^{13} < 10{,}000 \le 2^{14} = 16{,}384$. After transformation, all decimal weight values are normalized into the interval [0, 1] by the following formula (Michalewicz, 1996):

\[
w_i = \frac{w'_i}{2^{14} - 1} = \frac{w'_i}{16{,}383}
\tag{7}
\]

where $w'_i$ is the decimal conversion of the $i$th feature's binary weight. For example, the binary code for feature 1 of the sample chromosome in Fig. 1 is $(10000000000001)_2$. Its decimal value is $(8193)_{10}$ and its normalized value is $8193 / 16{,}383 \approx 0.5001$. The second part of the codes is for project selection. The value of each bit is set to be either 0 or 1: 0 means the corresponding project is not selected and 1 means it is selected. The length of the second part is $m$, where $m$ is the total number of projects in the historical project base.

ii. Population generation. After the encoding of the individual chromosome, the algorithm generates a population of chromosomes. For the GA process, a larger population size often results in a higher chance of a good solution (Doval et al., 1999). Since the GA is computationally expensive, a trade-off between the convergence time and the population size must be made. In general, the minimum effective population size grows with problem size. Based on previous works (Huang and Chiu, 2006; Chiu and Huang, 2007), the size of the population is set to 10V, where V is the total number of input variables of the GA search, which partially reflects the problem size.

Fig. 1. Chromosome for FWPSABE. The feature weighting part holds 14 bits for each of Feature 1 to Feature n; the project selection part holds one bit for each of the m projects.

Fig. 2. The training stage of FWPSABE. Randomly generated candidate parameters (feature weighting and project selection) configure the similarity function and reduce the historical project base (PB) to a reduced PB; project retrieval and the solution function produce prediction values for the training projects; genetic operations (selection, crossover, mutation) generate new candidate parameters until termination, when the optimal parameters are output to the next stage.

iii. Fitness function. Each individual chromosome is evaluated by the fitness function in the GA. As mentioned in Section 3.1, MMRE is chosen for the fitness function, and the GA is designed to maximize the fitness function; for the sake of simplicity we set the fitness function as the reciprocal of MMRE:

\[
f = \frac{1}{\mathrm{MMRE}}
\tag{8}
\]
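
The metrics of Section 3.1 and steps i to iii above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: chromosomes are plain lists of 0/1 integers, and the helper names (`decode`, `mre`, `mmre`, `mdmre`, `pred`, `fitness`) are hypothetical rather than taken from the paper. The fitness in (8) is undefined when MMRE is exactly zero, a case this sketch does not handle.

```python
from statistics import median

BITS = 14  # bits per feature weight, since 2**13 < 10,000 <= 2**14


def decode(chromosome, n_features):
    """Split a 0/1 list into normalized feature weights (Eq. 7) and selection bits."""
    weights = []
    for i in range(n_features):
        bits = chromosome[i * BITS:(i + 1) * BITS]
        value = int("".join(map(str, bits)), 2)   # binary code -> decimal value
        weights.append(value / (2 ** BITS - 1))   # normalize into [0, 1]
    mask = chromosome[n_features * BITS:]         # one bit per historical project
    return weights, mask


def mre(actual, predicted):
    """Magnitude of relative error for each project."""
    return [abs(c - c_hat) / c for c, c_hat in zip(actual, predicted)]


def mmre(actual, predicted):
    errors = mre(actual, predicted)
    return sum(errors) / len(errors)              # Eq. (4)


def mdmre(actual, predicted):
    return median(mre(actual, predicted))         # Eq. (5)


def pred(actual, predicted, q=0.25):
    errors = mre(actual, predicted)
    return sum(e <= q for e in errors) / len(errors)  # Eq. (6)


def fitness(actual, predicted):
    return 1.0 / mmre(actual, predicted)          # Eq. (8); assumes MMRE > 0


# The sample weight code from the text, (10000000000001)_2 = 8193,
# followed by three illustrative project selection bits.
chromosome = [1] + [0] * 12 + [1] + [1, 0, 1]
weights, mask = decode(chromosome, n_features=1)
print(round(weights[0], 4), mask)  # 0.5001 [1, 0, 1]
```

In a full GA run, `fitness` would be evaluated on the ABE predictions for the training projects obtained with the decoded `weights` and `mask`.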

Fig. 3. The testing stage of FWPSABE. The optimal parameters from the training stage (feature weighting and project selection) configure the similarity function and the reduced project base (PB); project retrieval and the solution function produce the predictions for the testing projects.

iv. Fitness evaluation. After transforming the binary chromosomes into the feature weighting and project selection parameters (see step i), the procedures of ABE are executed as follows. Given one training project, the similarities between the training project and the historical projects are computed by applying the feature weights in the similarity functions (1) or (2). Simultaneously, the project selection part of the chromosome is used to generate the reduced historical project base (reduced PB). Then, ABE uses 1 to 5 nearest neighbour matching (1-NN to 5-NN) to search through the reduced PB for the 1 to 5 most similar historical projects. Finally, the ABE model assigns a prediction value to the training project by adopting the different solution functions. The error metrics MMRE, PRED(0.25), and MdMRE are applied to evaluate the prediction performance on the training project set. Then, the reciprocal of MMRE is used as the fitness value for each parameter combination (or chromosome).

v. Selection. The standard roulette wheel is used to select 10V chromosomes from the current population.

vi. Crossover. The selected chromosomes are consecutively paired. The 1-point crossover operator with a probability of 0.7 is used to produce new chromosomes in each pair. The newly created chromosomes constitute a new population.

vii. Mutation. Each bit of the chromosomes in the new population is chosen to change its value with a probability of 0.1, in such a way that a bit 1 is changed to 0 and a bit 0 is changed to 1.

viii. Elitist strategy. The elitist strategy is used to overcome the slow convergence rate of the GA. It retains good chromosomes and ensures they are not eliminated through the mechanism of crossover and mutation. Under this strategy, if the minimum fitness value of the new population is smaller than that of the old population, then the new chromosome with the minimum fitness value is replaced by the old chromosome with the maximum fitness value.

ix. Stopping criteria. There are few theoretical guidelines for determining when to terminate the genetic search. Following the previous works (Huang and Chiu, 2006; Chiu and Huang, 2007) on combining GA with the ABE method, steps iv to viii are repeated until the number of generations equals or exceeds 1000V trials or the best fitness value has not changed in the past 100V trials. After the stopping criteria are satisfied, the system moves to the second stage and the optimal parameters (or chromosome) are entered into the ABE model for testing.

In the above procedure, the population size, crossover rate, mutation rate and stopping condition are the controlling parameters of the GA search. However, there are few theories to guide the assignment of these values (Ahn et al., 2006). Hence, we determine the values of these parameters in the light of previous studies that combine ABE and GAs. Most prior studies use 10V chromosomes as the population size, their crossover rate ranges from 0.5 to 0.7, and the mutation rate ranges from 0.06 to 0.1 (Ahn et al., 2006; Huang and Chiu, 2006; Chiu and Huang, 2007). However, because the search space for our GA is larger than in these studies, we set the parameters to the higher bounds of those ranges. Thus, in this study the population size is 10V, the crossover rate is set at 0.7 and the mutation rate is 0.1.

The second stage is the testing stage. In this stage, the system receives the optimized parameters from the training stage to configure the ABE model. The optimal ABE is then applied to the testing projects to evaluate the trained ABE.

4. Datasets and experiment designs

In this section, two real-world software engineering datasets are first utilized for the empirical evaluation of our methods. Additionally, all the cost estimation methods included in our experiments are described in Section 4.2 and the detailed experiment procedure is presented in Section 4.3.

4.1. Dataset preparation

The Albrecht dataset (Albrecht and Gaffney, 1983) includes 24 projects developed using third generation languages. 18 of the projects were written in COBOL, 4 were written in PL1, and 2 were written in DMS languages. The six independent features of this dataset are input count, output count, query count, file count, function points, and source lines of code. The dependent feature person hours is recorded in 1000 h. The descriptive statistics of all the features are shown in Table 1.

The Desharnais dataset (Desharnais, 1989) includes 81 projects and 11 features, 10 independent and one dependent. Since 4 out of the 81 projects contain missing feature values, they have been excluded from the dataset. This process results in 77 complete projects for our study.
The ten ndependent features of ths dataset are TeamExp, ManagerExp, YearEnd, Length, Transactons, Enttes, PontsAdjust, Envergure, PontsNonAjust, and Language. The dependent feature person hours s recorded n 1000 h. The descrptve statstcs of all the features are shown n Table 2. Before the experments, all types of features are normalzed nto the nterval [0, 1] n order to elmnate ther dfferent nfluences. In addton, the two real datasets (Albrecht and Desharnas) are randomly splt nto three nearly equal szed sub-sets for tranng and testng. The detal parttons of each dataset are provded n Table 3. The hstorcal dataset s utlzed by ABE model to retreve smlar past projects. The tranng set s treated as the targets for the optmzaton of feature weghts and project subsets. The testng set s exclusvely used to evaluate the optmzed ABE models. 4.2. Cost estmaton methods Four ABE based models are ncluded n our experments. The frst model s the conventonal ABE. The second model s feature Table 1 Descrptve statstcs for Albrecht dataset Feature Mnmum Maxmum Mean Standard devaton Input count 7.00 193.00 40.25 36.91 Output count 12.00 150.00 47.25 35.17 Query count 3.00 60.00 17.38 15.52 Fle count 0.00 75.00 16.88 19.34 Functon ponts 3.00 318.00 61.08 63.68 SLOC 199.00 1902.00 647.63 488.00 Person hours 0.50 105.20 21.88 28.42 Table 2 Descrptve statstcs for Desharnas dataset Feature Mnmum Maxmum Mean Standard devaton TeamExp 0.00 4.00 2.30 1.33 ManagerExp 0.00 7.00 2.65 1.52 YearEnd 83.00 88.00 85.78 1.14 Length 1.00 36.00 11.30 6.79 Language 1.00 3.00 1.56 0.72 Transactons 9.00 886.00 177.47 146.08 Enttes 7.00 387.00 120.55 86.11 PontsAdjust 73.00 1127.00 298.01 182.26 Envergure 5.00 52.00 27.45 10.53 PontsNonAjust 62.00 1116.00 282.39 186.36 Person hours 0.55 23.94 4.83 4.19 Table 3 The partton of real datasets Dataset Sample sze of Albrecht Sample sze of Desharnas Hstorcal 8 25 Tranng 8 25 Testng 8 27 Total 24 77 weghng analogy based estmaton (FWABE) whch assgns optmal feature weghts 
via GA (Huang and Chiu, 2006). FWABE does not include the project selection technique. The third model, project selection analogy based estimation (PSABE), uses GA to optimize the historical project subsets. PSABE excludes feature weighting. The fourth model is FWPSABE, which uses GA for simultaneous optimization of feature weighting and project selection. The latter two are proposed by our study.

For a comprehensive evaluation of the proposed models, we compare them with other popular machine learning methods including artificial neural network ANN (Heiat, 2002), radial basis functions RBF (Shin and Goel, 2000), support vector machine regression SVR (Oliveira, 2006), and classification and regression trees CART (Pickard et al., 2001). The best variants of the machine learning methods are obtained by training these methods and tuning their parameters on the historical datasets and training datasets presented in Section 3.1 respectively.

In the ANN model, the number of hidden layers, the number of hidden nodes and the transfer functions are three predefined parameters, and they have a major impact on the prediction performance (Martin et al., 1997). Among these parameters, one hidden layer is often recommended since multiple hidden layers may lead to an over-parameterized ANN structure. Thus, one hidden layer is utilized in this study. The search spaces for the number of hidden neurons and the hidden layer transfer functions are set to be {1, 3, 5, 7, 9, 10} and {linear, tan-sigmoid, log-sigmoid} respectively. During the training process, the ANN models with different parameter configurations are first trained on the historical dataset. Then, all ANN structures are applied to the training set and the one producing the lowest MMRE value is selected for the comparisons against ABE models.

For the RBF network, the forward selection strategy is utilized since forward selection has the advantages of not requiring the number of hidden nodes to be fixed in advance, tractable model selection criteria, and relatively low computational expense (Orr, 1996). In this case, the regularization parameter $\lambda$ is introduced. To determine $\lambda$, the search space is defined as $\lambda \in \{10^{j} \mid j = -10, -9, \ldots, 0, \ldots, 10\}$. Similar to the ANN training procedure, all RBFs with different $\lambda$ values are trained on the historical dataset and the one yielding the lowest MMRE on the training data is selected for comparisons.

For the SVR model, the common Gaussian function $K(x, y) = \exp\{-(x - y)^2 \delta^2\}$ is used as the kernel function. The predefined parameters $\delta$, $C$ and $\varepsilon$ are selected from the same search space $\{10^{j} \mid j = -10, -9, \ldots, 0, \ldots, 10\}$. SVR models with all kinds of parameter combinations (10 × 10 × 10 = 1000 combinations) are trained on the historical dataset. The combination producing the minimal MMRE on the training set is chosen for comparisons.

To train the CART model, we first use the historical set to fit the model and obtain a decision tree T. The tree T is then applied to the training set, and returns a vector of cost values computed for the training projects. The cost vector is then used to prune the tree T to a minimized size. The tree with the optimal size is adopted for comparisons.

4.3. Experiment procedure

For the purpose of validation and comparison, the following experiment procedures are conducted:

Firstly, the performances of FWPSABE are investigated by varying ABE parameters other than feature weights and project subsets. As mentioned in Section 2, ABE has three components exclusive of the historical project base: the similarity function, the number K of most similar projects, and the solution function. In line with the common settings of these parameters, we define the search spaces for the similarity function as {Euclidean distance, Manhattan distance}, the number K of similar projects as {1, 2, 3, 4, 5}, and the solution function as {closest analogy, mean, median, inverse distance weighted mean} respectively. All parameter combinations are executed on both the training dataset and the testing dataset. The best configuration on the training dataset is selected for the comparisons with other cost estimation methods.

Secondly, the other ABE based methods are trained by a procedure similar to that described in the first step, and the best variants on the training set are selected as the candidates for comparisons. In addition, the optimizations of the machine learning methods are conducted on the training dataset by searching through their parameter spaces.

Thirdly, the training and testing results of the best variants of all estimation methods are summarized and compared. The experiment results and analysis are presented in the next section.

5. Experiment results

Table 4 presents FWPSABE's results on the Albrecht dataset with the different parameter configurations mentioned in Section 2. The results show that in general Euclidean distance achieves slightly more accurate performances than Manhattan distance on both the training and testing datasets. As to the solution function, there is no clear indication of which function is preferable. The choice of K value has some influence on the accuracies. The smaller errors mostly appear when K = 3 and K = 4. Among all configurations, the setting {Euclidean similarity, K = 4, and mean solution function} produces the best results on the training dataset and so it is selected for the comparisons with other cost estimation methods.

Table 5 summarizes the results of the best variants of all cost estimation methods on the Albrecht dataset. It is observed that FWPSABE achieves the best testing performance (0.30 for MMRE, 0.63 for PRED(0.25) and 0.27 for MdMRE) among all methods, followed by PSABE and FWABE. For a better illustration, the corresponding testing performances are presented in Fig. 4.

The results of FWPSABE with different configurations on the Desharnais dataset are summarized in Table 6. The results show that on this dataset the choice of similarity function has little influence on both the training and testing performances. As to the solution functions, there is no clear conclusion as to which solution function is the best. The choice of K value has a slight influence on the accuracies. The smaller errors are achieved by setting K = 3.
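The configuration space explored in Tables 4 and 6 (similarity function, K, and solution function) can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: it uses unweighted distances with no GA-based feature weighting or project selection, all function names (`distance`, `abe_estimate`, `select_configuration`, and the metric helpers) are hypothetical, the inverse distance weighting with a small `eps` guard is an assumed form, and MMRE, MdMRE and PRED(0.25) follow their standard definitions, which are assumed to match those used in the paper.

```python
import math
from statistics import mean, median


def distance(a, b, kind="euclidean"):
    """Unweighted distance between two feature vectors already scaled to [0, 1]."""
    if kind == "euclidean":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if kind == "manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    raise ValueError(f"unknown similarity function: {kind}")


def abe_estimate(target, history, k=3, kind="euclidean", solution="mean"):
    """Estimate the cost of `target` from its k nearest historical projects.

    `history` is a list of (features, cost) pairs.
    """
    ranked = sorted(history, key=lambda p: distance(target, p[0], kind))[:k]
    costs = [cost for _, cost in ranked]
    if solution == "ca":  # closest analogy: cost of the single nearest project
        return costs[0]
    if solution == "mean":
        return mean(costs)
    if solution == "median":
        return median(costs)
    if solution == "iwm":  # inverse distance weighted mean (assumed weighting)
        eps = 1e-9  # assumption: guards against a zero distance
        weights = [1.0 / (distance(target, f, kind) + eps) for f, _ in ranked]
        return sum(w * c for w, c in zip(weights, costs)) / sum(weights)
    raise ValueError(f"unknown solution function: {solution}")


def mre(actual, predicted):
    """Magnitude of relative error for each project."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]


def mmre(actual, predicted):
    """Mean magnitude of relative error."""
    return mean(mre(actual, predicted))


def mdmre(actual, predicted):
    """Median magnitude of relative error."""
    return median(mre(actual, predicted))


def pred(actual, predicted, level=0.25):
    """Fraction of projects whose relative error is at most `level`."""
    errors = mre(actual, predicted)
    return sum(e <= level for e in errors) / len(errors)


def select_configuration(history, training):
    """Return (training MMRE, similarity, K, solution) with the lowest training MMRE."""
    actual = [cost for _, cost in training]
    best = None
    for kind in ("euclidean", "manhattan"):
        for k in (1, 2, 3, 4, 5):
            for solution in ("ca", "mean", "median", "iwm"):
                estimates = [abe_estimate(f, history, k, kind, solution)
                             for f, _ in training]
                score = mmre(actual, estimates)
                if best is None or score < best[0]:
                    best = (score, kind, k, solution)
    return best
```

Note that the sketch enumerates every combination, whereas the tables report closest analogy only for K = 1 and omit the median for K = 2, where it coincides with the mean.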
Among all configurations, the setting {Euclidean similarity, K = 3, and mean solution function} produces the best results on the training dataset and so it is selected for the comparisons against other cost estimation methods.

Table 7 presents the results of the best variants of all cost estimation methods on the Desharnais dataset. It is shown that FWPSABE achieves the best testing performance (0.32 for MMRE, 0.44 for PRED(0.25) and 0.29 for MdMRE), followed by SVR and PSABE. Fig. 5 provides an illustrative version of the testing results in Table 7.

Table 4. Results of FWPSABE on the Albrecht dataset

| Similarity | K value | Solution | Training MMRE | Training PRED(0.25) | Training MdMRE | Testing MMRE | Testing PRED(0.25) | Testing MdMRE |
|---|---|---|---|---|---|---|---|---|
| Euclidean | K = 1 | CA | 0.39 | 0.25 | 0.35 | 0.40 | 0.38 | 0.45 |
| Euclidean | K = 2 | Mean | 0.37 | 0.54 | 0.34 | 0.55 | 0.13 | 0.58 |
| Euclidean | K = 2 | IWM | 0.40 | 0.58 | 0.34 | 0.57 | 0.32 | 0.42 |
| Euclidean | K = 3 | Mean | 0.56 | 0.38 | 0.34 | 0.41 | 0.33 | 0.39 |
| Euclidean | K = 3 | IWM | 0.55 | 0.42 | 0.32 | 0.42 | 0.42 | 0.29 |
| Euclidean | K = 3 | Median | 0.55 | 0.38 | 0.33 | 0.38 | 0.46 | 0.32 |
| Euclidean | K = 4 | Mean | 0.31 | 0.54 | 0.32 | 0.30 | 0.63 | 0.27 |
| Euclidean | K = 4 | IWM | 0.35 | 0.52 | 0.33 | 0.44 | 0.50 | 0.32 |
| Euclidean | K = 4 | Median | 0.40 | 0.54 | 0.37 | 0.37 | 0.58 | 0.28 |
| Euclidean | K = 5 | Mean | 0.58 | 0.42 | 0.32 | 0.39 | 0.38 | 0.45 |
| Euclidean | K = 5 | IWM | 0.54 | 0.33 | 0.38 | 0.51 | 0.25 | 0.42 |
| Euclidean | K = 5 | Median | 0.51 | 0.38 | 0.45 | 0.42 | 0.25 | 0.45 |
| Manhattan | K = 1 | CA | 0.50 | 0.25 | 0.41 | 0.45 | 0.25 | 0.53 |
| Manhattan | K = 2 | Mean | 0.56 | 0.38 | 0.42 | 0.43 | 0.13 | 0.44 |
| Manhattan | K = 2 | IWM | 0.55 | 0.40 | 0.44 | 0.59 | 0.28 | 0.45 |
| Manhattan | K = 3 | Mean | 0.55 | 0.52 | 0.45 | 0.39 | 0.38 | 0.35 |
| Manhattan | K = 3 | IWM | 0.51 | 0.44 | 0.42 | 0.42 | 0.25 | 0.40 |
| Manhattan | K = 3 | Median | 0.53 | 0.32 | 0.43 | 0.51 | 0.33 | 0.32 |
| Manhattan | K = 4 | Mean | 0.53 | 0.38 | 0.32 | 0.41 | 0.54 | 0.45 |
| Manhattan | K = 4 | IWM | 0.51 | 0.36 | 0.35 | 0.51 | 0.50 | 0.42 |
| Manhattan | K = 4 | Median | 0.50 | 0.34 | 0.43 | 0.44 | 0.53 | 0.32 |
| Manhattan | K = 5 | Mean | 0.54 | 0.34 | 0.42 | 0.59 | 0.13 | 0.58 |
| Manhattan | K = 5 | IWM | 0.52 | 0.36 | 0.48 | 0.52 | 0.23 | 0.48 |
| Manhattan | K = 5 | Median | 0.53 | 0.34 | 0.45 | 0.51 | 0.13 | 0.46 |

Table 5. The results and comparisons on the Albrecht dataset

| Models | MMRE (Training) | MMRE (Testing) | PRED(0.25) (Training) | PRED(0.25) (Testing) | MdMRE (Training) | MdMRE (Testing) |
|---|---|---|---|---|---|---|
| ABE | 0.38 | 0.49 | 0.50 | 0.13 | 0.36 | 0.49 |
| FWABE | 0.48 | 0.42 | 0.38 | 0.25 | 0.34 | 0.46 |
| PSABE | 0.40 | 0.39 | 0.25 | 0.38 | 0.35 | 0.45 |
| FWPSABE | 0.31 | 0.30 | 0.54 | 0.63 | 0.32 | 0.27 |
| SVR | 0.46 | 0.45 | 0.50 | 0.25 | 0.22 | 0.43 |
| ANN | 0.39 | 0.49 | 0.38 | 0.25 | 0.35 | 0.51 |
| RBF | 0.79 | 0.49 | 0.50 | 0.25 | 0.25 | 0.39 |
| CART | 4.77 | 1.70 | 0.13 | 0.13 | 0.58 | 0.89 |

Fig. 4. The testing results on Albrecht dataset.

6. Artificial datasets and experiment results

To compare different cost estimation methods, empirical validation is crucial. This has led to the collection of various real world datasets for experiments. Mair et al. (2005) conducted an extensive survey of the real datasets for cost estimation from 1980 onwards. As reported, most published real world datasets are relatively small for tests of significance, and their true properties may not be fully known. For example, it might be difficult to distinguish different types of distribution in the presence of extreme outliers in a small dataset (Shepperd and Kadoda, 2001). Artificially generated datasets provide a feasible solution to the above two difficulties. Firstly, researchers can generate a reasonable amount of artificial data to investigate the significant differences among the competing techniques. Secondly, artificial data provides control over the characteristics of the dataset. In particular, researchers could design a systematic way to vary the properties for their research purposes (Pickard et al., 1999).
In order to evaluate the proposed methods in a more controlled way, we generate two artificial datasets for further experiments. From each of the two real datasets, we extract a set of characteristics describing its properties, or more specifically its non-normality. The non-normality considered in our study includes skewness, variance instability, and excessive outliers (Pickard et al., 2001). We then use the two sets of characteristics to generate two sets of artificial data. Section 6.1 presents the details of the artificial dataset generation.

Table 6. Results of FWPSABE on the Desharnais dataset

| Similarity | K value | Solution | Training MMRE | Training PRED(0.25) | Training MdMRE | Testing MMRE | Testing PRED(0.25) | Testing MdMRE |
|---|---|---|---|---|---|---|---|---|
| Euclidean | K = 1 | CA | 0.54 | 0.24 | 0.47 | 0.52 | 0.27 | 0.51 |
| Euclidean | K = 2 | Mean | 0.57 | 0.26 | 0.45 | 0.62 | 0.37 | 0.50 |
| Euclidean | K = 2 | IWM | 0.55 | 0.24 | 0.44 | 0.97 | 0.42 | 0.67 |
| Euclidean | K = 3 | Mean | 0.40 | 0.36 | 0.36 | 0.32 | 0.44 | 0.29 |
| Euclidean | K = 3 | IWM | 0.55 | 0.36 | 0.38 | 0.42 | 0.42 | 0.36 |
| Euclidean | K = 3 | Median | 0.56 | 0.34 | 0.36 | 0.38 | 0.42 | 0.34 |
| Euclidean | K = 4 | Mean | 0.59 | 0.16 | 0.39 | 0.40 | 0.26 | 0.39 |
| Euclidean | K = 4 | IWM | 0.55 | 0.36 | 0.41 | 0.64 | 0.17 | 0.46 |
| Euclidean | K = 4 | Median | 0.53 | 0.34 | 0.37 | 0.57 | 0.38 | 0.42 |
| Euclidean | K = 5 | Mean | 0.55 | 0.24 | 0.56 | 0.43 | 0.28 | 0.48 |
| Euclidean | K = 5 | IWM | 0.54 | 0.26 | 0.56 | 0.52 | 0.25 | 0.42 |
| Euclidean | K = 5 | Median | 0.59 | 0.29 | 0.55 | 0.64 | 0.27 | 0.53 |
| Manhattan | K = 1 | CA | 0.39 | 0.28 | 0.37 | 0.67 | 0.30 | 0.44 |
| Manhattan | K = 2 | Mean | 0.54 | 0.32 | 0.48 | 0.47 | 0.25 | 0.51 |
| Manhattan | K = 2 | IWM | 0.55 | 0.40 | 0.34 | 0.52 | 0.25 | 0.53 |
| Manhattan | K = 3 | Mean | 0.45 | 0.28 | 0.49 | 0.46 | 0.22 | 0.38 |
| Manhattan | K = 3 | IWM | 0.56 | 0.24 | 0.43 | 0.41 | 0.42 | 0.37 |
| Manhattan | K = 3 | Median | 0.58 | 0.20 | 0.46 | 0.51 | 0.20 | 0.45 |
| Manhattan | K = 4 | Mean | 0.51 | 0.24 | 0.48 | 0.57 | 0.33 | 0.51 |
| Manhattan | K = 4 | IWM | 0.53 | 0.26 | 0.55 | 0.58 | 0.27 | 0.52 |
| Manhattan | K = 4 | Median | 0.60 | 0.30 | 0.53 | 0.54 | 0.28 | 0.52 |
| Manhattan | K = 5 | Mean | 0.54 | 0.24 | 0.50 | 0.52 | 0.26 | 0.48 |
| Manhattan | K = 5 | IWM | 0.56 | 0.34 | 0.58 | 0.64 | 0.18 | 0.59 |
| Manhattan | K = 5 | Median | 0.63 | 0.36 | 0.55 | 0.55 | 0.23 | 0.52 |

Table 7. The results and comparisons on the Desharnais dataset

| Models | MMRE (Training) | MMRE (Testing) | PRED(0.25) (Training) | PRED(0.25) (Testing) | MdMRE (Training) | MdMRE (Testing) |
|---|---|---|---|---|---|---|
| ABE | 0.62 | 0.62 | 0.28 | 0.22 | 0.51 | 0.50 |
| FWABE | 0.51 | 0.46 | 0.12 | 0.22 | 0.48 | 0.39 |
| PSABE | 0.39 | 0.41 | 0.28 | 0.30 | 0.37 | 0.38 |
| FWPSABE | 0.40 | 0.32 | 0.36 | 0.44 | 0.36 | 0.29 |
| SVR | 0.42 | 0.40 | 0.28 | 0.37 | 0.45 | 0.37 |
| ANN | 0.45 | 0.57 | 0.36 | 0.22 | 0.44 | 0.43 |
| RBF | 0.57 | 0.42 | 0.24 | 0.37 | 0.49 | 0.29 |
| CART | 0.97 | 0.52 | 0.28 | 0.30 | 0.50 | 0.35 |

Fig. 5. The testing results on Desharnais dataset.

6.1. Generation of the artificial datasets

To explore the non-normal characteristics of the real world datasets, the cost-size scatter plot for the Albrecht dataset is drawn as Fig. 6. The scatter plot indicates the slight skewness, moderate outliers, and slight variance instability of the Albrecht dataset. The cost-size scatter plot of the Desharnais dataset is illustrated in Fig. 7, which shows weak skewness, extreme outliers, and high variance instability. From the analysis above, software datasets often exhibit a mixture of several non-normal characteristics such as skewness, variance instability, and excessive outliers (Pickard et al., 2001). These characteristics do not always appear in the same degree. In some cases they are moderately non-normal, as in the Albrecht dataset, while in other cases they are severely non-normal, as in the Desharnais dataset. Without loss of generality, we adopted Pickard's way of modeling non-normality in this work.

Fig. 6. Cost versus size of Albrecht dataset (x-axis: size, non-adjusted function points; y-axis: cost, hours).
Fig. 7. Cost versus size of Desharnais dataset.

Other types of techniques for artificial dataset generation are also available in recent literature. For more details, readers can refer to Shepperd and Kadoda (2001), Foss et al. (2003) and Myrtveit et al. (2005). Following Pickard's approach, we simulate the combination of non-normal characteristics (skewness, unstable variance and outliers) in Eq. (9):

$$y = 1000 + 6x_{1sk} + 3x_{2sk} + 2x_{3sk} + e_{het} \qquad (9)$$

The independent variables ($x_{1sk}$, $x_{2sk}$, $x_{3sk}$) are generated from Gamma distributed random variables $x'_1$, $x'_2$, and $x'_3$ with mean 4 and variance 8. The skewness is made explicit by the Gamma distributions. In order to vary the scale of the independent variables, we then multiply $x'_1$ by 10 to create the variable $x_{1sk}$, $x'_2$ by 3 to create the variable $x_{2sk}$, and $x'_3$ by 20 to create the variable $x_{3sk}$.

The last term $e_{het}$ in the formula simulates a special form of unstable variance: heteroscedasticity. Heteroscedasticity occurs where the error term is related to one of the variables in the model and either increases or decreases depending on the value of the independent variable. The error term $e_{het}$ is related to $x_{1sk}$ by the relationship $e_{het} = 0.1 \times e \times x_{1sk}$ for moderate heteroscedasticity, and $e_{het} = 6 \times e \times x_{1sk}$ for severe heteroscedasticity (Pickard et al., 2001).

The outliers are generated by multiplying or dividing the dependent variable $y$ by a constant. We select 1% of the data to be the outliers. Half of the outliers are obtained by multiplying while the other half are obtained by dividing. For moderate outliers, we set the constant value to 2, while for severe outliers, 6 is chosen as the constant. The combination of moderate heteroscedasticity and moderate outliers is used to generate the moderate non-normality dataset (Fig. 8). The combination of severe heteroscedasticity and severe outliers is used to obtain the severe non-normality dataset (Fig. 9).
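The generation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' generator: the function name `generate_dataset` and its parameters are hypothetical, the base error `e` is assumed to be standard normal (the text does not state its distribution here), the outlier rows are chosen uniformly at random, and the seed is only there to make runs repeatable. The Gamma shape and scale are both set to 2, which gives the stated mean of 4 and variance of 8.

```python
import random


def generate_dataset(n=500, severe=False, seed=0):
    """Return n rows (x1sk, x2sk, x3sk, y) following the generating equation."""
    rng = random.Random(seed)
    het_factor = 6.0 if severe else 0.1        # heteroscedasticity coefficient
    outlier_constant = 6.0 if severe else 2.0  # multiplier/divisor for outliers

    rows = []
    for _ in range(n):
        # Gamma with shape 2 and scale 2 has mean 4 and variance 8.
        x1 = 10.0 * rng.gammavariate(2.0, 2.0)
        x2 = 3.0 * rng.gammavariate(2.0, 2.0)
        x3 = 20.0 * rng.gammavariate(2.0, 2.0)
        e = rng.gauss(0.0, 1.0)  # assumption: standard normal base error
        e_het = het_factor * e * x1
        y = 1000.0 + 6.0 * x1 + 3.0 * x2 + 2.0 * x3 + e_het
        rows.append([x1, x2, x3, y])

    # Turn 1% of the rows into outliers: half multiplied, the rest divided.
    outliers = rng.sample(range(n), n // 100)
    half = len(outliers) // 2
    for i in outliers[:half]:
        rows[i][3] *= outlier_constant
    for i in outliers[half:]:
        rows[i][3] /= outlier_constant
    return [tuple(r) for r in rows]
```

Calling `generate_dataset(500, severe=False)` and `generate_dataset(500, severe=True)` would give one moderate and one severe dataset of 500 projects each, which could then be sliced into historical, training, and testing subsets.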

Fig. 8. Y versus $x_{1sk}$ of moderate non-normality dataset.

Fig. 9. Y versus $x_{1sk}$ of severe non-normality dataset.

6.2. Experiment results on artificial datasets

By using the equation mentioned in Section 6.1, we generate two artificial datasets, each with 500 projects. For a better assessment of accuracy, we make the data for testing much larger by dividing each artificial dataset into a historical set with 50 projects, a training set with 50 projects, and a testing set with 400 projects (see Table 8). We apply all the methods to the two artificial datasets by following the same procedure presented in Section 4.3. The results and comparisons are summarized as follows.

Table 8. The partition of artificial datasets

| Dataset | Sample size of artificial moderate non-normality data | Sample size of artificial severe non-normality data |
|---|---|---|
| Historical | 50 | 50 |
| Training | 50 | 50 |
| Testing | 400 | 400 |
| Total | 500 | 500 |

The results on the artificial moderate non-normality dataset are in Table 9. It is shown that FWPSABE achieves the best performances in MMRE at 0.079 and MdMRE at 0.06 and the second best value 0.98 for PRED(0.25), while ANN gets the highest PRED(0.25) value at 0.99. Comparing the prediction error curves in Fig. 4 for the Albrecht dataset with the error curves in Fig. 10 for the moderate non-normality set, it is observed that all the methods achieve much better performance on the artificial dataset and the differences among the candidate methods are much smaller on the artificial dataset. These findings imply that the estimation methods in our study may converge to good prediction results on a large, moderately non-normal dataset, and that FWPSABE is slightly better than the other methods as it eliminates the noise in the historical dataset.

Table 9. The results and comparisons on the artificial moderate non-normality dataset

| Models | MMRE (Training) | MMRE (Testing) | PRED(0.25) (Training) | PRED(0.25) (Testing) | MdMRE (Training) | MdMRE (Testing) |
|---|---|---|---|---|---|---|
| ABE | 0.068 | 0.116 | 0.98 | 0.94 | 0.048 | 0.093 |
| FWABE | 0.090 | 0.110 | 1.00 | 0.98 | 0.081 | 0.098 |
| PSABE | 0.057 | 0.086 | 1.00 | 0.98 | 0.043 | 0.068 |
| FWPSABE | 0.055 | 0.079 | 1.00 | 0.98 | 0.044 | 0.060 |
| SVR | 0.069 | 0.095 | 0.98 | 0.98 | 0.055 | 0.077 |
| ANN | 0.065 | 0.088 | 1.00 | 0.99 | 0.061 | 0.077 |
| RBF | 0.099 | 0.115 | 0.94 | 0.93 | 0.075 | 0.092 |
| CART | 0.099 | 0.109 | 0.98 | 0.95 | 0.074 | 0.090 |

Fig. 10. The testing results on artificial moderate non-normality dataset.

Table 10 shows the results on the artificial severe non-normality dataset. FWPSABE achieves the best performances in MMRE at 0.16 and MdMRE at 0.11 and the second best value 0.80 for PRED(0.25), while CART obtains the highest PRED(0.25) value at 0.81. Comparing Figs. 10 and 11, it is shown that all methods obtain poorer performances on the severe non-normal dataset. This observation indicates that a high degree of non-normality has negative impacts on the performance of the estimation methods in our study.

Table 10. The results and comparisons on the artificial severe non-normality dataset

| Models | MMRE (Training) | MMRE (Testing) | PRED(0.25) (Training) | PRED(0.25) (Testing) | MdMRE (Training) | MdMRE (Testing) |
|---|---|---|---|---|---|---|
| ABE | 0.32 | 0.20 | 0.68 | 0.73 | 0.18 | 0.14 |
| FWABE | 0.34 | 0.19 | 0.72 | 0.77 | 0.14 | 0.13 |
| PSABE | 0.31 | 0.18 | 0.70 | 0.75 | 0.11 | 0.12 |
| FWPSABE | 0.30 | 0.15 | 0.74 | 0.80 | 0.14 | 0.10 |
| SVR | 0.34 | 0.18 | 0.62 | 0.76 | 0.19 | 0.12 |
| ANN | 0.34 | 0.17 | 0.70 | 0.79 | 0.16 | 0.12 |
| RBF | 0.37 | 0.18 | 0.66 | 0.80 | 0.18 | 0.13 |
| CART | 0.38 | 0.18 | 0.72 | 0.81 | 0.16 | 0.14 |

Fig. 11. The testing results on artificial severe non-normality dataset.

7. Conclusions and future works

In this study, we introduce the project selection technique to refine the historical project database in the ABE model. In addition, the simultaneous optimization of feature weights and project selection (FWPSABE) is proposed to further improve the performance of ABE. To evaluate our methods, we apply them to two real-world datasets and two artificial datasets. The error indicators for method evaluation are MMRE, PRED(0.25), and MdMRE. The promising results of the proposed FWPSABE system indicate that it can significantly improve the ABE model and enhance ABE as a successful method among software cost estimation techniques.

One major conclusion of this paper is that the FWPSABE system may produce more accurate predictions than other advanced machine learning techniques for software cost estimation. In the literature, ABE is already regarded as a benchmarking method for cost estimation (Shepperd and Schofield, 1997). First, it is not complex to implement and it is more transparent to users than most machine learning methods. Moreover, ABE's predictions can be updated in real time; once a project is completed, its information can be easily inserted into the historical project database. However, many studies reported that in practice ABE has been hindered by low prediction accuracy. According to the results in this study, FWPSABE may be useful in practical situations because it has the advantages of ABE and the ability to produce more accurate cost estimation results.

However, there are still some limitations of this study. For example, the two real-world datasets in our experiments are quite old, though they have been frequently used by many recent studies. Experiments on recent and large datasets such as the ISBSG database are essential for more rigorous evaluations of our methods. In addition, our methods are only validated on projects developed by the traditional waterfall based approach. Software projects developed by new types of approaches such as agile methods have additional features indicating the characteristics of their development approaches. The accuracy of FWPSABE for projects under these newer development types should be further investigated. Moreover, ABE based methods are intolerant of missing features. If the information on some historical projects is incomplete, then data imputation techniques should be used to process the missing values before the FWPSABE system starts. Furthermore, only MMRE is used as the optimization objective function, and there is no guarantee that other quality metrics such as PRED(0.25) and MdMRE can be optimized while optimizing the single objective MMRE. Multi-objective optimization techniques can be investigated in future works.

References

Ahn, H., Kim, K., Han, I., 2006. Hybrid genetic algorithms and case-based reasoning systems for customer classification. Expert Systems 23 (3).
Albrecht, A.J., Gaffney, J., 1983. Software function, source lines of code, and development effort prediction. IEEE Transactions on Software Engineering 9 (6), 639-648.
Angelis, L., Stamelos, I., 2000. A simulation tool for efficient analogy based cost estimation. Empirical Software Engineering 5, 35-68.
Auer, M., Trendowicz, A., Graser, B., Haunschmid, E., Biffl, S., 2006. Optimal project feature weights in analogy-based cost estimation: improvement and limitations. IEEE Transactions on Software Engineering 32 (2), 83-92.
Babu, T.R., Murty, M.N., 2001. Comparison of genetic algorithm based prototype selection schemes. Pattern Recognition 34, 523-525.
Boehm, B., 1981. Software Engineering Economics. Prentice-Hall, Englewood Cliffs, NJ.
Boehm, B., Abts, C., Chulani, S., 2000. Software development cost estimation approaches - a survey. Annals of Software Engineering 10, 177-205.
Chiu, N.H., Huang, S.J., 2007. The adjusted analogy-based software effort estimation based on similarity distances. Journal of Systems and Software 80 (4), 628-640.
Chulani, S., Boehm, B., Steece, B., 1999. Bayesian analysis of empirical software engineering cost models. IEEE Transactions on Software Engineering 25 (4), 573-583.
Costagliola, G., Ferrucci, F., Tortora, G., Vitiello, G., 2005. Class point: an approach for the size estimation of object-oriented systems. IEEE Transactions on Software Engineering 31 (1), 52-74.
Desharnais, J.M., 1989. Analyse statistique de la productivitie des projets informatique a partie de la technique des point des fonction. University of Montreal, Masters thesis.
Dolado, J.J., 2000. A validation of the component-based method for software size estimation. IEEE Transactions on Software Engineering 26 (10), 1006-1021.
Dolado, J.J., 2001. On the problem of the software cost function. Information and Software Technology 43, 61-72.
Doval, D., Mancoridis, S., Mitchell, B.S., 1999. Automatic clustering of software systems using a genetic algorithm. Proceedings of the 9th International Workshop Software Technology and Engineering Practice, 73-81.
Foss, T., Stensrud, E., Kitchenham, B., Myrtveit, I., 2003. A simulation study of the model evaluation criterion MMRE. IEEE Transactions on Software Engineering 29 (11).
Heiat, A., 2002. Comparison of artificial neural network and regression models for estimating software development effort. Information and Software Technology 44, 911-922.
Helmer, O., 1966. Social Technology. Basic Books, NY.
Holland, J., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, USA.
Huang, S.J., Chiu, N.H., 2006. Optimization of analogy weights by genetic algorithm for software effort estimation. Information and Software Technology 48, 1034-1045.
Huang, Y.S., Chiang, C.C., Shieh, J.W., Grimson, E., 2002. Prototype optimization for nearest-neighbor classification. Pattern Recognition 35, 1237-1245.
Huang, X.S., Ho, D., Ren, J., Capretz, L.F., 2007. Improving the COCOMO model using a neuro-fuzzy approach. Applied Soft Computing Journal 7 (1), 29-40.
Jensen, R., 1983. An improved macrolevel software development resource estimation model. In: Proceedings of 5th Conference of International Society of Parametric Analysts, pp. 88-92.
Jorgensen, M., 2004. Top-down and bottom-up expert estimation of software development effort. Information and Software Technology 46, 3-16.
Jorgensen, M., Indahl, U., Sjoberg, D., 2003. Software effort estimation by analogy and regression toward the mean. Journal of Systems and Software 68 (3), 253-262.
Jorgensen, M., Shepperd, M., 2007. A systematic review of software development cost estimation studies. IEEE Transactions on Software Engineering 33 (1), 33-53.
Kadoda, G., Cartwright, M., Chen, L., Shepperd, M., 2000. Experiences using case-based reasoning to predict software project effort. In: Proceedings EASE 2000 conferences 4th International Conference on Empirical Assessment and Evaluation in Software Engineering. Staffordshire, U.K.
Kolodner, J.L., 1993. Case-Based Reasoning. Morgan Kaufmann Publishers Inc.
Kuncheva, L.I., Jain, L.C., 1999. Nearest neighbor classifier: simultaneous editing and feature selection. Pattern Recognition Letters 20, 1149-1156.
Lipowezky, U., 1998. Selection of the optimal prototype subset for 1-NN classification. Pattern Recognition Letters 19, 907-918.
MacDonell, S.G., Shepperd, M.J., 2003. Combining techniques to optimize effort predictions in software project management. Journal of Systems and Software 66, 91-98.

Madachy, R., 1994. A Software Project Dynamics Model for Process Cost, Schedule and Risk Assessment, Ph.D. Dissertation, University of Southern California.
Mair, C., Shepperd, M., Jorgensen, M., 2005. An analysis of data sets used to train and validate cost prediction systems. PROMISE 05.
Hagan, Martin T., Demuth, Howard B., Beale, Mark H., 1997. Neural Network Design. PWS Publishing Co., Boston, MA.
Mendes, E., Watson, I., Triggs, C., Mosley, N., Counsell, S., 2003. A comparative study of cost estimation models for Web hypermedia applications. Empirical Software Engineering 8, 163-196.
Mendes, E., Mosley, N., Counsell, S., 2005. Investigating Web size metrics for early Web cost estimation. Journal of Systems and Software 77 (2), 157-172.
Michalewicz, Z., 1996. Genetic Algorithms + Data Structures = Evolution Programs, third ed. Springer, Berlin.
Miyazaki, Y., Terakado, K., Ozaki, K., Nozaki, H., 1994. Robust regression for developing software estimation models. Journal of Systems and Software 27, 3-16.
Myrtveit, I., Stensrud, E., Shepperd, M., 2005. Reliability and validity in comparative studies of software prediction models. IEEE Transactions on Software Engineering 31 (5), 380-391.
Oliveira, A.L.I., 2006. Estimation of software project effort with support vector regression. Neurocomputing 69, 1749-1753.
Orr, M.J.L., 1996. Introduction to Radial Basis Function Network. Technical Reports, Centre for Cognitive Science, University of Edinburgh, 2, Buccleuch Place, Edinburgh, Scotland.
Pendharkar, P.C., Subramanian, G.H., Rodger, J.A., 2005. A probabilistic model for predicting software development effort. IEEE Transactions on Software Engineering 31 (7), 615-624.
Pickard, L., Kitchenham, B., Linkman, S., 1999. An investigation of analysis techniques for software datasets. In: Proceeding of Sixth IEEE International Software Metrics Symposium.
Pickard, L., Kitchenham, B., Linkman, S., 2001. Using simulated data sets to compare data analysis techniques used for software cost modeling. IEE Proceeding of Software 148 (6), 165-174.
Putnam, L., Myers, W., 1992. Measures for Excellence. Yourdon Press Computing Series.
Rozsypal, A., Kubat, M., 2003. Selecting representative examples and attributes by a genetic algorithm. Intelligent Data Analysis 7, 291-304.
Shepperd, M., Kadoda, G., 2001. Comparing software prediction techniques using simulation. IEEE Transactions on Software Engineering 27 (11), 1014-1022.
Shepperd, M., Schofield, C., 1997. Estimating software project effort using analogies. IEEE Transactions on Software Engineering 23 (12), 733-743.
Shin, M., Goel, A.L., 2000. Empirical data modeling in software engineering using radial basis functions. IEEE Transactions on Software Engineering 26 (6), 567-576.
Shukla, K.K., 2000. Neuro-genetic prediction of software development effort. Information and Software Technology 42, 701-713.
Sternberg, R., 1977. Component processes in analogical reasoning. Psychological Review 84 (4), 353-378.
Tausworthe, R.C., 1980. The work breakdown structure in software project management. Journal of Systems and Software 1 (3), 181-186.
Walkerden, F., Jeffery, R., 1999. An empirical study of analogy-based software effort estimation. Empirical Software Engineering 4, 135-158.