Search Based Software Engineering and Software Defect Prediction

Goran Mauša, mag. ing. el.
University of Rijeka - Faculty of Engineering, Vukovarska 58, HR-51000 Rijeka, Croatia
Email: gmausa@riteh.hr

Abstract—This article presents an overview of the search based software engineering (SBSE) and software defect prediction areas. SBSE comprises search-based optimization algorithms that are being incorporated into almost every area of software engineering. However, some software engineering areas, such as software defect prediction, still exploit its benefits less often. With large software systems nowadays present in an ever increasing number of human activities, conventional testing is becoming the most expensive part of the software product lifecycle. Software defect prediction is an emerging field that aims to improve software quality and testing efficiency. Despite the small amount of research on SBSE usage in software defect prediction, there are encouraging and motivating studies which prove there is still much research to be done in this area.

Index Terms—Software engineering, optimization, defect prediction

I. INTRODUCTION

Search based software engineering (SBSE) is used in all sorts of optimization and multi-objective problems present in software engineering. Every study that compares problem solving with and without SBSE indicates an improvement in results when SBSE is used. Software defect prediction is one of the software engineering fields that is only beginning to exploit its advantages. Considering the complexity of modern software products, it is obvious that faults are inevitable. Ideally, testing should be exhaustive, but for practical reasons that is impossible. Software defect prediction can assist the testing process by allowing software engineers to focus their activities on fault-prone code, improving software quality and making better use of resources.
This article presents an overview of the search based software engineering and software defect prediction areas. Section II describes what SBSE is and what its main characteristics are. The scope of recent research on software engineering problems that benefit from using SBSE is given in Section III. An overview of the software defect prediction problem is given in Section IV, while Section V presents some of the research on using SBSE within software defect prediction. Finally, Section VI gives the conclusion.

II. SEARCH BASED SOFTWARE ENGINEERING

The term search based software engineering was coined by Harman and Jones in 2001. SBSE comprises search-based optimization algorithms used in software engineering, with genetic algorithms, genetic programming, simulated annealing and hill climbing being the most widely used [1]. Using SBSE has two requirements: a search space whose parameters we can manipulate in order to form different candidate solutions, and a fitness function that we can measure as the output of the problem being analyzed. Each algorithm employs some degree of randomness in finding the solution to a problem. The fitness function evaluates whether the algorithm has found a better solution than the one in the previous step and guides the search towards a solution as close to optimal as possible [2]. Unlike the other engineering disciplines in which search based algorithms have found application, software engineering is the only discipline whose artifacts are purely virtual. Lacking physical simulations or models that could represent and optimize its artifacts, software engineering can treat SBSE and the chosen fitness function as the closest thing to such a model. This property makes SBSE a very attractive and potentially beneficial field. Search based algorithms are also attractive in software engineering because software engineering data are often inaccurate, overdispersed and incomplete, making some traditional optimization techniques inappropriate.
The search based approach is very generic: defining different fitness functions for different objectives makes the same overall search based optimization strategy applicable to very different scenarios. With its convergence to an optimal or near optimal solution, SBSE becomes of great value when there is a vast number of possible combinations of candidate solutions in the search space. A candidate solution can be a vector of numbers, a graph structure, a tree or a set of rules. Optimizing a candidate solution involves several steps:
1) Initialize the search, usually with a random choice among the possible candidate solutions
2) Assess the quality of the candidate solution by computing its fitness function
3) Modify the candidate solution, making it slightly and randomly different
4) Select the candidate solution based on fitness, according to the chosen algorithm
There are many different SBSE algorithms and each
of them employs a unique approach to the 4 steps mentioned earlier. However, certain features differentiate the algorithms into categories of:
- Local or global optimization algorithms
- Single-state or population based methods
Local algorithms tend to find a local optimum in the search space and may become trapped there, while global search techniques overcome that problem. However, despite this obvious reason for using a global search technique, there is still a trade-off between efficiency and effectiveness: global search techniques are more effective, but at the cost of greater computational effort, making local search techniques more adequate for simpler problems. Single-state methods keep one candidate solution at a time, while population based methods keep a sample of candidate solutions (a population). Population based methods borrowed their concepts from biology and are therefore called evolutionary algorithms; their modify step involves mutation and recombination of the fittest parents, with the tendency to create even fitter children. Examples of local single-state algorithms are hill climbing, the greedy algorithm and simulated annealing, while evolutionary algorithms like the non-dominated sorting genetic algorithm (NSGA) and the two-archive algorithm fall into the group of global population based algorithms. SBSE can be very helpful when dealing with multi-objective problems. In such cases, there is often a necessary trade-off between different objectives and there is no single optimal solution to the problem. SBSE techniques tend to find a sample of non-dominated solutions which form the Pareto optimality front. Since opposite objectives need to be satisfied, the non-dominated solutions are those that cannot be ranked against each other, while at the same time no better solutions exist. That is why one optimal solution cannot be found; still, we gain an insight into the problem and the possible near optimal solutions.
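The four optimization steps and the single-state hill climbing technique mentioned above can be sketched in a few lines of Python. This is an illustrative toy, not taken from any of the surveyed studies; the quadratic fitness function and step size are arbitrary choices.

```python
import random

def hill_climb(initial, fitness, neighbor, iterations=1000, seed=0):
    """Generic single-state local search following the four steps:
    initialize, assess, modify, select."""
    rng = random.Random(seed)
    current = initial                       # 1) initialize the search
    current_fit = fitness(current)          # 2) assess the candidate's quality
    for _ in range(iterations):
        candidate = neighbor(current, rng)  # 3) modify: a small random change
        candidate_fit = fitness(candidate)
        if candidate_fit >= current_fit:    # 4) select: keep the fitter solution
            current, current_fit = candidate, candidate_fit
    return current, current_fit

# Toy search space: real numbers; the fitness peaks at x = 3.
best, best_fit = hill_climb(
    initial=0.0,
    fitness=lambda x: -(x - 3.0) ** 2,
    neighbor=lambda x, rng: x + rng.uniform(-0.1, 0.1),
)
```

On a multimodal fitness landscape this same loop would illustrate the weakness of local search, since it never accepts a worse candidate and can become trapped at a local optimum.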
This kind of problem is often present in software engineering, and SBSE provides very good support for the experts' decision making process.

III. SEARCH BASED SOFTWARE ENGINEERING USAGE

SBSE is a widespread group of optimization techniques that has found usage in almost every stage of the software product lifecycle. However, some software engineering areas exploit its benefits much more often than others, like software testing [3]. Here are some applications of SBSE in the most recent studies:
- Solving the next release problem [3]-[5]
- The automatic software repair problem [6]-[11]
- The test data optimization problem [12]-[16]
- Generating higher order mutants [17], [18]
- Other multi-objective problems in software engineering [19], [20]

A. Next Release Problem

The next release problem is the multi-objective problem of deciding which requirements should appear in the next release of a software product. The task of SBSE is to choose a small subset among all possible requirements in order to satisfy as many stakeholders or customers as possible while minimizing the cost at the same time. Solving the next release problem is difficult and often involves so many combinations of solutions that it becomes practically impossible to solve manually. Using SBSE does not give a single specific answer to that problem; instead it gives a range of equally good, near optimal solutions. That provides valuable and necessary support for the decision maker. Zhang et al. [4] used search-based optimization techniques to automate the search for optimal or near optimal allocations of requirements that balance competing stakeholder objectives in the next release problem. NSGA-II and the two-archive algorithm were compared in performance, and the two-archive algorithm proved to be the better one. Durillo et al. [3] analyzed which features to include in the next release of a product in order to satisfy as many customers as possible with minimal cost, using 3 multi-objective metaheuristics.
NSGA-II was used as a reference algorithm in the field of multi-objective optimization, MOCell because it outperformed NSGA-II in several studies, and PAES as one of the simplest techniques. The results showed that NSGA-II finds the highest number of optimal solutions, MOCell finds the widest range of different solutions, and PAES was the fastest but the worst performer, as expected. Finkelstein et al. [5] proposed that each notion of fairness in the next release problem should form an objective in a multi-objective, Pareto optimal SBSE setting. Comparing NSGA-II and the two-archive algorithm, the results showed that they performed equally well on random data sets, while NSGA-II performed better on a real data set obtained from Motorola.

B. Automatic Software Repair

Fixing bugs is a difficult and time-consuming manual process, and some reports say software maintenance usually consumes about 90% of all costs after delivery of a product. An efficient, fully automated technique for repairing program defects could therefore alleviate that heavy burden. The idea of using SBSE in automatic software repair is rather simple but efficient. The first task is to locate the region of the program relevant to an error. An SBSE algorithm is then used to produce simple changes along the path where the fault lies, trying to eliminate the fault while maintaining the functionality of the original program. Weimer et al. [6] used genetic programming for the software repair phase in a fully automated method for locating and repairing bugs in software. Fast et al. [7] used genetic programming in automated program repair as well, but their focus was on improving the fitness function, which resulted in an efficiency improvement of 81%. Forrest et al.
[8] combined genetic programming with program analysis methods to repair bugs in off-the-shelf legacy C programs and managed to repair all 11 programs they used. Schulte et al. [9] explored the advantages of assembly-level repair using evolutionary computing in the automated program repair problem. While retaining all the advantages of the assembly level approach over the source code level approach, they obtained nearly as efficient results as at the source code level. Nguyen et al. [10] successfully used a genetic programming approach to automate the repair of program bugs in existing software, with average running times from half a second to ten minutes. Weimer et al. [11] also successfully used genetic programming for automatic bug repair in off-the-shelf legacy C programs, combining program analysis methods with evolutionary computing. Their approach requires 1428 seconds and 3903 fitness evaluations on average to construct a repair. All of these promising results indicate that SBSE is a proper choice for automated software repair.

C. Test Data Optimization

Test data generation, regeneration and minimization are three different approaches to the test data optimization problem. They tend to identify near optimal test sets in reasonable time using evolutionary testing, a sub-field of SBSE techniques often called search based testing. Test data generation begins from scratch, regeneration uses pre-existing test data as a starting point, and test suite minimization tends to identify and remove redundant test cases. The problem of how to test something that reacts differently to the same test input over time arises when dealing with autonomous agents. Nguyen et al. [12] proposed a solution to that problem and used evolutionary optimization to generate demanding test cases for such autonomous agents, which produce different output for the same input due, for example, to increasing knowledge.
In their study they used the NSGA-II algorithm, a fast multi-objective genetic algorithm that in other studies proved to be better at finding widely spread solutions and to have better convergence to the optimum than other algorithms. Harman and McMinn [13] analyzed which type of search is best for the structural test data generation problem. They used evolutionary testing as a global search algorithm, hill climbing as a local one, and a hybrid memetic algorithm. Their suspicion proved to be correct and the results showed that the hybrid approach is capable of the best overall performance in terms of coverage. McMinn et al. [14] also compared evolutionary testing as a global search algorithm, hill climbing as a local search and a hybrid memetic algorithm on the search based structural test data generation problem, but with the improvement of irrelevant input variable removal (INVR). The results expectedly showed that all search algorithms are more successful and cover more branches with INVR, and that the memetic algorithm is the most prolific technique both with and without INVR. There are many testing scenarios in which a tester may already have some pre-existing test cases. They could have been created by the tester, based on their experience, expertise and domain knowledge, or by the developer, or they may remain from regression testing of a previous version of the product. In order to exploit the effort and knowledge put into forming these test cases, Yoo and Harman [15] proposed using pre-existing test data as a starting point in search-based test data generation, making it a regeneration process. They used hill climbing without random initialization, but with random first-ascent and pre-existing test data.
The results were promising, indicating that the proposed approach can be up to two orders of magnitude more efficient and achieve higher structural coverage, with an equal component-level mutation score at much lower cost, compared to a state-of-the-art search based testing technique. Test suite minimization can prove to be very important under the strict time limits often imposed in regression testing. Regression testing has to guarantee that recent changes in a program do not interfere with its functionality. Due to an ever growing test suite, it is prohibitively expensive to execute the entire suite. Yoo and Harman [16] analyzed a hybrid algorithm that combines the efficient approximation of the greedy algorithm with the capability of a population based genetic algorithm for Pareto efficient multi-objective test suite minimization. They found that the greedy algorithm may provide a good approximation of the Pareto front for smaller software products, but for larger products the usage of HNSGA-II is suggested due to more precise test suite minimization.

D. Higher Order Mutation Generation

It is said that 90% of the faults which survive the testing procedure must be complex ones. In order to reduce their number, we need to learn more about them. Higher order mutants (HOMs) are deliberately faulty programs used in the software testing process. The order of a mutant reflects the number of faults injected into the original program. Finding higher order mutants that create subtle and complex faults, which sometimes practically mask one another, helps us locate hazardous combinations of faults that are otherwise harmless when they exist separately. Such HOMs are more difficult to find with simple test cases, and there lies a possible usage of HOMs: to find better test cases. Jia and Harman [17] compared the performance of 3 algorithms for finding optimal HOMs: a greedy algorithm, a genetic algorithm and a hill climbing algorithm.
The results showed that the genetic algorithm performed best because it finds the most subsuming HOMs, hill climbing always finds the highest fitness HOMs, and the greedy algorithm finds the highest order HOMs. What makes their findings questionable is the fact that random search found more HOMs than the greedy algorithm and hill climbing. Langdon et al. [18] explored the usage of a multi-objective Pareto optimal approach with Monte Carlo
sampling, the NSGA-II genetic algorithm and genetic programming to search for higher order mutants which are both harder to kill and more realistic. They found their higher order genetic programming mutation testing approach able to find even simple faults that mask each other in combination and therefore form complex faults that are very difficult to detect.

E. Other Multi-Objective Problems in Software Engineering

Besides the next release problem, there are many more problems in software engineering which involve incomparable and often opposite objectives. A well known, highly important and challenging problem in software engineering is the one that requires a high degree of cohesion and a low degree of coupling for a good module structure. This problem is intensified as software evolves and its modular structure tends to degrade. Praditwong et al. [19] compared automated techniques for suggesting software clusterings, delimiting boundaries between modules so as to maximize cohesion and minimize coupling. They used hill climbing as a single-objective algorithm and the two-archive algorithm as a multi-objective one, with 2 different approaches: the maximizing cluster approach (MCA) and the equal-size cluster approach (ECA). The results indicated that ECA is superior to MCA and hill climbing in this task. Many other multi-objective, but completely different, problems can be found in software engineering management. Project managers often do not understand complex optimization techniques, and they do not need to in order to benefit from them. It is important to provide them with tools they can easily give input to and obtain visually acceptable output from, which can help them in making important decisions. One such problem was analyzed by Di Penta et al. [20], who used 3 metaheuristics for staff and task allocation with the objective of minimizing the completion time and reducing schedule fragmentation.
They compared the performance of NSGA-II, stochastic hill climbing and simulated annealing as the most widely used SBSE techniques, appearing in 80% of publications. The results were also compared in a single objective optimization setting, where simulated annealing was the best algorithm.

IV. DEFECT PREDICTION

Software defect prediction is an emerging field in software engineering that aims to improve software quality and testing efficiency. Arguments in favor of this field come mostly from the great amount of time and human resources spent on locating and fixing faulty software modules. The usage of defect prediction does not offer an optimal allocation of resources. Instead, it gives software managers valuable information about which parts of the software code are more likely to be fault prone and, therefore, where it would be wise to concentrate their testing resources.

Fig. 1. Defect Prediction Process

Figure 1 presents the stages of the defect prediction process. Each of these stages can be done automatically, so the whole process does not consume many resources in the software product life cycle. After collecting and preparing the data for the defect prediction task, presented in the upper row of Figure 1, there are two key steps. The first key step is building the prediction model with the learning set. The task of the learning set is to present the input and the desired output to the model so the model can adjust accordingly. The second key step is to evaluate the model's predictive capabilities with the testing set. The testing set allows us to see how accurately the model performs on unseen data, comparing the output the model offers with the expected output.

A. Data Collection

Software defect prediction refers to identifying error prone software modules by data mining static code attributes. Besides static code attributes, which include certain code and design measures, some researchers use additional attributes that include process and personnel details.
Because extracting static code attributes is a simple and cheap procedure, even for large systems, other attributes are rarely used [21]. There are some publicly available data sets intended for defect prediction research, and the NASA PROMISE data sets are particularly popular.

B. Data Preprocessing

Data preprocessing is an important step in data mining. It includes procedures like feature selection among the independent variables and outlier removal, which are not always included in studies. The feature selection process is usually performed using a stepwise selection procedure [22], correlation analysis, grey relational analysis [23] or principal component analysis [24]. Both feature selection and outlier removal are used in order to improve the performance of defect prediction models.
C. Data Splitting

Every defect prediction model has to be taught how to perform classification. On the other hand, it also has to be tested to see how well it performs on unseen data. A proper evaluation of performance can be done only when we know the correct output for the unseen data, and that is why data sets are split into learning and testing sets. For larger data sets the splitting process can be traditional or cross-validation. The traditional process usually splits the data randomly into a learning and a testing set in the ratio 67%:33%. The improved traditional process repeats the random splitting, usually 100 times. The cross-validation process, on the other hand, splits the data randomly into several groups of equal size and chooses one as the testing set. It then permutes all combinations of the same groups, each time assigning a different group to the testing set. If it splits the data into 3 groups, it has the same ratio as the traditional process and is called threefold cross-validation. Extensive testing showed that splitting the data into 10 groups (tenfold cross-validation) leads to an even better estimate of the prediction model [25]. There are also data splitting processes for smaller data sets, like leave-one-out and bootstrap. The leave-one-out process performs the testing on only one sample, repeating the process for all samples. The bootstrap process randomly chooses samples for the learning set, with the possibility of repeatedly choosing the same sample. The unchosen samples are then assigned to the testing set. The bootstrap process can also be repeated iteratively, and it is usually preferred over leave-one-out.

D. Prediction Model

There is a great number of data mining and classification models and algorithms that are used in software defect prediction. As mentioned in the previous subsections, each model uses the static code attributes as input and gives the presence of defects or the number of defects as output.
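The tenfold cross-validation scheme described in the data splitting subsection can be sketched as follows. This is a hand-rolled illustration with placeholder data; libraries such as scikit-learn provide equivalent, more robust splitters.

```python
import random

def ten_fold_splits(samples, k=10, seed=0):
    """Shuffle the data, partition it into k groups of (nearly) equal size,
    and yield (learning_set, testing_set) pairs, each group serving once
    as the testing set."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k interleaved groups
    for i in range(k):
        testing = folds[i]
        learning = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield learning, testing

# 100 placeholder modules: every module appears in exactly one testing set.
data = list(range(100))
splits = list(ten_fold_splits(data))
```

With k=3 this reproduces the 67%:33% ratio of the traditional split, while every sample is still used for testing exactly once.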
The most popular, state-of-the-art algorithms may be divided into statistical classifiers, nearest-neighbor methods, neural networks, support vector machine-based classifiers, decision tree-based approaches and ensemble methods [26]. Here is a more thorough list of algorithms:
1) Statistical classifiers
- Linear Discriminant Analysis
- Quadratic Discriminant Analysis
- Logistic Regression
- Naive Bayes
- Bayesian Networks
- Least-Angle Regression
- Relevance Vector Machine
2) Nearest neighbor methods
- k-Nearest Neighbor
- K-Star
3) Neural networks
- Multi-Layer Perceptron
- Radial Basis Function Network
4) Support vector machine-based classifiers
- Support Vector Machine
- Lagrangian SVM
- Least Squares SVM
- Linear Programming
- Voted Perceptron
5) Decision tree approaches
- C4.5 Decision Tree
- Classification and Regression Tree
- Alternating Decision Tree
6) Ensemble methods
- Random Forest
- Logistic Model Tree

E. Evaluation

The final step in software defect prediction is the evaluation of the prediction model's capabilities. Depending on the output of the prediction model, a different set of evaluation metrics is available. Most defect prediction models classify software modules into fault-prone and non-fault-prone. With such binary classification, researchers most often begin the evaluation by counting the numbers of correctly and incorrectly predicted modules and placing them into a confusion matrix. The confusion matrix provides four scores. A true positive (TP) and a true negative (TN) are counted for every correctly classified fault-prone and non-fault-prone module, respectively. A false positive (FP) and a false negative (FN) are counted for every misclassified non-fault-prone and fault-prone module, respectively. Using these four scores it is possible to calculate several evaluation measures.
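Counting the four confusion matrix scores and deriving the standard classification measures from them is straightforward; the following minimal sketch uses made-up labels purely for illustration.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels: 1 = fault-prone, 0 = non-fault-prone."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

def classification_measures(tp, tn, fp, fn):
    """Derive accuracy, recall (TPR) and false alarm rate (FPR) from the matrix."""
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "TPR": tp / (fn + tp),
        "FPR": fp / (tn + fp),
    }

# Made-up predictions for eight modules.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
measures = classification_measures(*confusion_counts(actual, predicted))
```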
Here are the most often used ones:
- Accuracy (the number of correctly classified modules divided by the total number of modules):

ACC = (TP + TN) / (TP + FP + TN + FN)    (1)

- The true positive rate (TPR), often referred to as recall or sensitivity (the number of correctly classified fault-prone modules divided by the total number of fault-prone modules):

TPR = TP / (FN + TP)    (2)

- The false positive rate (FPR), also known as false alarm rate or fallout (the number of modules misclassified as fault-prone divided by the total number of non-fault-prone modules):

FPR = FP / (TN + FP)    (3)

A more general measure of evaluation is the area under the ROC curve (AUC). It is often used when comparing the performance of different prediction models. The ROC (receiver operating characteristic) curve, originally from the signal detection problem, graphically presents the trade-off between TPR and FPR. AUC is the value of the integrated ROC curve and spans from 0 to 1. Some defect prediction models predict the number of defects. Such numeric output requires different evaluation measures. If we define p_1, p_2, ..., p_n as the predicted values and a_1, a_2, ..., a_n as the actual values, we can compute the following evaluation measures:
- Mean-squared error:

MSE = (1/n) Σ_i (p_i − a_i)²    (4)

- Root mean-squared error:

RMSE = √MSE    (5)

- Mean-absolute error:

MAE = (1/n) Σ_i |p_i − a_i|    (6)

- Relative-squared error:

RSE = Σ_i (p_i − a_i)² / Σ_i (a_i − ā)²    (7)

- Relative-absolute error:

RAE = Σ_i |p_i − a_i| / Σ_i |a_i − ā|    (8)

The choice of appropriate measures depends entirely on the given situation and the aim of the research.

V. SEARCH BASED SOFTWARE ENGINEERING IN SOFTWARE DEFECT PREDICTION

As demonstrated in the previous sections, SBSE has proved to be a very effective tool for various problems present in software engineering. For the purpose of this study, a review of the literature was performed in order to find as much as possible of the research done on SBSE usage within software defect prediction. Two systematic reviews were used as motivating examples of how to conduct a thorough exploration of the literature. A systematic literature review performed by Beecham et al. [27] showed the way to explore the software defect prediction area. The papers that passed all of their rigorous quality measures were included in our scope of research as well. Another systematic review, performed by Ali et al. [28], showed the way to explore the SBSE area. This study has put together the search terms from [27] and [28] and even expanded them. Both systematic reviews reported that most relevant papers are to be found in the IEEE Xplore and ACM Digital Library databases.
Thus, the following search term was looked for in the abstracts in IEEE Xplore and the ACM Digital Library: ("fault prediction" OR "fault forecast" OR "defect prediction" OR "defect forecast" OR "bug prediction" OR "bug forecast" OR "failure prediction" OR "failure forecast" OR "error prediction" OR "error forecast") AND ("metaheuristic" OR "meta-heuristic" OR "search based" OR "search-based" OR "genetic algorithm" OR "evolutionary" OR "hill climbing" OR "simulated annealing" OR "ant colony"). A surprisingly low number of only 17 papers was found with this search, proving this area is yet to be explored. Although only a small number of papers was found with the conducted search, some encouraging research topics were identified. Podgorelec [29] wanted to improve the performance of defect prediction models by constructing an outlier filtering method. His goal was to achieve more reliable results by training the classifier with filtered data. The outlier filtering method was based on evolutionarily induced decision trees. The idea for locating outliers was to look for data cases on which otherwise accurate classifiers produce opposite decisions. The results indicated that all the decision tree based classifiers (AREX, ID3, C4.5) improved accuracy on the training and testing sets, the Naive Bayes classifier improved only on the training set, and the IB and logistic regression classifiers showed some improvement on the testing set. In any prediction task, the use of irrelevant variables can degrade the model's performance. Pendharkar [30] proposed two hybrid models, exhaustive search with probabilistic neural networks (ES-PNN) and simulated annealing with probabilistic neural networks (SA-PNN), for selecting the variables that give the best prediction accuracy. Using two real-world software engineering data sets, the results showed that the hybrid algorithms outperform standard machine learning methods. Considering its factorial complexity, exhaustive search is not practical for large numbers of features.
One of the examined data sets has 8 code attributes. The difference between the exhaustive search and the simulated annealing approach in selecting the optimal set of predicting features for that data set was that exhaustive search found all eleven optimal combinations of features, while simulated annealing found only one. The other examined data set has 22 code attributes. In order to perform an exhaustive search on that data set in the given computational environment, a period of about 120 years would be required. This study presents a very pragmatic example of the value of SBSE. Interesting research was done by Hochman et al. [31]. In the prediction model building phase they used evolutionarily enhanced artificial neural networks (ENN). The evolutionary algorithm was used to obtain the optimal configuration of the artificial neural network for the task of defect prediction. They compared the ENN algorithm with discriminant analysis and found ENN to be robust and of superior performance, suggesting the algorithm should be considered in other software engineering areas as well. A similar idea is found in the study performed by Benaddy et al. [32], where a genetic algorithm was used to enhance neural network and regression model learning. Their results also proved that applying a genetic algorithm leads to better results than classical learning methods.
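Pendharkar's SA-PNN is not reproduced here, but the general idea of simulated annealing over feature subsets can be sketched as follows. The fitness function is a made-up stand-in for a model's cross-validated accuracy, and all parameter values are arbitrary illustrative choices.

```python
import math
import random

def sa_feature_selection(n_features, fitness, iters=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over feature subsets encoded as boolean vectors.
    One feature is flipped in or out per step; improvements are always kept,
    while worse subsets are accepted with probability exp(delta / T), which
    shrinks as the temperature T cools."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_fit = fitness(current)
    best, best_fit = current[:], current_fit
    t = t0
    for _ in range(iters):
        candidate = current[:]
        i = rng.randrange(n_features)
        candidate[i] = not candidate[i]  # modify: flip one feature in or out
        candidate_fit = fitness(candidate)
        delta = candidate_fit - current_fit
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_fit = candidate, candidate_fit
        if current_fit > best_fit:       # remember the best subset visited
            best, best_fit = current[:], current_fit
        t *= cooling
    return best, best_fit

# Stand-in fitness: pretend only the first three attributes are predictive
# and every extra attribute slightly hurts the model.
def toy_fitness(subset):
    return sum(subset[:3]) - 0.1 * sum(subset[3:])

best_subset, best_fitness = sa_feature_selection(8, toy_fitness)
```

Unlike exhaustive search, this loop evaluates only a few thousand of the 2^n possible subsets, which is why it remains practical for the 22-attribute case where exhaustive search does not.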
VI. CONCLUSION

Software defect prediction is one of the software engineering areas that rarely benefits from search based software engineering. The potential of using SBSE is vast and still needs to be explored more thoroughly. Hybrid algorithms emerging within SBSE provide promising results in other software engineering areas. SBSE algorithms have found application even in the data preprocessing stage of software defect prediction, and there are many algorithms within the prediction model building stage that could be used in combination with SBSE algorithms. To sum up, both SBSE and software defect prediction are relatively novel software engineering areas, and the usage of SBSE in software defect prediction is even less explored. That is why they offer great opportunities for further research.

REFERENCES

[1] Mark Harman and Afshin Mansouri. Search based software engineering: Introduction to the special issue of the IEEE Transactions on Software Engineering. IEEE Transactions on Software Engineering, 36(6):737-741, 2010.
[2] Sean Luke. Essentials of Metaheuristics. Lulu, 2009. Available for free at http://cs.gmu.edu/~sean/book/metaheuristics/.
[3] Juan J. Durillo, Yuanyuan Zhang, Enrique Alba, Mark Harman, and Antonio J. Nebro. A study of the bi-objective next release problem. Empirical Software Engineering, 16(1):29-60, February 2011.
[4] Yuanyuan Zhang, Mark Harman, Anthony Finkelstein, and S. Afshin Mansouri. Comparing the performance of metaheuristics for the analysis of multi-stakeholder tradeoffs in requirements optimisation. Information and Software Technology, 53(7):761-773, July 2011.
[5] Anthony Finkelstein, Mark Harman, S. Afshin Mansouri, Jian Ren, and Yuanyuan Zhang. A search based approach to fairness analysis in requirement assignments to aid negotiation, mediation and decision making. Requirements Engineering, 14:231-245, October 2009.
[6] Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. Automatically finding patches using genetic programming.
In Proceedings of the 31st International Conference on Software Engineering, ICSE '09, pages 364–374, Washington, DC, USA, 2009. IEEE Computer Society. [7] Ethan Fast, Claire Le Goues, Stephanie Forrest, and Westley Weimer. Designing better fitness functions for automated program repair. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, GECCO '10, pages 965–972, New York, NY, USA, 2010. ACM. [8] Stephanie Forrest, ThanhVu Nguyen, Westley Weimer, and Claire Le Goues. A genetic programming approach to automated software repair. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO '09, pages 947–954, New York, NY, USA, 2009. ACM. [9] Eric Schulte, Stephanie Forrest, and Westley Weimer. Automated program repair through the evolution of assembly code. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, ASE '10, pages 313–316, New York, NY, USA, 2010. ACM. [10] ThanhVu Nguyen, Westley Weimer, Claire Le Goues, and Stephanie Forrest. Using execution paths to evolve software patches. In Proceedings of the IEEE International Conference on Software Testing, Verification, and Validation Workshops, ICSTW '09, pages 152–153, Washington, DC, USA, 2009. IEEE Computer Society. [11] Westley Weimer, Stephanie Forrest, Claire Le Goues, and ThanhVu Nguyen. Automatic program repair with evolutionary computation. Commun. ACM, 53:109–116, May 2010. [12] Cu D. Nguyen, Anna Perini, Paolo Tonella, Simon Miles, Mark Harman, and Michael Luck. Evolutionary testing of autonomous software agents. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, AAMAS '09, pages 521–528, Richland, SC, 2009. International Foundation for Autonomous Agents and Multiagent Systems. [13] Mark Harman and Phil McMinn. A theoretical and empirical study of search-based testing: Local, global, and hybrid search. IEEE Trans. Softw. Eng., 36:226–247, March 2010.
[14] Phil McMinn, Mark Harman, Kiran Lakhotia, Youssef Hassoun, and Joachim Wegener. Input domain reduction through irrelevant variable removal and its effect on local, global and hybrid search-based structural test data generation. IEEE Transactions on Software Engineering, 2011. [15] Shin Yoo and Mark Harman. Test data regeneration: Generating new test data from existing test data. Journal of Software Testing, Verification and Reliability, to appear. [16] Shin Yoo and Mark Harman. Using hybrid algorithm for pareto efficient multi-objective test suite minimisation. J. Syst. Softw., 83:689–701, April 2010. [17] Yue Jia and Mark Harman. Higher order mutation testing. Inf. Softw. Technol., 51:1379–1393, October 2009. [18] William B. Langdon, Mark Harman, and Yue Jia. Efficient multi-objective higher order mutation testing with genetic programming. J. Syst. Softw., 83:2416–2430, December 2010. [19] Kata Praditwong, Mark Harman, and Xin Yao. Software module clustering as a multi-objective search problem. IEEE Trans. Softw. Eng., 37:264–282, March 2011. [20] Massimiliano Di Penta, Mark Harman, and Giuliano Antoniol. The use of search-based optimization techniques to schedule and staff software projects: an approach and an empirical study. Softw. Pract. Exper., 41:495–519, April 2011. [21] Tim Menzies, Zach Milton, Burak Turhan, Bojan Cukic, Yue Jiang, and Ayşe Bener. Defect prediction from static code features: current results, limitations, new approaches. Automated Software Engg., 17:375–407, December 2010. [22] Lionel C. Briand, John W. Daly, Victor Porter, and Jürgen Wüst. Predicting fault-prone classes with design measures in object-oriented systems. In Proceedings of the Ninth International Symposium on Software Reliability Engineering, page 334, Washington, DC, USA, 1998. IEEE Computer Society. [23] Yunfeng Luo, Kerong Ben, and Lei Mi. Software metrics reduction for fault-proneness prediction of software modules.
In Proceedings of the 2010 IFIP International Conference on Network and Parallel Computing, NPC '10, pages 432–441, Berlin, Heidelberg, 2010. Springer-Verlag. [24] Lionel C. Briand, John W. Daly, Victor Porter, and Jürgen Wüst. A comprehensive empirical validation of product measures for object-oriented systems, 1998. [25] Ian H. Witten, Eibe Frank, and Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Amsterdam, 3rd edition, 2011. [26] Stefan Lessmann, Bart Baesens, Christophe Mues, and Swantje Pietsch. Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans. Softw. Eng., 34:485–496, July 2008. [27] S. Beecham, T. Hall, D. Bowes, D. Gray, S. Counsell, and S. Black. A systematic review of fault prediction approaches used in software engineering. Technical Report Lero-TR-2010-04, Lero, 2010. [28] Shaukat Ali, Lionel C. Briand, Hadi Hemmati, and Rajwinder Kaur Panesar-Walawege. A systematic review of the application and empirical investigation of search-based test case generation. IEEE Trans. Softw. Eng., 36:742–762, November 2010. [29] Vili Podgorelec. Improved mining of software complexity data on evolutionary filtered training sets. WSEAS Trans. Info. Sci. and App., 6:1751–1760, November 2009. [30] Parag C. Pendharkar. Exhaustive and heuristic search approaches for learning a software defect prediction model. Eng. Appl. Artif. Intell., 23:34–40, February 2010. [31] R. Hochman, J. P. Hudepohl, E. B. Allen, and T. M. Khoshgoftaar. Evolutionary neural networks: A robust approach to software reliability problems. In Proceedings of the Eighth International Symposium on Software Reliability Engineering, ISSRE '97, page 13, Washington, DC, USA, 1997. IEEE Computer Society. [32] Mohamed Benaddy, Sultan Aljahdali, and Mohamed Wakrim. Evolutionary prediction for cumulative failure modeling: A comparative study.
In Proceedings of the 2011 Eighth International Conference on Information Technology: New Generations, ITNG '11, pages 41–47, Washington, DC, USA, 2011. IEEE Computer Society.