Evaluation of the Search-Based Optimization Techniques to Schedule and Staff Software Projects: a Systematic Literature Review

Daniela C. C. Peixoto (a,*), Geraldo Robson Mateus (a), Rodolfo F. Resende (a)

(a) Federal University of Minas Gerais, Av. Antônio Carlos, 6627, Pampulha, Belo Horizonte, Minas Gerais, Brazil

(*) Corresponding author. Email addresses: cascini@dcc.ufmg.br (Daniela C. C. Peixoto), mateus@dcc.ufmg.br (Geraldo Robson Mateus), rodolfo@dcc.ufmg.br (Rodolfo F. Resende)

Preprint submitted to Universidade Federal de Minas Gerais, May 10, 2014

Abstract

CONTEXT: Researchers and practitioners in Project Management have been very successful in supporting project managers' needs. Despite the successes in developing a number of tools and guides, a considerable number of concepts and practices in software project management are not based on principled reasoning. Search-Based Software Engineering is a recent field of research that applies optimization techniques to Software Engineering problems, including those in Software Project Management and, in particular, software project scheduling and staffing. The goal of these optimization techniques is to support management decision making with solid reasoning.

OBJECTIVE: This paper aims to characterize the concepts used to model the software project scheduling and staffing problem and to evaluate the strategies used to assess the results.

METHOD: We performed a systematic literature review that focuses on reports describing optimization techniques applied to schedule and to staff software projects. The review includes 52 papers published between 1994 and 2013. These papers were classified according to (i) the concepts used to model the problem, (ii) the employed optimization techniques, and (iii) the applied evaluation strategies.

RESULTS: A restricted number of Software Engineering concepts are employed in these studies, and there is almost no practical application, i.e., in industry. A set of search techniques was used to address this problem. GA-based approaches are the most used in the studies (29%, 17 studies). Heuristic techniques, such as Greedy algorithms, are applied by 8 studies (14%). Exact methods, such as Branch and Bound and Integer Linear Programming, are applied by 3 studies (5%) and Hybrid approaches by 9 studies (16%). The evaluation is challenged by the lack of analysis of potential confounding factors.

CONCLUSION: We observed that research evaluating which concepts should be considered in the problem model is scarce. Our interpretation of this fact is that it is difficult to identify and analyze this information, mainly considering that project managers mostly use their intuition and past experience to plan the project. Furthermore, at the beginning of this work we expected that most of the studies would have some empirical evaluation. Instead, we found that most of the studies employ some form of comparison with other approaches, and most of the comparisons present some inaccuracies.

Keywords: Literature Review, Resource allocation, Scheduling, Project Management, Search-based

1. Introduction

Software project management is not a simple activity. A complex combination of technical and personal skills is required in the context of a software development project, such as different ways of motivating people and generating knowledge [12]. In the same way, a software project involves people-intensive effort, usually organized in large teams, working together in a dynamic
environment with unstable project parameters [20]. This combination of skills and the diversity of variables make software project management a demanding task. With all these challenges, the software project manager assumes the responsibility of scheduling, planning, monitoring and controlling the software project [47]. The project manager's primary goals consist of satisfying the established constraints and objectives and the stakeholders' needs.

Experienced project managers understand the difficulties in managing a project [20], and they typically adopt tools to support them, such as MS Project [27], Gnome Planner [68] and OpenProj [63]. In addition to these tools, there are also some guides, for example, COCOMO-II [11] and PMBOK [47]. The high complexity of some software development processes and, in particular, of project management justifies research into computer-aided tools that properly support decision making [3].

In this context, a new research area, named Search Based Software Engineering (SBSE), has emerged from the application of search-based optimization techniques to complex Software Engineering problems [43]. The search algorithms can be classified into three main groups. Exact optimization methods guarantee finding an optimal solution [70], for instance, the Branch and Bound algorithm or Integer Linear Programming. Heuristic methods do not guarantee finding an optimal solution [70]. These algorithms search for a good or near-optimal solution, for example, Greedy algorithms. Metaheuristics are heuristic methods that continue the search beyond the first encountered local optimum, for example, Genetic Algorithms (GA), Particle Swarm Optimization, and Ant Colony Optimization. Metaheuristics are the preferred approach in the SBSE field, mainly because it is not possible to characterize Software Engineering problems with linear equations [41, 25]. There are also hybrid approaches that combine techniques from these three main groups.
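The difference between an exact method and a heuristic can be illustrated with a minimal sketch. The toy problem below (assign tasks with known efforts to identical developers so that the largest workload is as small as possible) is an assumption made only for illustration; it is not a problem model taken from the reviewed studies, and the function names are hypothetical. The exact method enumerates every assignment, while the greedy heuristic takes one pass and may stop at a worse solution.

```python
from itertools import product


def makespan(assignment, efforts, n_devs):
    """Largest total effort given to any developer (illustrative objective)."""
    loads = [0] * n_devs
    for task, dev in enumerate(assignment):
        loads[dev] += efforts[task]
    return max(loads)


def exact_schedule(efforts, n_devs):
    """Exact method: enumerate all n_devs ** len(efforts) assignments."""
    candidates = product(range(n_devs), repeat=len(efforts))
    return min(candidates, key=lambda a: makespan(a, efforts, n_devs))


def greedy_schedule(efforts, n_devs):
    """Greedy heuristic: give each task, in input order, to the least loaded developer."""
    loads = [0] * n_devs
    assignment = []
    for effort in efforts:
        dev = loads.index(min(loads))  # ties go to the lowest index
        assignment.append(dev)
        loads[dev] += effort
    return tuple(assignment)


efforts = [3, 3, 2, 2, 2]  # hypothetical task efforts
best = exact_schedule(efforts, 2)
quick = greedy_schedule(efforts, 2)
print(makespan(best, efforts, 2), makespan(quick, efforts, 2))  # prints "6 7"
```

On this instance the exhaustive search finds the optimal split (3+3 and 2+2+2), whereas the greedy pass ends with workloads of 7 and 5. The enumeration grows exponentially with the number of tasks, which is one reason why heuristics and metaheuristics are attractive for larger instances.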
SBSE approaches contribute to reducing effort and cost by obtaining solutions for complex and labor-intensive software development activities [25]. SBSE has been applied to a diversity of Software Engineering problems [43] and, in this study, our focus is on project planning problems. In particular, we conducted a literature review in order to analyze the studies related to scheduling and staffing problems in software projects. We refer to these software project problems in the SBSE field as SBPM (Search-Based Project Management) problems. These problems address the definition of developer-to-activity allocation [9] and the decision of when each activity should be carried out [3]. Although there exists some tool support, project managers still rely on their knowledge, past experience and intuition when planning a project. These subjective and biased techniques do not always produce the best results [76].

The first contribution of this work is to illustrate the evolution in terms of algorithms and the deficiency in terms of the modeled concepts used to address the SBPM problems. In the SBPM domain, there is an increasing interest in new optimization algorithms and techniques [3, 24, 22] that can solve the problems with even better performance and present better solutions. On the other hand, we did not observe a substantial investment in producing models that more closely resemble those in software development organizations. This is a more Software Engineering oriented analysis of a field that brings together knowledge from two disciplines. Despite the fact that there is inherently no silver bullet [15], the lack of appraisal by software project managers and the use of a limited number of concepts in the problem model (footnote 1) reduce the likelihood that a study will be considered useful. This may restrict the effectiveness of the contribution of these approaches for software practitioners.

Footnote 1: In this work, the term problem model refers to the mathematical formulation of the SBPM problem.
In addition, we were able to identify some shortcomings in many SBSE reports. Therefore, we decided to compile some suggestions on how SBSE results should be reported. Our second contribution is the compilation of guidelines for SBSE reporting based on weaknesses observed in the selected studies.

Our last contribution consists of a set of numerical observations that can be succinctly summarized as follows: Among the search techniques, GA-based approaches are the most used in the studies (29%, 17 studies), followed by Hybrid (16%, 9 studies) and Heuristics (14%, 8 studies). Most instances used in the validation step are modest in size. Comparative analysis is the most common evaluation strategy (56%, 29 studies), followed by sensitivity analysis (37%, 19 studies). Remarkably, only one study employed a quasi-experiment approach [9] and two studies employed informal experiments [50, 14].

This paper is organized as follows. In Section 2, we set forth our systematic literature review. In Section 3, we describe the main observations mentioned above, while in Section 4 we discuss the research questions and the SBSE reporting guidelines. In Section 5 we discuss the limitations of this study. The paper is concluded in Section 6.

2. Research Method

A Systematic Literature Review (SLR) consists of three main phases: planning, conducting and reporting the review [51]. In this section we summarize the review protocol that was produced during the planning phase. This protocol was constructed based on established methods for carrying out systematic literature reviews [51, 52]. The review protocol consists of the description of the following stages: research questions specification, case study roles definition, electronic databases selection, search string definition, study selection, quality assessment, and data extraction and synthesis.

2.1. Research Questions

The main goal of this research is to investigate the studies related to project planning in the SBSE domain. We refined this main goal into three research questions, described as follows.

RQ1: How is the research designed and reported? This question is important considering our attempt to collect information about scheduling and staffing problems in the SBSE field. One relevant step for the analysis of the studies is to first identify the concepts used in the problem modeling. This is important to establish the connection among the Software Project Management practices, the optimization algorithms and the validations carried out to assess the results. This research question is refined into the following subquestions:

RQ1.1: What are the concepts used for the problem modeling?
RQ1.2: What are the different search techniques used for the problem?
RQ1.3: Are the studies designed and reported in a way that can be easily assessed and replicated?
RQ2: How can the results be applied in a software development project? This question is essential to categorize the benefits of practical application and to bind problems and solutions in a contextualized setting.

RQ2.1: Are the SBSE algorithms contributing to practice?
RQ2.2: Is there any evidence of the benefits of this approach?

RQ3: What types of strategies are used to validate the results? This question underpins the analysis of the empirical evidence provided by the studies.

2.2. Case Study Roles

We assigned specific roles to the team members in order to perform the systematic review, as follows:

SLR supervisor: The supervisor is responsible for reviewing the protocol and the final results, ensuring that the research team collects the required information.
SLR Team Leader (TL): The TL is responsible for constructing the SLR Protocol document.
SLR Research Team (RT) member: An RT member is responsible for executing the SLR process (identification of the primary studies and data extraction) and documenting the results.

2.3. Data Sources

The primary studies of this SLR were obtained by searching electronic databases that met the following criteria: (i) the databases contain peer-reviewed Software Engineering journal papers and conference proceedings; (ii) the databases have a search engine with an advanced search mechanism that allows keyword searches; (iii) the databases provide access to full-text documents; and (iv) the databases were used in other Software Engineering systematic reviews. The resulting list of databases that we searched was:

ACM Digital Library (http://portal.acm.org/)
IEEEXplore (http://www.ieeexplore.ieee.org/)
Wiley InterScience (http://www3.interscience.wiley.com/cgi-bin/home)
Elsevier ScienceDirect (http://www.sciencedirect.com/)
SpringerLink (http://www.springerlink.com/)

The search results were manually organized with a tool called Mendeley (footnote 2). This tool provides a combination of a desktop application and a website that allowed the sharing of information among the research team members. It also supports the automatic extraction of document details (e.g. authors, title, and journal name) from the searched databases into the tool database, which saved a lot of manual typing. During the stages of this SLR, each team member had access to the work done by their colleagues, which facilitated verification.

Footnote 2: http://www.mendeley.com/
2.4. Search Criteria

Search keywords are very important for the quality of the retrieved results, so they must be chosen carefully. In our study, these keywords were based on a technique called PICO (Population, Intervention, Comparison and Outcomes) [52]. The Search String (SS), derived from the research questions and including the keywords identified from the PICO criteria, is:

((software AND project) AND (planning OR management OR scheduling OR schedule OR staffing OR staff OR resource OR allocation)) AND ("search based" OR search-based OR optimization OR metaheuristic OR evolutionary OR meta-heuristic OR heuristic OR "exact algorithm")

In order to validate the SS we conducted an investigative search. Because we expected to retrieve some articles that were not related to our target (false positives), our main concern was false negatives, i.e. articles related to our target but not retrieved with the search string. In the first step, we carried out a manual search in two journals, Computers & Operations Research and Information and Software Technology, from 2010 to 2012, and we compared these results with the automatic search results. Since we found the same valid studies, we considered the SS an appropriate search string for our research.

Different databases have different limitations [82]; for instance, in the SpringerLink database it is not possible to search only in the abstract. Therefore, we combined the keywords into the search strings and defined the restrictions for each database as shown in Table 1.
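The boolean structure of the SS can be sketched as a small predicate over a title or abstract. This is only an illustration: the whole-word, case-insensitive matching and the helper names are assumptions, and each digital library applies its own matching rules (for example, stemming or field restrictions), so this sketch does not reproduce any database's search engine.

```python
import re

# Keyword groups taken from the search string (SS).
TOPIC_TERMS = ["planning", "management", "scheduling", "schedule",
               "staffing", "staff", "resource", "allocation"]
TECHNIQUE_TERMS = ["search based", "search-based", "optimization",
                   "metaheuristic", "evolutionary", "meta-heuristic",
                   "heuristic", "exact algorithm"]


def contains(text, term):
    """Case-insensitive whole-word (or whole-phrase) match; an assumed rule."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, re.IGNORECASE) is not None


def matches_search_string(text):
    """Evaluate (software AND project) AND (topic terms) AND (technique terms)."""
    return (contains(text, "software")
            and contains(text, "project")
            and any(contains(text, term) for term in TOPIC_TERMS)
            and any(contains(text, term) for term in TECHNIQUE_TERMS))


# Hypothetical titles, used only to exercise the predicate.
print(matches_search_string(
    "Evolutionary optimization of staffing in a software project"))
print(matches_search_string("Scheduling flights with heuristic search"))
```

Under this strict matching, a hypothetical title that names only a specific technique (for example, "A genetic algorithm for software project scheduling") would not satisfy the third group, which illustrates why searching the abstract and keywords in addition to the title matters for avoiding false negatives.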
Table 1: Advanced Search Strings
Library | Advanced Search
IEEEXplore | String: SS in Document title, Index terms, and Abstract. Publisher: IEEE. Content Types: Conferences Publications and Journals & Magazines
ACM Digital Library | String: SS in Title, Abstract, and Keywords
Wiley InterScience | String: SS in Article Titles, Keywords, and Abstract
Elsevier ScienceDirect | String: title-abstr-key(ss). Subject: Computer science. Refine your search: Journals
SpringerLink | String: SS. Discipline: Computer Science. Language: English. Results: We evaluated the first 1,000 studies retrieved by this search.

2.5. Study Selection

By following the steps prescribed by Kitchenham and Charters [52], we established a multistage process consisting of four steps with different review processes, as described below.

First step: The goal of this step is to remove duplicate and irrelevant papers. It was carried out by evaluating only the titles of the papers. For each database, one research team member was responsible for separating the papers (included and excluded ones) based on the title selection criteria.
Second step: The goal of this step is to eliminate papers whose abstract has no relationship with any of the research questions. The papers were assigned to one RT member.

Third step: The goal of this step is to eliminate papers by scanning through the full text in order to check whether the inclusion or exclusion criteria were met. Doubts were discussed with the TL or with the supervisors.

Fourth step: The goal of this step is to exclude the remaining duplicated works and to analyze the papers thoroughly in order to extract data from the ones that met the inclusion criteria. In this stage a full-text analysis was performed on 52 papers and the quality of the studies was further assessed. Doubts were discussed with the TL or with the supervisors.

The inclusion and exclusion criteria shown in Table 2 were used to narrow the search to relevant papers. Papers that address the resource allocation and task scheduling problem in the SBSE domain were included. Figure 1 depicts the SLR steps and the number of papers identified at each step.

Table 2: Inclusion and exclusion criteria
Criterion | Description
Inclusion | Papers that describe an approach in which search-based optimization algorithms are used to identify optimal or near-optimal solutions for problems related to resource allocation or task scheduling in the Software Engineering field.
Exclusion | Publications/reports for which only an abstract or a PowerPoint slideshow is available.
Exclusion | Short papers, editorials, posters, position papers, introductions of keynotes, workshops, minitracks, special issues, or tutorials.
Exclusion | Studies that are based only on expert opinion, i.e. merely lessons-learned reports based on expert opinion (not research papers).
Exclusion | Studies presented in languages other than English.
Exclusion | Studies not related to any of the research questions.
Exclusion | Studies whose findings are unclear or ambiguous (i.e. results are not supported by any evidence).
Exclusion | Studies external to the Software Engineering field and the Software Project Management field.
Exclusion | Duplicated studies of the same work. When there is more than one study related to the same work, we included the most complete version and excluded all the other ones.
Exclusion | Studies containing unsupported claims or frequently referring to existing work without providing citations.
Exclusion | Studies that present tool evaluation or methodology experimentation without an SBSE focus.
Exclusion | Studies that describe an optimization algorithm without enough description of the problem model.

Figure 1: SLR steps

2.6. Quality Assessment

During a systematic review, it is crucial to assess the quality of the studies in order to minimize bias and maximize internal and external validity [52]. We created our quality assessment form based on checklists and guidelines from three other works [34, 44, 71]. Table 3 presents a summary of the quality assessment criteria applied in this SLR. These criteria, similarly to the ones in Dybå and Dingsøyr [32], cover four aspects of quality that are critical when examining research papers.

Reporting: It is related to the quality of reporting the goals, context, and purpose of the research. (Questions 1-3)
Rigor: It is related to the appropriateness of the approach applied to study the SBSE resource and staff problems. (Questions 4-8)
Credibility: It is related to the assessment of the confidence in the study's methodology for ensuring that the results are valid and meaningful. (Questions 9-12)
Relevance: It is related to the use of the findings by the software industry and the research community. (Question 13)

Table 3: Quality assessment criteria
Number | Question | Issue
1 | Is there a clear statement of the aims of the research? | Reporting
2 | Is there an adequate description of the context in which the research was carried out? | Reporting
3 | Do the conclusions relate to the aim and purpose of the research? | Reporting
4 | Is the study research design appropriate to address the aims of the research? | Rigor
5 | Are threats to validity addressed in a systematic way? | Rigor
6 | Is there a control group with which to compare treatments? | Rigor
7 | Are the study cases appropriate for the study? | Rigor
8 | Are the metrics used in this study well defined? | Rigor
9 | Are the metrics relevant to address the objective of the study? | Credibility
10 | Are the data collection procedures suitable for the research purpose (data sources, collection, storage)? | Credibility
11 | Are the analysis procedures sufficient for the purpose? | Credibility
12 | Are the findings (positive and negative) presented? | Credibility
13 | Are the results useful for other organizations or researchers? | Relevance
These 13 criteria provide a measurement that guides the interpretation of the studies' findings and determines the value of their contribution to this SLR. The scoring procedure uses a three-point scale: Yes (1), Partially (0.5) or No (0). We defined this scale because a simple Yes/No answer may sometimes be misleading. It is not a good practice to include study quality and reporting quality scores in a single metric [52]. We adopted a weight of 1 for the reporting questions (1-3) and 1.5 for the other questions (4-13), which gives a maximum score of 18 points. Again, all disagreements were resolved by discussions that included all team members.

2.7. Data Extraction and Synthesis

The data extraction form was used to ensure the consistency and accuracy of the information. The form contains fields derived from the research questions and from publication data. We first piloted the extraction process on the ACM Digital Library database. The team members discussed, in a consensus meeting, the findings and the improvements to be incorporated into the form. In this meeting, they analyzed the extracted information, making sure that the inconsistencies and doubts were resolved. Then, one researcher reviewed and extracted data from the remaining research papers. This latter stage is consistent with the process followed by other systematic reviews [79, 81].

For each paper we collected data that were grouped into four categories:

Publication-related data: digital library, title, year, source (e.g. journal or conference proceedings).
SBSE-related data: optimization technique, type (mono or multi-objective), context.
Problem Model-related data: goal, constraints, concepts and attributes, practical application.
Research-related data: evaluation method (e.g. empirical experiment, statistics), element under evaluation, performance factor, outcomes, baseline, data used to validate, benefits.

Three categories (all except publication-related data) included information related to answering the research questions (see Table 4). The data categories along with a description are presented in Table 5 and the list of studies in Table A.13.

We synthesized the data by identifying recurrent themes emanating from the case studies reported in each paper. These identified themes gave us the conceptual model described in the next section. The coding of this conceptual model clusters related case study data into a smaller number of sets, allowing a more integrated schema for analyzing the results.

Table 4: Data collected and research questions
Research Question | Data Collected
RQ1.1 (RQ1) | goal, constraints, concepts and attributes, practical application.
RQ1.2 (RQ1) | optimization technique, type, context.
RQ1.3 (RQ1) | evaluation method, element under evaluation, performance factor, outcomes, baseline, data used to validate.
RQ2.1 (RQ2) | goal, constraints, concepts and attributes, practical application.
RQ2.2 (RQ2) | benefits, outcomes.
Note | Any other important information for this research.
Table 5: Data extraction categories
Category | Attribute | Description
SBSE-related data | Optimization Technique | The search-based algorithms or approaches applied to staffing and scheduling problems.
SBSE-related data | Type | How many objectives are addressed by the search technique: single-objective if only one objective is considered, multi-objective if multiple objectives are considered.
SBSE-related data | Context | The description of the software project or development process.
Problem Model-related data | Goal | The metrics used in the objective functions.
Problem Model-related data | Constraints | The constraints that need to be satisfied by the optimization algorithm.
Problem Model-related data | Concepts and attributes | The concepts and their respective attributes used as decision variables and input data.
Problem Model-related data | Practical Application | "yes" if there is any application of the approach in a software development organization and "no" otherwise.
Research-related data | Evaluation method | The approach used to evaluate the optimization technique.
Research-related data | Element under Evaluation | Subjects or algorithms used during the evaluation.
Research-related data | Performance Factor | The metrics considered during the approach evaluation.
Research-related data | Outcomes | The results achieved by the approach evaluation.
Research-related data | Baseline | Defined values or results used for comparison with the results obtained by the optimization technique.
Research-related data | Data used to Validate | Data used as input for the optimization technique evaluation.
Research-related data | Benefits | The benefits achieved with the optimization technique.
Note | Note | Any other important information for this research.

3. Literature Analysis

3.1. Methodological Quality

As mentioned in Section 2.6, we evaluated each of the primary studies according to 13 quality criteria (Table 3). The quality criteria were employed in our study to investigate systematic differences between studies. The papers ranged from very well organized studies in the SBPM field to very concise ones where essential information for our analysis was missing. We were not interested in stating that one SBPM study is better or worse than another.
Instead, we focused on a set of issues that contribute to the quality of our research. Almost all studies have some description of the aims of the research, with the optimization goal, the constraints, decision variables and input data of the optimization problem. However, in 19 studies (37%), the concepts represented in the problem model were not described properly. For 22 studies (42%), the chosen data collection and data analysis procedures were not explained. As many as 38 of the 52 primary studies (73%) did not address the threats to validity. Only 20 studies (38%) provide some form of description of the context in which the research was conducted (e.g. development process or development project). Only 28 studies (54%) provide a control group for comparing results. Therefore, we observed that validity was not adequately addressed, and that the contextual description, data collection, and data analysis were often not well explained. Only four studies got a full score on the quality assessment (18 points) [28, 29, 49, 69]. The lowest score was 2.25 (see Table 6).
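The scoring scheme of Section 2.6 can be written down as a short calculation. The sketch below assumes only what the protocol states (answers scored 1, 0.5 or 0; weight 1 for questions 1-3 and 1.5 for questions 4-13); the function and variable names, and the example answers, are hypothetical.

```python
# Answer scale from Section 2.6.
ANSWER_VALUE = {"yes": 1.0, "partially": 0.5, "no": 0.0}


def question_weight(question):
    """Weight 1 for the reporting questions (1-3), 1.5 for questions 4-13."""
    return 1.0 if question <= 3 else 1.5


def quality_score(answers):
    """Weighted quality score; answers maps question number (1..13) to an answer."""
    if set(answers) != set(range(1, 14)):
        raise ValueError("answers must cover questions 1 to 13 exactly")
    return sum(question_weight(q) * ANSWER_VALUE[a] for q, a in answers.items())


# A study answering "yes" to every question reaches the maximum:
# 3 * 1 + 10 * 1.5 = 18 points.
all_yes = {q: "yes" for q in range(1, 14)}
print(quality_score(all_yes))  # prints "18.0"

# Hypothetical study: full marks on reporting, "partially" everywhere else.
mixed = {q: ("yes" if q <= 3 else "partially") for q in range(1, 14)}
print(quality_score(mixed))  # prints "10.5"
```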
Table 6: Quality Scores
Score | Study ID | Frequency
< 5 | S33, S41, S48 | 3
≥ 5 and < 10 | S5, S10, S18, S22, S24, S29, S32, S36, S40, S42, S44, S47, S49, S50 | 14
≥ 10 and < 15 | S1, S3, S4, S9, S11, S12, S13, S14, S15, S16, S17, S19, S20, S23, S25, S26, S28, S30, S31, S34, S35, S37, S39, S43, S45, S51, S52 | 27
≥ 15 and < 18 | S2, S8, S27, S46 | 4
18 | S6, S7, S21, S38 | 4

3.2. Studies Analysis

In this section, we discuss the primary studies selected for an in-depth analysis. This analysis was divided into four categories of information: publication, SBSE, problem model and research-related data.

Most of the papers were case studies reporting resource-constrained project scheduling problems. An important remark is that 94% of the papers (49 studies) do not validate their research with the main stakeholders (e.g. project managers). We believe that an important reason is that these studies focus more on algorithm performance than on practical evaluation. This is reinforced by the lack of practical application (i.e. in the real world of a software development company). In addition, the results show that metaheuristic search techniques are the most applied optimization techniques (56%, 29 studies). A variety of metaheuristic search techniques are found to be applicable to SBPM problems, including Genetic Algorithms (33%, 17 studies), Simulated Annealing (6%, 3 studies), Ant Colony methods (4%, 2 studies), and Swarm Intelligence methods (4%, 2 studies).

3.2.1. Publication-related Data

The systematic review considered a period of 20 years (1994 to 2013). In total, we found 2765 papers, as presented in Table 7.
Table 7: Distribution of studies by source
Digital Library | Number of papers | Relevant studies | Selected studies
IEEEXplore | 636 (23%) | 20 (35%) | 17 (33%)
ACM | 79 (3%) | 4 (7%) | 4 (8%)
Wiley InterScience | 600 (22%) | 6 (11%) | 6 (11%)
Elsevier | 450 (16%) | 10 (18%) | 10 (19%)
SpringerLink | 1000 (36%) | 17 (30%) | 15 (29%)
Total | 2765 | 57 | 52

After applying the exclusion criteria to the discovered papers, there were 57 relevant studies. Finally, identical studies found in several sources were removed, resulting in 52 different selected
studies. The list of the selected studies with their respective sources and year of publication is presented in Appendix A, Table A.13.

3.2.2. SBSE-related Data

The most applied optimization techniques are the evolutionary ones (56%, 29 studies), and GA-based approaches are the most used in the studies (33%, 17 studies). Heuristic techniques, such as Greedy algorithms, are applied by 8 studies (15%). Other techniques identified included: (i) other metaheuristic techniques (21%, 11 studies), such as Particle Swarm, Ant Colony and Simulated Annealing, (ii) exact methods, including Branch and Bound and Integer Linear Programming (6%, 3 studies), and (iii) Hybrid approaches (17%, 9 studies). A large percentage of studies (88%, 46 studies) address single-objective approaches. Considering the multi-objective evolutionary techniques, NSGA-II (10%, 5 studies) is the most used MOEA (Multi-objective Evolutionary Algorithm), followed by SPEA2 (8%, 4 studies) and MOCell (6%, 3 studies). Table 8 presents the distribution of SBPM papers within each optimization technique.

Table 8: Optimization techniques
Optimization Technique | Study ID | Frequency
Genetic Algorithm | S1, S6, S9, S11, S14, S22, S24, S25, S27, S29, S38, S39, S41, S42, S43, S47, S49 | 17
Heuristic | S5, S13, S17, S18, S26, S28, S32, S50 | 8
Exact | S12, S30, S31 | 3
Other Metaheuristic | S6, S7, S9, S19, S20, S21, S34, S37, S39, S40, S43 | 11
Hybrid | S8, S10, S16, S33, S36, S44, S45, S46, S51 | 9
Multi-objective | S3, S4, S6, S15, S23, S35, S52 | 7
Other approaches | S2, S48 | 2

In the last five years, we observed a trend towards using MOEA as well as other algorithms, such as Ant Colony Optimization and Particle Swarm Optimization (see Table 9). Our search found the first two papers about MOEA in 2009. The same results were found by other researchers [42, 25].

Table 9: Optimization techniques used along the years.
Optimization Technique | 2009 | 2010 | 2011 | 2012 | 2013
Genetic Algorithm | 2 | 3 | 3 | 1 | 1
Heuristic | 2 | 0 | 0 | 0 | 0
Exact | 0 | 1 | 1 | 0 | 0
Other Metaheuristic | 2 | 2 | 3 | 1 | 1
Hybrid | 1 | 0 | 0 | 1 | 0
Multi-objective | 2 | 0 | 2 | 1 | 0

From the inspected studies, we extracted information related to the application context of the SBPM approaches. We grouped the studies according to the organizational business model, the number of projects, and the software development methodology.

Organizational business model: whether the company is involved in distributed or internal software development. Global Software Development (GSD) studies describe multi-site and multi-cultural
staffing and scheduling problems. These problems introduce a new set of communication, technical, managerial, and coordination challenges [48]. Three studies address this topic: S13, S17 and S32. The other articles describe internal software development.

Number of projects: Project portfolio management studies describe the staffing and scheduling of multiple projects. Resource contention among projects increases the risks during the planning phase [86]. Seven studies address this topic: S1, S9, S24, S25, S27, S29, and S40. The other articles analyze a single project.

Software development methodology: The method of the agilists ("Agile Methods") [5] poses new challenges considering incremental development and management. Three studies address this topic: S7, S28, and S52. Waterfall-based processes as well as other traditional methods such as RUP are also addressed in eight studies: S5, S9, S38, S40, S42, S43, S45, and S47.

In fact, a relatively large number of studies (62%, 32 studies) did not report the research context in a clear and defined way. In these cases, whenever possible, we inferred the applied context from reading the studies. The above-mentioned context groups influenced the description of the studies' problem models. For instance, in Agile studies, the problem model represents the iteration concept, and in GSD studies, the working period may include a slot attribute, corresponding to three 8-hour time slots, adding up to 24 hours.

3.2.3. Problem model-related Data

In this section we focus on the main concepts and goals represented by the SBPM problems. In addition, we categorized them according to the addressed problem context. Table 10 lists the main concepts, Figure 2 depicts the problem model and Table 11 lists the main goals we identified in this systematic literature review. The first table shows that task, resource and skill are the concepts most commonly found, as expected.
The second table shows that duration and cost are the most frequently measured goals, followed by skill, quality (measured as the number of defects), and productivity. Some aspects of the elements depicted in Figure 2 can be understood by looking at Table 10. For example, Experience is related to time in projects and Skill is related to time in training. A change in these goals produces an effect similar to the relationship among cost, time and scope in the Project Management Triangle. In this triangle, the relationship is such that if any one of the factors changes, then at least one of the others will be affected. Moreover, it is not possible to improve all factors at the same time [47].

We were not able to conduct a very strict aggregation of research results, e.g. through meta-analysis or meta-ethnography [8]. However, it is not our ultimate goal to compare specific techniques. In our case, general concepts are more relevant to analyze than detailed implementation issues. Defining the concepts of an SBPM model requires an in-depth understanding of the project management activities, the organizational context and the resources involved in the project. In order to share the knowledge of SBPM, we categorized the concepts presented by these studies, which allows a more integrated schema for analyzing the results. These groups were derived from the literature and they consist of a generalization of the usual elements identified in the problem model. It is by no means a comprehensive list. Indeed, emphasis is placed on the most recurrent aspects found during the literature analysis.
Table 10: SBPM concepts

| Concept | Description | Common attributes | Frequency |
|---|---|---|---|
| Task | A task is an element of project work decomposition. Although task and activity definitions are different, we grouped this information in order to facilitate our analysis. | Task identification, Precedence (in most studies represented as a Task Precedence Graph - TPG), Skill requested (level and the skill set), Effort (in many studies estimated using COCOMO [11] or some Fuzzy logic), Maximum headcount (maximum number of employees who can be assigned to the task), Complexity, Priority | 48 |
| Resource | Employee who performs a task. | Resource identification, Role (role in the software process), Executable task type (tasks that the resource is allowed to execute), Salary, Availability (start and end date), Productivity, Skills (level and the skill set), Experience (experience in some task or period of time spent carrying out a specific task), Dedication, Maximum daily workable hours, Learning speed, Overwork (amount allowed and the cost) | 43 |
| Skill | The employee's expertise. | Skill identification, Level (skill categories) | 14 |
| Team | Group of employees responsible for doing the task. | Size, Communication overhead (measure dependent on the team size), Specialization | 13 |
| Project | A software project which has a defined beginning and end date. | Deadline, Budget, Benefit rate (when a project is finished before its deadline), Penalty rate (when a project is late), Cost, Weight (priority for the organization) | 20 |
| Artifact | A product produced during software development. | Size, Quality (number of defects), Precedence (common for components) | 10 |
| Other concepts | Examples: Process-related concepts (Phase, Increment, Iteration, Risk, Release) and Business model concept (sites). | Phase or Increment penalty (when a project is late), Sites (number of different organizations in a GSD) | 27 |

Table 11: SBPM goals

| Goal | Description | Frequency |
|---|---|---|
| Duration | It represents the length of time that the software product takes to be available to customers. | 44 |
| Cost | It depicts the monetary resources needed to complete the software development. | 25 |
| Skill | It represents the sum of skill levels for the project. | 5 |
| Productivity | It is a measurement of the technical efficiency of the development team. | 4 |
| Quality | It represents the number of defects detected after delivery. | 4 |
| Other goals | Examples: Utility (weights the importance of keeping the rescheduled process similar to the initial process), Stability (weights the importance of responding to the disruption, e.g., changing requirements), Resource usage (number of developers assigned to each task), and Overwork (represents when the actual dedication is above the maximum dedication). | 9 |
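To make the core concepts of Table 10 (task, resource, skill) and the two most frequent goals of Table 11 (duration and cost) more concrete, the following is a minimal Python sketch of such a problem model. It is an illustration under stated assumptions, not the formulation of any reviewed study: the class names, field names and the `evaluate` function are hypothetical; the duration of a task is assumed to be its effort divided by the total dedication of the resources assigned to it; and cost is assumed to be salary times dedication times task duration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Task:
    task_id: str
    effort: float  # assumed unit: person-months
    required_skills: Set[str] = field(default_factory=set)
    predecessors: List[str] = field(default_factory=list)  # edges of the TPG


@dataclass
class Resource:
    resource_id: str
    salary: float  # assumed: cost per month at full dedication
    skills: Set[str] = field(default_factory=set)


def evaluate(
    tasks: List[Task],
    resources: List[Resource],
    dedication: Dict[Tuple[str, str], float],
) -> Tuple[float, float]:
    """Return (project duration, project cost) for a dedication matrix.

    dedication[(task_id, resource_id)] is the fraction of the resource's
    time spent on the task. The precedence graph is assumed to be acyclic.
    """
    resource_by_id = {r.resource_id: r for r in resources}
    task_by_id = {t.task_id: t for t in tasks}
    durations: Dict[str, float] = {}
    cost = 0.0

    for task in tasks:
        assigned = {
            rid: d
            for (tid, rid), d in dedication.items()
            if tid == task.task_id and d > 0
        }
        total = sum(assigned.values())
        if total == 0:
            raise ValueError(f"task {task.task_id} has no resource assigned")
        # Skill constraint: the assigned team must cover the required skills.
        covered = set().union(*(resource_by_id[rid].skills for rid in assigned))
        if not task.required_skills <= covered:
            raise ValueError(f"task {task.task_id} lacks required skills")
        durations[task.task_id] = task.effort / total
        cost += sum(
            resource_by_id[rid].salary * d * durations[task.task_id]
            for rid, d in assigned.items()
        )

    finish: Dict[str, float] = {}

    def finish_time(tid: str) -> float:
        # A task starts when all of its predecessors have finished.
        if tid not in finish:
            start = max(
                (finish_time(p) for p in task_by_id[tid].predecessors),
                default=0.0,
            )
            finish[tid] = start + durations[tid]
        return finish[tid]

    duration = max(finish_time(t.task_id) for t in tasks)
    return duration, cost
```

A search technique would then explore different `dedication` matrices and use the returned pair as the objective values. Attributes listed in Table 10 but omitted here (for example productivity, learning speed or communication overhead) would change the duration formula and are left out only to keep the sketch short.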
Figure 2: SBPM Problem model.

The first group represents the core concepts of this problem context: resources, tasks and skills. These concepts are extended as the context is specialized (see Section 3.2.2). For example, when there is outsourced work, we can have resources within the company and outside the company. External resources may have restrictions related to joining training sessions and, consequently, improving their skills. Developments that involve companies around the world (GSD) define teams working in different time zones. Agile methods include increment information, and traditional processes may include information about phases or iterations. In portfolio planning, it is useful to know the weight and priority of each project.

As shown in Figure 2, different concepts can be represented in an optimization problem model and they are affected by the problem context. In the analyzed papers, there is a clear focus on the basic concepts (task, resource and skill) and their attributes. There is no suitable investigation of the human, organizational and process factors that affect project development. For example, only 10 studies (19%) considered differences in employee productivity, even though project managers consider this attribute, at least intuitively, during planning. Five studies (10%) allow the skill level of a resource to change during the life of the project, as when engineers gain experience from previous activities. Only one study (2%) includes a communication overhead factor that grows as the team size increases. Process change and technology adoption, two aspects that constantly occur in project development, are usually omitted. This lack of investigation encourages further research in the SBPM field.

3.2.4. Research-related Data

We categorized the studies according to the method used to validate the proposed approach.
Since most studies did not explicitly mention this information, we collected it from reading the papers. The following categories were used to classify the papers: Controlled experiment if the study validates the research by measuring the effects of manipulating one variable on another variable [17] and the subjects are assigned to treatments at random [83]. When the subjects are not randomly assigned to treatments, we classified the study as a quasi-experiment [71]. When the subjects evaluate the approach without rigorous control, we classified the study as an informal experiment. In an experiment it is important to know the subjects in each treatment in order to check the quality of the analysis and the interpretation of the results. Sensitivity analysis if the study investigates the sensitivity of the results of the optimization approach to changes in the input values [41]. It is important to know the most important impacts
of the solution factors. Comparative analysis if quantitative and/or qualitative data are collected when comparing the proposed approach with another optimization approach. This analysis can employ statistical methods to compare the results. A comparative analysis can define a baseline against which the solutions can be compared. Report if there is only a description of the algorithm application.

The comparative analysis is the most common evaluation strategy (48%), followed by sensitivity analysis (31%). Remarkably, only one study employs a quasi-experiment approach [9] and two employ informal experiments [13, 14] (Figure 3). Although the comparative analysis is the most used approach, the validation of the results, in most cases, did not include suitable statistical techniques, such as hypothesis testing or correlation.

Figure 3: Distribution of SBPM validation approaches

Most of the comparative analysis studies do not mention the use of a baseline. When a baseline is defined, random search is the most common method, followed by some simpler baseline, for example one derived from a worst-case scenario. We consider that this scenario happens when the technique is configured with data that will produce its worst results (e.g. worst performance, highest cost or longest duration). During our analysis, we observed that it was not possible to make a more accurate comparison among the studies due to the diversity of contexts, the different measurement units, and the lack of information about potential confounding factors. In order to check the evaluated performance of the approaches, we aggregated the sizes of the task and resource input data reported in the studies into the following sets: small - up to 30 tasks and 5 resources; medium - between 30 and 90 tasks and between 5 and 15 resources; large - above 90 tasks and 15 resources.
A study has to meet the limits of both concepts; for example, a study with 100 tasks and 10 resources is classified in the medium set. Remarkably, only two studies mentioned a practical application of their approach (i.e. in industry). Random instances or fictitious project data were used by 35 studies (67%). Real instances were used in 17 studies (33%). When analyzing the instance sizes, specifically the numbers of tasks and resources, we concluded that most of the instances are small or medium (see Figure 4).
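The size classification described above can be sketched as a small function. This is only one possible reading of the rule, offered as an assumption: each dimension (tasks, resources) is classified separately and the overall size is the smaller of the two categories, which reproduces the example in the text (100 tasks and 10 resources counts as medium). Assigning the boundary values (exactly 30 or 90 tasks, 5 or 15 resources) to the lower category is also an assumption, and the names `classify_instance` and `SIZE_ORDER` are hypothetical.

```python
SIZE_ORDER = ["small", "medium", "large"]


def _category(value: int, small_max: int, medium_max: int) -> int:
    """Return 0 (small), 1 (medium) or 2 (large) for one dimension."""
    if value <= small_max:
        return 0
    if value <= medium_max:
        return 1
    return 2


def classify_instance(n_tasks: int, n_resources: int) -> str:
    """Classify an instance by its numbers of tasks and resources.

    Assumption: an instance only reaches a size class when BOTH
    dimensions reach it, so the overall class is the smaller of the two.
    """
    task_category = _category(n_tasks, small_max=30, medium_max=90)
    resource_category = _category(n_resources, small_max=5, medium_max=15)
    return SIZE_ORDER[min(task_category, resource_category)]


if __name__ == "__main__":
    # The example from the text: 100 tasks and 10 resources.
    print(classify_instance(100, 10))
```

Under a different reading of the rule, for instance one that takes the larger of the two categories, only the `min` call would change.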
Figure 4: Instances size

4. Discussion

4.1. Research Questions

This section discusses what the results of the SBPM studies say about our three research questions: #1 How is the research designed and reported? #2 How can the results be applied in a software development project? #3 What types of strategies are used to validate the results? We summarize and discuss important findings considering the refined research questions. By answering each subquestion individually, we will answer the main research questions.

#1.1 What are the concepts used for the problem modeling?

We extracted text containing information describing the concepts used in the problem model. We observed that the most common concepts represented in the models are resource, task and skill. In order to represent a model close to the industrial environment, more factors, such as human, organizational and process factors, need to be taken into account. One can argue that it is difficult to put software measurements of this type, for example productivity or communication overhead, into practice. One reason may be the immaturity of organizational development processes. While measuring these factors offers a major advantage to project managers, there is no widely accepted framework. However, project managers, even without a defined measurement process or approach, use some classification scheme. At the least, this informal classification scheme should be incorporated into the SBPM models. In addition, Software Engineering rules usually employed in supporting tools such as simulations can also be helpful when generating the model and the instances. For example, two rules that can be mapped into the problem model formulation are: (i) Two factors that affect productivity are work force experience level and level of project familiarity due to learning-curve effects [1]; and (ii) The training of new employees is usually done by the old-timers, which results in a reduced level of productivity on the old-timers' part.
Specifically, on the average, each new employee consumes in training overhead 20% of an experienced employee's time for the duration of the training or assimilation period [1]. An extensive list of Software Engineering rules can be found in [61, 56].

#1.2 What are the different search techniques used for the problem?

As mentioned before, the most applied optimization techniques are evolutionary (56%, 29 studies), and GA-based approaches are the most used among them (29%, 17 studies). A large percentage
of studies (88%, 46 studies) addresses single-objective approaches. Considering the multi-objective evolutionary techniques, NSGA-II (10%, 5 studies) is the most used MOEA, followed by SPEA2 (8%, 4 studies) and MOCell (6%, 3 studies). As confirmed by other research [42, 2, 25], there is a trend to use new algorithms and multi-objective approaches. The first MOEA studies were published in 2009. These techniques are more suitable for dealing with complex Software Engineering problems than other classical techniques.

#1.3 Are the studies designed and reported in a way that can be easily assessed and replicated?

All studies stated clearly the optimization goal and the concepts used in the conceptual model. However, most of the studies (62%, 32 studies) did not explicitly define the context where the approaches can be applied. In addition, 35 studies (67%) do not present the proposed solution (e.g. pseudo-code) and only 14 articles mentioned the threats to validity. Most of the validations (42 studies, 81%) were carried out without a proper statistical analysis (descriptive or hypothesis testing). Without this kind of analysis, it is difficult to infer that one approach performs better than another, as well as to interpret or replicate the results.

#2.1 Are the SBSE algorithms contributing to practice?

Although 17 studies (33%) use real data instances, we did not identify conclusive evidence of the practical application of the SBSE algorithms. Only two articles, S8 and S16, mentioned the use of the algorithms in a software development organization. Therefore, more investment is needed to bridge the gap between research and practice.

#2.2 Is there any evidence of the benefits of this approach?

The application of the optimization approaches to instances of the problem has shown significant improvements in most of the cases.
The improvements are related to the creation of the project plan according to the defined objectives, or are measured in comparison with another technique. The publication of positive results and the omission of negative ones is considered a bias. Since a number of studies did not evaluate bias or threats to validity, we have to analyze these results with care. In addition, it is difficult to conclude that the results can be generalized, mainly because of the problems pointed out in RQ 1.3.

#3 What types of strategies are used to validate the results?

As mentioned in Section 3.2, the comparative analysis is the most common evaluation strategy (56%, 29 studies), followed by sensitivity analysis (37%, 19 studies). Remarkably, only one study employs a quasi-experiment approach [9] and two employ informal experiments [50, 14] (Figure 3). Although the comparative analysis is the most used approach, the validity of the results, in most cases, was not strengthened by the application of statistical techniques, such as hypothesis testing or correlation. Therefore, we can question the accuracy of the validations. The impact of this lack of statistical analysis is increased by the unsatisfactory evaluation of threats to validity.

4.2. Guidance for Reporting SBPM Initiatives

With respect to the papers addressed by this SLR, a relatively large number of studies did not report the context, data collection and analysis methods in a clear and defined way. In addition, few papers discuss validity threats. There are many guidelines [44, 71, 74, 53, 54] for conducting
and reporting Software Engineering empirical research. However, we were not able to find general guidelines for reporting SBSE initiatives. We decided to reinforce some important aspects that are not well represented in these papers, which we believe can be useful in future research.

The context should be clear and defined. In order to reuse and draw valid conclusions from SBPM studies, it is important to define the context where the case study happened. In our analysis, a relatively large number of studies (62%, 32 studies) did not report the research context in a clear and defined way. One solution is to follow the checklist proposed by Petersen and Wohlin [67], composed of six different context facets: product, processes, practices and techniques, people, organization and market. They structured the context for empirical industrial studies, providing a checklist to aid researchers during the selection of the context information and therefore making comparisons feasible.

The validity of the study must be analyzed. Validity denotes to what extent we can trust the results, since they can be biased by the researchers' point of view [33]. The studies have serious limitations in terms of the validity and credibility of their findings. Very few addressed the threats to validity (22 studies, 42%). This problem has also been detected by another SBSE study [4].

The overall SBPM model, solution and validation description should be reported. In order to reuse or adapt the SBPM experience to different contexts, we judge it essential that researchers describe the following items: the problem model, with the concepts, attributes, constraints and objectives; the proposed solution, indicating the extent to which it represents the problem model; the instances used to test the solution; and the validation approaches. With the exception of the concepts and attributes, most of these items were not presented in the studies.

The data collection and analysis should be clearly reported.
When we evaluated the studies, a considerable number either did not present or presented insufficient information about the data collection and analysis (Table 12).

Table 12: Quality of data collection and analysis

| | Not stated | Insufficient information |
|---|---|---|
| Data collection | 22 (42%) | 16 (31%) |
| Data analysis | 22 (42%) | 21 (40%) |

Since measurement is considered an important step to evaluate and communicate the results, a better definition, collection and analysis will permit a deeper understanding of the studied phenomena. Therefore, it will be possible to make more elaborate comparisons and analyses.

5. Limitations

The use of systematic procedures itself helps to avoid problems with the selection and analysis of the studies. Even though this SLR has been supported by a pre-defined study protocol and conducted in a systematic way and under the guidance of experts, it has some limitations. We will discuss these limitations considering four aspects [51]: Construct Validity: to what extent the selected studies represent what we aim to investigate; Reliability: to what extent the data collection and analysis were conducted in a way that can be repeated by other researchers with the same results;
Internal Validity: to what extent the design and conduct of the study are likely to prevent systematic errors; and External Validity: to what extent the effects observed in the study can be generalized.

5.1. Construct Validity

Terminology and its validation. We discuss the (i) choice of terminology and (ii) verification of the selection of relevant studies [35]. Since the search for primary studies is based on the search string, each SLR is likely to miss relevant studies if this string is not properly chosen. In order to minimize this threat, we applied the following steps: (1) derive the search string using the PICO structure; and (2) validate the search string. We carried out a manual search through all studies published from 2010 to 2012 in two specific journals, i.e. Computers & Operations Research and Information and Software Technology. Then, we conducted a search aiming at verifying whether the manually selected studies were retrieved. Since we found the same valid studies, we considered the search string appropriate for our research (see Section 2.4).

Completeness. Our research is limited to the databases mentioned in Section 2.3. In other words, we cannot claim that we included all studies relevant to our research question. However, the database choice considered our knowledge of important venues. As far as we know, the chosen databases have the most relevant publications. Therefore, despite the fact that we are not able to guarantee completeness, we believe that the selected studies represent a good coverage.

Grey literature. Besides database limitations, we excluded all grey literature (e.g. technical reports, some workshop reports, and work in progress) that may present SBPM results. The main reason is that we were not able to devise a good solution for assessing the quality of grey literature and therefore we decided to leave the evaluation of these studies as future research.
We believe that excluding these studies did not impact the overall results obtained, as they normally do not present significant SBPM results. To sum up, we expect that the countermeasures taken to minimize threats to Construct Validity were enough to maximize the number and quality of relevant studies in our research.

5.2. Reliability

Some countermeasures were taken to reduce the threats to Reliability. Since we followed Kitchenham's [51] procedures (i.e. we defined a research question, the selection process, inclusion/exclusion criteria and quality criteria), we believe that the reliability threats were minimized. However, the adoption of systematic procedures in itself does not guarantee reliability. Therefore, we will discuss Reliability in terms of classification and analysis of studies (i.e. inclusion/exclusion criteria and data extraction) and review conduction.

Classification and analysis of studies. The way in which the classification and analysis of the studies are done in SLRs is a threat to reliability, mainly because they are based on the reviewers' knowledge and experience. Steps one and two of this study suffer from this threat because titles and abstracts do not always contain the most relevant information (i.e. relevant information is sometimes omitted) [16], and steps three and four are subject to the perception of the reviewers. In order to minimize this threat, the obtained results were discussed with the TL and supervisors. The goal was to clarify doubtful points and reach a consensus.

Review conduction. The reliability of this study was improved by the fact that the first author participated in all the steps and review sections, either as a reviewer or as a supporter in
the decisions on inclusion, exclusion and quality criteria. The participation of the first author and also the supervisors ensured a series of countermeasures related to possible threats. We expect that, since we reported all the procedures taken and limitations, this research has high reliability and can be replicated by other researchers.

5.3. Internal Validity

We will discuss Internal Validity in terms of the members' roles, reviewer bias, and publication bias. In our study, as mentioned in Section 2.2, we assigned specific roles to each member. Important roles include the supervisors and the TL, who were responsible for reviewing how the study had been conducted and for guaranteeing that the review process was correctly followed. This participation helps to control bias in the results. Reviewer bias is another potential threat to Internal Validity, as the extracted data has a qualitative nature. In order to mitigate this threat, the concepts and their attributes were grouped in order to facilitate the classification and comparison of the studies. Publication bias refers to the higher probability of publication of positive results than of negative ones [52]. While we cannot fully exclude the possibility of this threat, we reduced it to the extent possible by following a formal search strategy (described in the protocol) to find the entire population of publications, including those with negative results.

5.4. External Validity

The results of this SLR were considered with respect to specific studies in the SBPM domain. Therefore, the conclusions and classifications are valid only in this given context. The results of our current study were drawn from qualitative analysis and can serve as a starting point for future research. Additional studies can be analyzed accordingly.

6. Conclusion

Overall, this study provides an up-to-date view of the SBPM area, showing common characteristics and results of the SBPM studies.
This paper identified 52 studies that describe the concepts used to model the software project scheduling and staffing problems. In addition, it identified the optimization techniques and the evaluation strategies. As we stated in the Introduction, there is a trend in the SBSE area, and specifically in the SBPM area, to invest in optimization algorithms and approaches. New techniques and approaches, for example Multi-objective Evolutionary Algorithms, are being used in recent works. We did not observe a substantial investment in producing models that resemble more closely those in software development organizations. In the analyzed papers, there is a clear focus on the basic concepts (task, resource and skill) and their attributes. We did not observe a suitable investigation of the human, organizational and process factors that affect project development. As discussed in Section 3.2, most instances are randomly generated and do not reflect real data. There are studies that have practical application but do not present a practical validation of the proposed approach. A more serious problem is posed by the studies that do not reflect the practical aspects of their corresponding area. An important contribution of our research was to collect evidence that most of the SBPM studies present both problems. We also identified some weaknesses of the SBPM reports, discussed in Section 4.2. From that, we compiled some guidelines on how SBPM results should be reported. Overall, SLRs with higher value to industry and academia will be possible if researchers follow more disciplined procedures and reflect them in their studies.
Acknowledgments

The authors would like to thank...

References

[1] Tarek Abdel-Hamid and Stuart E. Madnick. Software Project Dynamics: An Integrated Approach. Prentice-Hall, Inc., 1991.
[2] Wasif Afzal, Richard Torkar, and Robert Feldt. A systematic review of search-based testing for non-functional system properties. Information and Software Technology, 51(6):957-976, 2009.
[3] Enrique Alba and J. Francisco Chicano. Software project management with GAs. Information Sciences, 177(11):2380-2401, 2007.
[4] S. Ali, L.C. Briand, H. Hemmati, and R.K. Panesar-Walawege. A Systematic Review of the Application and Empirical Investigation of Search-Based Test Case Generation. IEEE Transactions on Software Engineering, 36(6):742-762, 2010.
[5] Agile Alliance. Agile Manifesto. http://www.agilealliance.org/the-alliance/the-agile-manifesto/, 2001. [Online; accessed 1st December 2013].
[6] G. Antoniol, M. Di Penta, and M. Harman. A robust search-based approach to project management in the presence of abandonment, rework, error and uncertainty. In 10th International Symposium on Software Metrics, pages 172-183, 2004.
[7] G. Antoniol, M. Di Penta, and M. Harman. Search-based techniques applied to optimization of project planning for a massive maintenance project. In Proceedings of the 21st IEEE International Conference on Software Maintenance, pages 240-249, 2005.
[8] Elaine Barnett-Page and James Thomas. Methods for the synthesis of qualitative research: a critical review. BMC Medical Research Methodology, 9(59), August 2009.
[9] Ahilton Barreto, Márcio de O. Barros, and Cláudia M.L. Werner. Staffing a software project: A constraint satisfaction and optimization-based approach. Computers & Operations Research, 35(10):3073-3089, 2008.
[10] Odile Bellenguez and Emmanuel Néron. Lower Bounds for the Multi-skill Project Scheduling Problem with Hierarchical Levels of Skills.
In Proceedings of the 5th International Conference on Practice and Theory of Automated Timetabling, PATAT '04, pages 229-243, Pittsburgh, PA, 2005.
[11] Barry W. Boehm, Chris Abts, A. Winsor Brown, Sunita Chulani, Bradford K. Clark, Ellis Horowitz, Ray Madachy, Donald J. Reifer, and Bert Steece. Software Cost Estimation with Cocomo II. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2000.
[12] B.W. Boehm and R. Ross. Theory-W software project management principles and examples. IEEE Transactions on Software Engineering, 15(7):902-916, 1989.
[13] Camila Loiola Brito Maia, Thiago Ferreira do Nascimento, Fabrício Gomes Freitas, and Jerffeson Teixeira Souza. An Evolutionary Optimization Approach to Software Test Case Allocation. In Proceedings of the International Conference on Computational Intelligence and Information Technology, pages 637-641, 2011.
[14] R. Britto, P.S. Neto, R. Rabelo, W. Ayala, and T. Soares. A hybrid approach to solve the agile team allocation problem. In 2012 IEEE Congress on Evolutionary Computation (CEC), pages 1-8, 2012.
[15] Frederick P. Brooks, Jr. No Silver Bullet - Essence and Accidents of Software Engineering. Computer, 20:10-19, April 1987.
[16] David Budgen, Barbara A. Kitchenham, Stuart M. Charters, Mark Turner, Pearl Brereton, and Stephen G. Linkman. Presenting software engineering results using structured abstracts: a randomised experiment. Empirical Software Engineering, 13:435-468, August 2008.
[17] Robson C. Real World Research. Blackwell, 2nd edition, 2002.
[18] Nurcin Celik, Seungho Lee, Esfandyar Mazhari, Young-Jun Son, Robin Lemaire, and Keith G. Provan. Simulation-based workforce assignment in a multi-organizational social network for alliance-based software development. Simulation Modelling Practice and Theory, 19(10):2169-2188, 2011.
[19] Carl K. Chang, Mark J. Christensen, and Tao Zhang. Genetic algorithms for project management. Annals of Software Engineering, 11(1):107-139, 2001.
[20] Carl K.
Chang, Hsin-yi Jiang, Yu Di, Dan Zhu, and Yujia Ge. Time-line based model for software project scheduling with genetic algorithms. Information and Software Technology, 50(11):1142-1154, October 2008.
[21] Carl K. Chang and Yujia Ge. Genetic algorithm techniques and applications in management systems. In Cornelius T. Leondes, editor, Intelligent Knowledge-Based Systems, pages 1718-1738. 2005.
[22] Wei-Neng Chen and Jun Zhang. Ant Colony Optimization for Software Project Scheduling and Staffing with an Event-Based Scheduler. IEEE Transactions on Software Engineering, 39(1):1-17, 2013.
[23] Francisco Chicano, Alejandro Cervantes, Francisco Luna, and Gustavo Recio. A Novel Multiobjective Formulation of the Robust Software Project Scheduling Problem. In Proceedings of the 2012 European Conference on Applications of Evolutionary Computation, EvoApplications '12, pages 497-507, Málaga, Spain, 2012.
[24] Francisco Chicano, Francisco Luna, Antonio J. Nebro, and Enrique Alba. Using Multi-objective Metaheuristics to Solve the Software Project Scheduling Problem. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, GECCO '11, pages 1915-1922, Dublin, Ireland, 2011. ACM.
[25] Thelma Elita Colanzi, Silvia Regina Vergilio, Wesley Klewerton Guez Assunção, and Aurora Pozo. Search Based Software Engineering: Review and analysis of the field in Brazil. Journal of Systems and Software, 86(4):970-984, 2013.
[26] Ken W. Collier and James S. Collofello. A Design-based Model for the Reduction of Software Cycle Time. Software Process: Improvement and Practice, 2(3):167-179, 1996.
[27] Microsoft Corporation. Microsoft Office Project. http://office.microsoft.com/project, 2007. [Online; accessed 10th May 2013].
[28] M. Di Penta, M. Harman, G. Antoniol, and F. Qureshi. The Effect of Communication Overhead on Software Maintenance Project Staffing: a Search-Based Approach. In IEEE International Conference on Software Maintenance, pages 315-324, 2007.
[29] Massimiliano Di Penta, Mark Harman, and Giuliano Antoniol. The use of search-based optimization techniques to schedule and staff software projects: an approach and an empirical study. Software: Practice and Experience, 41(5):495-519, 2011.
[30] Supraja Doma, Larry Gottschalk, Tetsutaro Uehara, and Jigang Liu. Resource Allocation Optimization for GSD Projects.
In Proceedings of the International Conference on Computational Science and Its Applications: Part II, ICCSA '09, pages 13-28, Seoul, Korea, 2009.
[31] J. Duggan, J. Byrne, and G.J. Lyons. A task allocation optimizer for software construction. IEEE Software, 21(3):76-82, 2004.
[32] Tore Dybå and Torgeir Dingsøyr. Strength of evidence in systematic reviews in software engineering. In Proceedings of the Second ACM-IEEE international symposium on Empirical software engineering and measurement, ESEM '08, pages 178-187, Kaiserslautern, Germany, 2008. ACM.
[33] Tore Dybå. An Empirical Investigation of the Key Factors for Success in Software Process Improvement. IEEE Transactions on Software Engineering, 31(5):410-424, May 2005.
[34] Tore Dybå and Torgeir Dingsøyr. Empirical studies of agile software development: A systematic review. Information and Software Technology, 50(9-10):833-859, 2008.
[35] Emelie Engström, Per Runeson, and Mats Skoglund. A systematic review on regression test selection techniques. Information and Software Technology, 52:14-30, January 2010.
[36] Yujia Ge. Software Project Rescheduling with Genetic Algorithms. In International Conference on Artificial Intelligence and Computational Intelligence, volume 1, pages 439-443, 2009.
[37] T. Gonsalves, A. Ito, R. Kawabata, and K. Itoh. Swarm Intelligence in the Optimization of Software Development Project Schedule. In 32nd Annual IEEE International Computer Software and Applications, pages 587-592, 2008.
[38] Stefan Gueorguiev, Mark Harman, and Giuliano Antoniol. Software Project Planning for Robustness and Completion Time in the Presence of Uncertainty Using Multi Objective Search Based Software Engineering. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO '09, pages 1673-1680, Montréal, Québec, Canada, 2009. ACM.
[39] Thomas Hanne and Stefan Nickel. A multiobjective evolutionary algorithm for scheduling and inspection planning in software development projects.
European Journal of Operational Research, 167(3):663-678, 2005.
[40] Maciej Hapke, Andrzej Jaszkiewicz, and Roman Slowinski. Fuzzy project scheduling system for software development. Fuzzy Sets and Systems, 67(1):101-117, 1994.
[41] Mark Harman. The Current State and Future of Search Based Software Engineering. In 2007 Future of Software Engineering, FOSE '07, pages 342-357, Minneapolis, MN, USA, 2007.
[42] Mark Harman, S. Afshin Mansouri, and Yuanyuan Zhang. Search Based Software Engineering: A Comprehensive Analysis and Review of Trends Techniques and Applications. Technical Report TR-09-03, King's College, London, United Kingdom, 2009.
[43] Mark Harman, S. Afshin Mansouri, and Yuanyuan Zhang. Search-based Software Engineering: Trends, Techniques and Applications. ACM Computing Surveys, 45(1):11:1-11:61, December 2012.
[44] Martin Höst and Per Runeson. Checklists for Software Engineering Case Study Research. In Proceedings of the
First International Symposium on Empirical Software Engineering and Measurement, ESEM '07, pages 479-481, Madrid, Spain, 2007. IEEE Computer Society.
[45] Wei Huang and Lixin Ding. Project-scheduling problem with random time-dependent activity duration times. IEEE Transactions on Engineering Management, 58(2):377-387, 2011.
[46] Wei Huang, Lixin Ding, Bin Wen, and Buqing Cao. Project Scheduling Problem for Software Development with Random Fuzzy Activity Duration Times. In International Symposium on Neural Networks, volume 5552, pages 60-69, Wuhan, China, 2009.
[47] Project Management Institute. A Guide to the Project Management Body of Knowledge (PMBOK® Guide). Project Management Institute, 5th edition, 2013.
[48] Pankaj Jalote and Gourav Jain. Assigning tasks in a 24-h software development model. Journal of Systems and Software, 79(7):904-911, 2006.
[49] Dongwon Kang, Jinhwan Jung, and Doo-Hwan Bae. Constraint-based human resource allocation in software projects. Software: Practice and Experience, 41(5):551-577, 2011.
[50] Puneet Kapur, An Ngo-The, Günther Ruhe, and Andrew Smith. Optimized staffing for product releases and its application at Chartwell Technology. Journal of Software Maintenance and Evolution: Research and Practice, 20(5):365-386, 2008.
[51] Barbara Kitchenham. Procedures for Performing Systematic Reviews. Technical Report TR/SE-0401, Software Engineering Group, Department of Computer Science, Keele University, 2004.
[52] Barbara Kitchenham and Stuart Charters. Guidelines for performing Systematic Literature Reviews in Software Engineering, version 2.3. Technical Report EBSE 2007-01, Software Engineering Group, School of Computer Science and Mathematics, Keele University and Department of Computer Science, University of Durham, 2007.
[53] Barbara Kitchenham, Lesley Pickard, and Shari Lawrence Pfleeger. Case Studies for Method and Tool Evaluation. IEEE Software, 12:52-62, July 1995.
[54] Barbara A. Kitchenham, Shari Lawrence Pfleeger, Lesley M.
Pickard, Peter W. Jones, David C. Hoaglin, Khaled El Emam, and Jarrett Rosenberg. Preliminary Guidelines for Empirical Research in Software Engineering. IEEE Transactions on Software Engineering, 28:721-734, August 2002.
[55] Rainer Kolisch and Christian Heimerl. An efficient metaheuristic for integrated scheduling and staffing IT projects based on a generalized minimum cost flow network. Naval Research Logistics (NRL), 59(2):111-127, 2012.
[56] Daniela C. C. Kupsch. SPIAL: A Tool for Software Process Improvement Training. PhD thesis, Federal University of Minas Gerais, Brazil, 2012.
[57] Chen Li, Marjan Akker, Sjaak Brinkkemper, and Guido Diepen. An integrated approach for requirement selection and scheduling in software release planning. Requirements Engineering, 15(4):375-396, 2010.
[58] Feng Li, HongBin Lin, ShaoChun Li, ChangJie Guo, and Xinyu Zhao. Self-adapting task allocation approach for software outsourcing services. In 2012 IEEE International Conference on Service Operations and Logistics, and Informatics, pages 479-484, 2012.
[59] Javier Matos and Enrique Alba. Benchmarking CHC on a New Application: The Software Project Scheduling Problem. In Proceedings of the 12th International Conference on Parallel Problem Solving from Nature - Volume Part II, PPSN '12, pages 448-457, Taormina, Italy, 2012.
[60] Leandro L. Minku, Dirk Sudholt, and Xin Yao. Evolutionary Algorithms for the Project Scheduling Problem: Runtime Analysis and Improved Design. In Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation Conference, GECCO '12, pages 1221-1228, Philadelphia, Pennsylvania, USA, 2012. ACM.
[61] Emily O. Navarro. SimSE: A Software Engineering Simulation Environment for Software Process Education. PhD thesis, Donald Bren School of Information and Computer Sciences, University of California, Irvine, 2006.
[62] An Ngo-The and G. Ruhe. Optimized Resource Allocation for Software Release Planning.
IEEE Transactions on Software Engineering, 35(1):109-123, 2009.
[63] OpenProj. OpenProj - Project Management. http://sourceforge.net/projects/openproj/, 2012. [Online; accessed 10th May 2013].
[64] Linet Özdamar and Ebru Alanya. Uncertainty modelling in software development projects (with case study). Annals of Operations Research, 102(1-4):157-178, 2001.
[65] F. Padberg and D. Weiss. Optimal Scheduling of Software Projects Using Reinforcement Learning. In 2011 18th Asia Pacific Software Engineering Conference (APSEC), pages 9-16, 2011.
[66] Frank Padberg. A study on optimal scheduling for software projects. Software Process: Improvement and Practice, 11(1):77-91, 2006.
[67] Kai Petersen and Claes Wohlin. Context in Industrial Software Engineering Research. In Proceedings of the 2009
3rd International Symposium on Empirical Software Engineering and Measurement, ESEM '09, pages 401-404, Lake Buena Vista, Florida, USA, 2009. IEEE Computer Society.
[68] GNOME Project. Gnome planner. https://live.gnome.org/planner, 2010. [Online; accessed 10th May 2013].
[69] Jian Ren, Mark Harman, and Massimiliano Di Penta. Cooperative Co-evolutionary Optimization of Software Project Staff Assignments and Job Scheduling. In Proceedings of the Third International Conference on Search Based Software Engineering, SSBSE '11, pages 127-141, 2011.
[70] Franz Rothlauf. Design of Modern Heuristics: Principles and Application. Springer Berlin Heidelberg, 2011.
[71] Per Runeson and Martin Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering, 14:131-164, April 2009.
[72] U.Z. Sanal. A decision support system for fuzzy scheduling of software projects. In 2000 IEEE AUTOTESTCON Proceedings, pages 263-272, 2000.
[73] Xiaohong Shan, Guorui Jiang, and Tiyun Huang. The Optimization Research on the Human Resource Allocation Planning in Software Projects. In International Conference on Management and Service Science, pages 1-4, 2010.
[74] Forrest Shull, Janice Singer, and Dag I. K. Sjøberg. Guide to Advanced Empirical Software Engineering. Springer-Verlag, London, 2008.
[75] C. Stylianou, S. Gerasimou, and A.S. Andreou. A Novel Prototype Tool for Intelligent Software Project Scheduling and Staffing Enhanced with Personality Factors. In IEEE 24th International Conference on Tools with Artificial Intelligence, volume 1, pages 277-284, 2012.
[76] Constantinos Stylianou and Andreas S. Andreou. Intelligent Software Project Scheduling and Team Staffing with Genetic Algorithms. In Proceedings of the 7th IFIP Conference on Artificial Intelligence Applications and Innovations, pages 169-178, Corfu, Greece, September 2011.
[77] Ákos Szöke. Decision Support for Iteration Scheduling in Agile Environments.
In International Conference on Product-Focused Software Process Improvement, pages 156-170, Oulu, Finland, 2009.
[78] Hsien-Tang Tsai, Herbert Moskowitz, and Lai-Hsi Lee. Human resource selection for software development projects using Taguchi's parameter design. European Journal of Operational Research, 151(1):167-180, 2003.
[79] Gursimran Singh Walia and Jeffrey C. Carver. A systematic literature review to identify and classify software requirement errors. Information and Software Technology, 51:1087-1109, July 2009.
[80] Rihong Wang, Ying Xu, and Dandan Xin. Research of Multi-Resource Constrained Project Scheduling base on the genetic algorithm. In International Conference on Computer Science and Service System, pages 2448-2450, 2011.
[81] Byron J. Williams and Jeffrey C. Carver. Characterizing software architecture changes: A systematic review. Information and Software Technology, 52:31-51, January 2010.
[82] Claes Wohlin, Per Runeson, Paulo Anselmo da Mota Silveira Neto, Emelie Engström, Ivan do Carmo Machado, and Eduardo Santana de Almeida. On the reliability of mapping studies in software engineering. Journal of Systems and Software, 86(10):2594-2610, 2013.
[83] Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. Experimentation in software engineering: an introduction. Kluwer Academic Publishers, Norwell, MA, USA, 2000.
[84] Jing Xiao, Xian-Ting Ao, and Yong Tang. Solving software project scheduling problems with ant colony optimization. Computers & Operations Research, 40(1):33-46, 2013.
[85] Junchao Xiao and W. Afzal. Search-based Resource Scheduling for Bug Fixing Tasks. In Second International Symposium on Search Based Software Engineering, pages 133-142, 2010.
[86] Junchao Xiao, Leon J. Osterweil, Jie Chen, Qing Wang, and Mingshu Li. Search Based Risk Mitigation Planning in Project Portfolio Management.
In Proceedings of the 2013 International Conference on Software and System Process, ICSSP 2013, pages 146-155, San Francisco, CA, USA, 2013. ACM.
[87] Junchao Xiao, Leon J. Osterweil, Qing Wang, and Mingshu Li. Disruption-driven Resource Rescheduling in Software Development Processes. In Proceedings of the 2010 International Conference on Software Process, ICSP '10, pages 234-247, Paderborn, Germany, 2010.
[88] Junchao Xiao, Qing Wang, Mingshu Li, Qiusong Yang, Lizi Xie, and Dapeng Liu. Value-Based Multiple Software Projects Scheduling with Genetic Algorithm. In International Conference on Software Process, ICSP 2009, pages 50-62, Vancouver, Canada, 2009.
[89] Virginia Yannibelli and Analía Amandi. A knowledge-based evolutionary assistant to software development project scheduling. Expert Systems with Applications, 38(7):8403-8413, 2011.
Appendix A. Studies

The table below lists the articles that were selected in the literature review.

Table A.13: Selected primary studies

Study ID | Digital Library | Title | Year | Source | Reference
S1 | ACM | Search Based Risk Mitigation Planning in Project Portfolio Management | 2013 | International Conference on Software and System Process | [86]
S2 | ACM | Evolutionary Algorithms for the Project Scheduling Problem: Runtime Analysis and Improved Design | 2012 | International Conference on Genetic and Evolutionary Computation Conference | [60]
S3 | ACM | Using Multi-objective Metaheuristics to Solve the Software Project Scheduling Problem | 2011 | International Conference on Genetic and Evolutionary Computation Conference | [24]
S4 | ACM | Software Project Planning for Robustness and Completion Time in the Presence of Uncertainty Using Multi Objective Search Based Software Engineering | 2009 | International Conference on Genetic and Evolutionary Computation Conference | [38]
S5 | Wiley InterScience | A Design-based Model for the Reduction of Software Cycle Time | 1996 | Software Process: Improvement and Practice | [26]
S6 | Wiley InterScience | The use of search-based optimization techniques to schedule and staff software projects: an approach and an empirical study | 2011 | Software Process: Improvement and Practice | [29]
S7 | Wiley InterScience | Constraint-based human resource allocation in software projects | 2011 | Software: Practice and Experience | [49]
S8 | Wiley InterScience | Optimized staffing for product releases and its application at Chartwell Technology | 2008 | Journal of Software Maintenance and Evolution: Research and Practice | [50]
S9 | Wiley InterScience | An efficient metaheuristic for integrated scheduling and staffing IT projects based on a generalized minimum cost flow network | 2012 | Naval Research Logistics (NRL) | [55]
S10 | Wiley InterScience | A study on optimal scheduling for software projects | 2006 | Software Process: Improvement and Practice | [66]
S11 | Elsevier | Software project management with GAs | 2007 | Information Sciences | [3]
S12 | Elsevier | Staffing a software project: A constraint satisfaction and optimization-based approach | 2008 | Computers & Operations Research | [9]
S13 | Elsevier | Simulation-based workforce assignment in a multi-organizational social network for alliance-based software development | 2011 | Simulation Modelling Practice and Theory | [18]
S14 | Elsevier | Time-line based model for software project scheduling with genetic algorithms | 2008 | Information and Software Technology | [20]
S15 | Elsevier | A multiobjective evolutionary algorithm for scheduling and inspection planning in software development projects | 2005 | European Journal of Operational Research | [39]
S16 | Elsevier | Fuzzy project scheduling system for software development | 1994 | Fuzzy Sets and Systems | [40]
S17 | Elsevier | Assigning tasks in a 24-h software development model | 2006 | Journal of Systems and Software | [48]
S18 | Elsevier | Human resource selection for software development projects using Taguchi's parameter design | 2003 | European Journal of Operational Research | [78]
S19 | Elsevier | Solving software project scheduling problems with ant colony optimization | 2013 | Computers & Operations Research | [84]
S20 | Elsevier | A knowledge-based evolutionary assistant to software development project scheduling | 2011 | Expert Systems with Applications | [89]
S21 | SpringerLink | Cooperative Co-evolutionary Optimization of Software Project Staff Assignments and Job Scheduling | 2011 | International Conference on Search Based Software Engineering | [69]
S22 | SpringerLink | Intelligent Software Project Scheduling and Team Staffing with Genetic Algorithms | 2011 | IFIP Conference on Artificial Intelligence Applications and Innovations | [76]
S23 | SpringerLink | An Evolutionary Optimization Approach to Software Test Case Allocation | 2011 | International Conference on Computational Intelligence and Information Technology | [13]
S24 | SpringerLink | Genetic Algorithm Techniques and Applications in Management Systems | 2005 | Intelligent Knowledge-Based Systems | [21]
S25 | SpringerLink | Value-Based Multiple Software Projects Scheduling with Genetic Algorithm | 2009 | International Conference on Software Process | [88]
S26 | SpringerLink | Uncertainty Modelling in Software Development Projects (With Case Study) | 2001 | Annals of Operations Research | [64]
S27 | SpringerLink | Genetic Algorithms for Project Management | 2001 | Annals of Software Engineering | [19]
S28 | SpringerLink | Decision Support for Iteration Scheduling in Agile Environments | 2009 | Conference on Product-Focused Software Process Improvement | [77]
S29 | SpringerLink | Disruption-driven Resource Rescheduling in Software Development Processes | 2010 | International Conference on Software Process | [87]
S30 | SpringerLink | An integrated approach for requirement selection and scheduling in software release planning | 2010 | Requirements Engineering | [57]
S31 | SpringerLink | Lower Bounds for the Multi-skill Project Scheduling Problem with Hierarchical Levels of Skills | 2004 | International Conference on Practice and Theory of Automated Timetabling | [10]
S32 | SpringerLink | Resource Allocation Optimization for GSD Projects | 2009 | International Conference on Computational Science and Its Applications: Part II | [30]
S33 | SpringerLink | Project Scheduling Problem for Software Development with Random Fuzzy Activity Duration Times | 2009 | International Symposium on Neural Networks | [46]
S34 | SpringerLink | Benchmarking CHC on a New Application: The Software Project Scheduling Problem | 2012 | International Conference on Parallel Problem Solving from Nature | [59]
S35 | SpringerLink | A Novel Multiobjective Formulation of the Robust Software Project Scheduling Problem | 2012 | European Conference on Applications of Evolutionary Computation | [23]
S36 | IEEExplore | A Novel Prototype Tool for Intelligent Software Project Scheduling and Staffing Enhanced with Personality Factors | 2012 | International Conference on Tools with Artificial Intelligence | [75]
S37 | IEEExplore | Ant Colony Optimization for Software Project Scheduling and Staffing with an Event-Based Scheduler | 2013 | IEEE Transactions on Software Engineering | [22]
S38 | IEEExplore | The Effect of Communication Overhead on Software Maintenance Project Staffing: a Search-Based Approach | 2007 | International Conference on Software Maintenance | [28]
S39 | IEEExplore | Search-based Resource Scheduling for Bug Fixing Tasks | 2010 | International Symposium on Search Based Software Engineering | [85]
S40 | IEEExplore | Swarm Intelligence in the Optimization of Software Development Project Schedule | 2007 | International Computer Software and Applications | [37]
S41 | IEEExplore | Research of Multi-Resource Constrained Project Scheduling base on the genetic algorithm | 2011 | International Conference on Computer Science and Service System | [80]
S42 | IEEExplore | The Optimization Research on the Human Resource Allocation Planning in Software Projects | 2010 | International Conference on Management and Service Science | [73]
S43 | IEEExplore | Search-based techniques applied to optimization of project planning for a massive maintenance project | 2005 | International Conference on Software Maintenance | [7]
S44 | IEEExplore | A decision support system for fuzzy scheduling of software projects | 2000 | IEEE AUTOTESTCON Proceedings | [72]
S45 | IEEExplore | A robust search-based approach to project management in the presence of abandonment, rework, error and uncertainty | 2004 | International Symposium on Software Metrics | [6]
S46 | IEEExplore | Optimized Resource Allocation for Software Release Planning | 2009 | IEEE Transactions on Software Engineering | [62]
S47 | IEEExplore | A task allocation optimizer for software construction | 2004 | IEEE Software | [31]
S48 | IEEExplore | Optimal Scheduling of Software Projects Using Reinforcement Learning | 2011 | Asia Pacific Software Engineering Conference | [65]
S49 | IEEExplore | Software Project Rescheduling with Genetic Algorithms | 2009 | International Conference on Artificial Intelligence and Computational Intelligence | [36]
S50 | IEEExplore | Self-adapting task allocation approach for software outsourcing services | 2012 | International Conference on Service Operations and Logistics, and Informatics | [58]
S51 | IEEExplore | Project-Scheduling Problem With Random Time-Dependent Activity Duration Times | 2011 | IEEE Transactions on Engineering Management | [45]
S52 | IEEExplore | A hybrid approach to solve the agile team allocation problem | 2012 | Congress on Evolutionary Computation (CEC) | [14]