Using Multi-objective Metaheuristics to Solve the Software Project Scheduling Problem

Francisco Chicano
University of Málaga, Spain
chicano@lcc.uma.es

Francisco Luna
University of Málaga, Spain
flv@lcc.uma.es

Enrique Alba
University of Málaga, Spain
eat@lcc.uma.es

Antonio J. Nebro
University of Málaga, Spain
antonio@lcc.uma.es

ABSTRACT
The Software Project Scheduling (SPS) problem relates to the decision of who does what during a software project lifetime. This problem is of capital importance for software companies. In the SPS problem, the total budget and human resources involved in software development must be optimally managed in order to end up with a successful project. Companies are mainly concerned with reducing both the duration and the cost of the projects, and these two goals are in conflict with each other. A multi-objective approach is therefore the natural way of facing the SPS problem. In this paper, a number of multi-objective metaheuristics have been used to address this problem. They have been thoroughly compared over a set of 36 publicly available instances that cover a wide range of different scenarios. The resulting project schedulings of the algorithms have been analyzed in order to show their relevant features. The algorithms used in this paper and the analysis performed may assist project managers in the difficult task of deciding who does what in a software project.

Categories and Subject Descriptors
I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search; D.2.9 [Software Engineering]: Management

General Terms
Experimentation, Algorithms

Keywords
Software project scheduling, multi-objective optimization, metaheuristics

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. GECCO'11, July 12-16, 2011, Dublin, Ireland.
Copyright 2011 ACM 978-1-4503-0557-0/11/07...$10.00.

1. INTRODUCTION
The high complexity of current software projects justifies the research into computer-aided tools to properly schedule the project development. In this paper we focus on the assignment of employees to tasks in a software project so as to reduce both the project cost and duration. This problem is known as the Software Project Scheduling (SPS) problem [1].

The SPS problem is multi-objective in nature and it has been formulated as such, rather than aggregating its objectives (minimizing both the project cost and its duration) into one single value. Contrary to single-objective optimization, the solution of a multi-objective problem such as SPS is not one single solution, but a set of nondominated solutions known as the Pareto optimal set, which is called Pareto border or Pareto front when it is plotted in the objective space [5]. Every solution of this set is optimal in the sense that no improvement can be reached on an objective without worsening at least another one at the same time. That is, in the context of the SPS problem, it is not possible to reduce the project cost without increasing its duration (or vice versa). The main goal in the resolution of a multi-objective problem is to compute the set of solutions within the Pareto optimal set and, consequently, the Pareto front.

Many metaheuristics have been proposed in the literature to deal with multi-objective problems [4]. Indeed, the most well-known algorithms in the multi-objective community fall into this kind of search technique. In this paper, three classical methods, NSGA-II [6], SPEA2 [18] and PAES [12], plus two recent algorithms, MOCell [16] and GDE3 [14], have been used to address the SPS problem.

The existing work on this topic usually proposes a metaheuristic algorithm for solving a specific flavour of the problem and presents the results of some experimental evaluation over a set of problem instances. In this work, however, we want to answer some open questions that have not been addressed in previous works.
These research questions are:

RQ1: How do these five metaheuristics perform when solving the SPS problem? To answer this question, a thorough comparison between NSGA-II, SPEA2, PAES, MOCell, and GDE3 has been performed over a set of 36 SPS instances covering different project scenarios with different levels of difficulty. To the best of our knowledge, PAES, MOCell, and GDE3 are used on the SPS problem for the first time. NSGA-II and SPEA2 are the two most widely used multi-objective algorithms in the literature.

RQ2: Which are the features of the solutions reached by the metaheuristic algorithms? In this case we analyze the solutions of the different algorithms to find correlations between their features and the region of the objective space where these solutions are located. In other words, we want to know what the metaheuristic algorithms do to obtain a solution with some concrete values for the objective functions, i.e., the characteristics of the resulting project schedulings.

The paper is organized as follows. The next section presents some related work on the topic of this paper. Section 3 describes the details of the scheduling problem addressed. In Section 4 the algorithms used to solve the problem are explained. After that, we present the results of some experiments in Section 5. Finally, we summarize our conclusions and outline some lines of future work in Section 6.

2. RELATED WORK
Not much work has been devoted to the multi-objective approach of the SPS problem up to now. Even though a few related papers exist, they are usually targeted at solving different flavours of the problem.

Gueorguiev et al. [9] solve the problem of finding the optimal assignment of work packages to developer teams with the objectives of minimizing project time and maximizing robustness. This problem was previously solved using a single-objective approach in [2]. The robustness of a solution is measured as the variation of the total development time of the project under unexpected events like newly added tasks to perform or changes in the duration of one task. The algorithm used for the experiments is […].

Duggan et al. [7] solve the problem of task allocation in a software project. The effort of the tasks is measured as units of complexity (UOC). The authors define some levels of skill for the staff: novice, average, expert, etc. Each skill level determines the UOC per day that the engineer is able to perform and the number of errors per UOC that s/he introduces. For each engineer-task pair the project manager must assign a skill level.
One solution to the problem is a schedule in which each engineer is assigned to a task at each time. The objectives are to minimize the development time and the number of errors. The algorithm used to solve the problem is an elitist version of NSGA.

Hanne and Nickel [10] solve the problem of planning inspections and other activities in a software project. In their approach, a set of developers have to program some source code items. These source code items must also be inspected by the developer team fulfilling some constraints (for example, the author of a source code item must not be one of its inspectors). The multi-objective approach tries to minimize time, cost, and number of defects. This multi-objective problem is solved by using a multi-objective evolutionary algorithm.

3. SOFTWARE PROJECT SCHEDULING
We follow here the same formulation proposed in [1]. Thus, the resources considered are people with a set of skills and a salary. These employees have a maximum degree of dedication to the project. Formally, each person (employee) is denoted with $e_i$, where $i$ goes from 1 to $E$ (the number of employees). Let $SK$ be the set of skills, and $s_i$ the $i$-th skill with $i$ varying from 1 to $S = |SK|$. The skills of the employee $e_i$ will be denoted with $e_i^{skills} \subseteq SK$, the monthly salary with $e_i^{salary}$, and the maximum dedication to the project with $e_i^{maxded}$. The salary and the maximum dedication are real numbers. The former is expressed in abstract currency units, while the latter is the fraction of time spent on the project.

The tasks are denoted with $t_j$, where $j$ goes from 1 to $T$ (the number of tasks). Each task $t_j$ has a set of required skills associated with it, which we denote with $t_j^{skills}$, plus an effort $t_j^{effort}$, expressed in person-months (PM). The tasks must be performed according to a Task Precedence Graph (TPG). It indicates which tasks must be completed before a new task is started. The TPG is an acyclic directed graph $G(V, A)$ with a vertex set $V = \{t_1, t_2, \ldots, t_T\}$ and an arc set $A$, where $(t_i, t_j) \in A$ if the task $t_i$ must be completed, with no other intervening tasks, before task $t_j$ can start.
The objectives of the SPS problem are to minimize the cost and the duration of the project. The constraints are (1) that each task must be performed by at least one person, (2) that the set of required skills of a task must be included in the union of the skills of the employees performing the task, and (3) that no employee must exceed her/his maximum dedication to the project.

Once we know the elements of a problem instance, we can proceed to describe the elements of a solution to the problem. A solution can be represented with a matrix $X = (x_{ij})$ of size $E \times T$, where $x_{ij} \geq 0$. The element $x_{ij}$ is the degree of dedication of the employee $e_i$ to the task $t_j$. If the employee $e_i$ performs the task $t_j$ with a 0.5 dedication degree, s/he spends half of her/his time in the company on the task. If an employee does not perform a task, s/he will have a dedication degree of 0.0 for that task. This information helps to compute the duration of each task and, indeed, the start and the end time of each one, i.e., the time schedule of the tasks (Gantt diagram). From this schedule we can compute the duration of the project (see Figure 1). The cost can be calculated after the duration of the tasks by taking into account the dedication and the salary of the employees. Finally, the overwork of each employee can be calculated using the time schedule of the tasks and the dedication matrix $X$.

Figure 1: A tentative solution for a sample project. Using the task durations and the TPG, the Gantt diagram of the project can be computed.
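The evaluation steps just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `evaluate_schedule`, the argument layout, and the zero-based indices are assumptions made here. It follows the definitions that the paper formalizes next: task duration as effort divided by the assigned human resources, start times derived from the TPG, cost as salary times dedication times duration, and overwork as the time integral of the dedication in excess of the maximum.

```python
from collections import defaultdict


def evaluate_schedule(x, effort, salary, max_ded, precedences):
    """Return (duration, cost, overwork) of a dedication matrix.

    x[i][j]      dedication of employee i to task j (hypothetical layout)
    effort[j]    effort of task j in person-months
    salary[i]    monthly salary of employee i
    max_ded[i]   maximum dedication of employee i
    precedences  arcs (a, b) of the TPG: task a must finish before b starts
    """
    n_emp, n_tasks = len(x), len(effort)

    # Human resources per task and task durations.
    ahr = [sum(x[i][j] for i in range(n_emp)) for j in range(n_tasks)]
    if any(a <= 0 for a in ahr):
        raise ValueError("every task needs a positive total dedication")
    dur = [effort[j] / ahr[j] for j in range(n_tasks)]

    # Start and end times: a task starts when its last predecessor ends.
    preds = defaultdict(list)
    for a, b in precedences:
        preds[b].append(a)
    start = [None] * n_tasks
    end = [None] * n_tasks

    def finish(j):
        # Memoized recursion; terminates because the TPG is acyclic.
        if end[j] is None:
            start[j] = max((finish(p) for p in preds[j]), default=0.0)
            end[j] = start[j] + dur[j]
        return end[j]

    p_dur = max(finish(j) for j in range(n_tasks))

    # Cost: salary times dedication times task duration.
    p_cost = sum(salary[i] * x[i][j] * dur[j]
                 for i in range(n_emp) for j in range(n_tasks))

    # Overwork: the working function is piecewise constant between
    # consecutive start/end events, so the integral is a finite sum.
    # Sampling at interval midpoints avoids the interval boundaries.
    events = sorted(set(start) | set(end))
    p_over = 0.0
    for lo, hi in zip(events, events[1:]):
        mid = (lo + hi) / 2.0
        for i in range(n_emp):
            work = sum(x[i][j] for j in range(n_tasks)
                       if start[j] <= mid <= end[j])
            p_over += max(0.0, work - max_ded[i]) * (hi - lo)
    return p_dur, p_cost, p_over
```

For instance, with two employees, three tasks of effort 2, 1 and 3 PM, and the third task depending on the other two, calling the function with `x = [[1.0, 0.0, 0.5], [0.0, 0.5, 0.5]]` yields task durations of 2, 2 and 3 months and hence a project duration of 5 months. The skill constraint is deliberately left out of the sketch to keep it short.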

In order to evaluate the quality of a given solution, we take into account three issues: project duration, project cost, and solution feasibility. To compute the project duration, denoted with $p^{dur}$, we need to calculate the duration of each individual task ($t_j^{dur}$). This is calculated in the following way:

\[ t_j^{dur} = \frac{t_j^{effort}}{t_j^{ahr}} \tag{1} \]

where $t_j^{ahr}$ is the amount of human resources (measured in persons) spent on task $t_j$ and is defined as the sum of the dedication degree that the employees have on the task, that is:

\[ t_j^{ahr} = \sum_{i=1}^{E} x_{ij} \tag{2} \]

At this point we can also define the participation of employee $e_i$ in the project, $e_i^{par}$, as the fraction of the total workload of the project that was performed by the employee, that is:

\[ e_i^{par} = \frac{\sum_{j=1}^{T} \frac{x_{ij}}{\sum_{k=1}^{E} x_{kj}}\, t_j^{effort}}{\sum_{j=1}^{T} t_j^{effort}} = \frac{\sum_{j=1}^{T} x_{ij}\, t_j^{dur}}{\sum_{j=1}^{T} t_j^{effort}} \tag{3} \]

The next step is to compute the starting and ending time for each task ($t_j^{start}$ and $t_j^{end}$), which are defined according to the following expressions:

\[ t_j^{start} = \begin{cases} 0 & \text{if } \forall t_i,\ (t_i, t_j) \notin A \\ \max\limits_{t_i,\ (t_i, t_j) \in A} \{ t_i^{end} \} & \text{otherwise} \end{cases} \tag{4} \]

\[ t_j^{end} = t_j^{start} + t_j^{dur} \tag{5} \]

At the same time, it is possible to compute the project duration ($p^{dur}$), which is the maximum ending time ever found:

\[ p^{dur} = \max_{j=1}^{T} \{ t_j^{end} \} \tag{6} \]

The project cost $p^{cost}$ is the sum of the salaries paid to the employees for their dedication to the project. These charges are computed by multiplying the salary of the employee by the time spent on the project. The time spent on the project is the sum of the dedication multiplied by the duration of each task. In summary:

\[ p^{cost} = \sum_{i=1}^{E} \sum_{j=1}^{T} e_i^{salary}\, x_{ij}\, t_j^{dur} \tag{7} \]

In order to check the validity of a solution we must first check that all tasks are performed by somebody, i.e., no task is left undone. That is:

\[ t_j^{ahr} > 0 \quad \forall j \in \{1, 2, \ldots, T\} \tag{8} \]

The second constraint is that the employees performing the task must have the skills required by the task:

\[ t_j^{skills} \subseteq \bigcup_{\{i \mid x_{ij} > 0\}} e_i^{skills} \quad \forall j \in \{1, 2, \ldots, T\} \tag{9} \]

Finally, in order to compute the overwork $p^{over}$ we first need to compute the working function for each employee, defined as:

\[ e_i^{work}(\tau) = \sum_{\{j \mid t_j^{start} \leq \tau \leq t_j^{end}\}} x_{ij} \tag{10} \]

If $e_i^{work}(\tau) > e_i^{maxded}$ the employee $e_i$ exceeds her/his maximum dedication at instant $\tau$. The overwork of the employee, $e_i^{over}$, is:

\[ e_i^{over} = \int_{\tau=0}^{\tau=p^{dur}} \mathrm{ramp}\left(e_i^{work}(\tau) - e_i^{maxded}\right) d\tau \tag{11} \]

where ramp is the function defined by:

\[ \mathrm{ramp}(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases} \tag{12} \]

The definite integral in (11) always exists and can be easily computed because its integrand is piecewise continuous. The total overwork of the project is then the sum of the overwork for all the employees, i.e.:

\[ p^{over} = \sum_{i=1}^{E} e_i^{over} \tag{13} \]

4. MULTI-OBJECTIVE ALGORITHMS
In this section, we briefly describe the five metaheuristics used in this study, namely NSGA-II, SPEA2, PAES, MOCell, and GDE3. They are all evolutionary algorithms (EAs) [3] which are, by far, the most popular metaheuristics for solving multi-objective problems (MOPs) [4, 5] because of their ability to find a set of trade-off solutions in one single run. Compared to a single-objective EA, a multi-objective one differs mainly in how the set of non-dominated solutions is managed. The general approach is to keep an external archive, as can be observed in the pseudocode of a generic multi-objective EA included in Algorithm 1 (the archive is referred to as PF in the algorithm).

Algorithm 1 Pseudocode for a generic multi-objective EA.
    P(0) ← GenerateInitialPopulation()
    EvaluateObjectives(P(0))
    PF ← CreateParetoFront()   // Create an empty front
    t ← 0
    while not TerminationCondition() do
        parents ← Selection(P(t))
        offspring ← EvolutionaryOperators(parents)
        EvaluateObjectives(offspring)
        P(t+1) ← UpdatePopulation(P(t), offspring)
        UpdateFront(PF, P(t+1))
        t ← t + 1
    end while

Both NSGA-II [6] and SPEA2 [18] use the scheme of Algorithm 1. As evolutionary operators, they have adopted binary tournament selection, simulated binary crossover and polynomial mutation [5]. They differ from each other in the mechanism used to keep a diverse approximated Pareto front. PAES [13] in turn has a population with one single solution that is iteratively modified by using polynomial mutation only (no crossover operator is required), but it uses an external archive. MOCell [15] is a structured EA that includes an external archive to store the nondominated solutions.
It has been engineered with the same evolutionary operators as NSGA-II and SPEA2. Finally, GDE3 [14] also matches the scheme of Algorithm 1, but the typical evolutionary operators are replaced by the differential evolution selection and crossover.

In order to deal with constrained optimization problems such as SPS, all the algorithms have used the constraint domination principle presented in [5]. The principle states that feasible solutions are better than non-feasible ones and that, among non-feasible solutions, those with a smaller overall constraint violation are better (constraints are normalized to be greater than or equal to zero).

5. EXPERIMENTS
In this section we present the results of an empirical study aimed at answering the research questions proposed in Section 1. In the first three subsections we describe the methodology, the details of the parameterization of the algorithms, and the instances of the problem used in the experiments. In each of the last two subsections we answer the research questions in turn. For the experiments, we have used jMetal [8], an object-oriented Java-based framework aimed at the development, experimentation, and study of metaheuristics for solving multi-objective optimization problems. jMetal is freely available at http://jmetal.sourceforge.net/.

5.1 Methodology
In order to measure the performance of the multi-objective solvers used here, the quality of their resulting nondominated set of solutions has to be considered. Two indicators have been used for this purpose in this article: the hypervolume (HV) indicator [19] and the attainment surfaces [11]. The HV is considered in the field as one of the more accurate indicators. It calculates the volume (in the objective space) covered by members of a nondominated set of solutions for problems where all the objectives are to be minimized, and it provides a measure taking into account the convergence and diversity of the obtained approximation set.

While the HV allows the performance of different algorithms to be compared, from the point of view of a decision maker, knowing the HV value is not enough, because it gives no information about the shape of the front. Indeed, a project manager wants to know where the front obtained by the different algorithms is located, since the front is what contains the information about the cost vs. time trade-off.
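To make the HV indicator concrete for the two SPS objectives, the following Python sketch computes the area dominated by a front with respect to a reference point, assuming both objectives are minimized. The function name `hypervolume_2d` and the choice of reference point are assumptions of this sketch; the paper does not state which reference point or normalization was used, and the sketch is not the jMetal implementation.

```python
def hypervolume_2d(front, ref):
    """Area dominated by `front` and bounded by `ref` (both objectives minimized).

    front  iterable of (f1, f2) points, e.g. (cost, duration)
    ref    reference point (r1, r2); points not strictly better than it are ignored
    """
    # Keep only points that contribute some area, sorted by the first objective.
    points = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in points:
        # With f1 ascending, a point adds area only if it improves on f2.
        if f2 < best_f2:
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area
```

With the front `[(1, 3), (2, 2), (3, 1)]` and the reference point `(4, 4)` the function returns 6.0, and adding a dominated point such as `(3, 3)` leaves the value unchanged. A larger value means that the front is closer to the ideal region, covers a wider range of trade-offs, or both, which is why a single HV number cannot tell where in the objective space a front is located.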
Table 1: Parameterization of the algorithms. $L$ is the individual length (number of tasks × number of employees).

    NSGA-II [6]
        Population size        100 individuals
        Selection of parents   binary tournament + binary tournament
        Recombination          simulated binary, $p_c = 0.9$
        Mutation               polynomial, $p_m = 1.0/L$
    SPEA2 [18]
        Population size        100 individuals
        Selection of parents   binary tournament + binary tournament
        Recombination          simulated binary, $p_c = 0.9$
        Mutation               polynomial, $p_m = 1.0/L$
    PAES [12]
        Population size        1 individual
        Mutation               polynomial, $p_m = 1.0/L$
        Archive size           100
    MOCell [16]
        Population size        100 individuals (10 × 10)
        Neighborhood           1-hop neighbors (8 surrounding solutions)
        Selection of parents   binary tournament + binary tournament
        Recombination          simulated binary, $p_c = 0.9$
        Mutation               polynomial, $p_m = 1.0/L$
        Archive size           100 individuals
    GDE3 [14]
        Population size        100 individuals
        Recombination          differential evolution, $CR = 0.1$, $F = 0.5$

We need a way of representing the results of a multi-objective algorithm that allows us to observe the expected performance and its variability over multiple runs, in the same way as the average and the standard deviation are used in the single-objective case. To do this we use the concept of empirical attainment function (EAF) [11]. In short, the EAF is a function $\alpha$ from the objective space $\mathbb{R}^n$ to the interval $[0, 1]$ that estimates for each vector in the objective space the probability of being dominated by the approximated Pareto front of one single run of the multi-objective algorithm. Given the $r$ approximated Pareto fronts obtained in the different runs, the EAF is defined as:

\[ \alpha(z) = \frac{1}{r} \sum_{i=1}^{r} I(A_i \preceq \{z\}) \tag{14} \]

where $A_i$ is the $i$-th approximated Pareto front obtained with the multi-objective algorithm and $I$ is an indicator function that takes value 1 when the predicate inside it is true, and 0 otherwise. The predicate $A_i \preceq \{z\}$ means that $A_i$ dominates solution $z$. Thanks to the attainment function, it is possible to define the concept of $k\%$-attainment surface [11]. The attainment function $\alpha$ is a scalar field in $\mathbb{R}^n$ and the $k\%$-attainment surface is the level curve with value $k/100$ for $\alpha$.
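The EAF of Eq. (14) is simple to evaluate at a single point of the objective space. The Python sketch below is a minimal illustration under two assumptions made here: all objectives are minimized, and a front attains $z$ when at least one of its points is no worse than $z$ in every objective (weak dominance). The function names are hypothetical.

```python
def weakly_dominates(a, z):
    """True if point `a` is no worse than `z` in every (minimized) objective."""
    return all(ai <= zi for ai, zi in zip(a, z))


def empirical_attainment(fronts, z):
    """Fraction of runs whose approximated front attains the vector `z`.

    fronts  list with one approximated Pareto front per independent run,
            each front being a list of objective vectors
    z       objective vector at which the EAF is evaluated
    """
    attained = sum(
        1 for front in fronts
        if any(weakly_dominates(a, z) for a in front)
    )
    return attained / len(fronts)
```

For three runs with fronts `[(1, 3), (3, 1)]`, `[(2, 2)]` and `[(4, 4)]`, the point `(2, 3)` is attained by the first two runs only, so the function returns 2/3. Under this reading, the points of the 50%-attainment surface are those on the boundary of the region where the returned value is at least 0.5.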
Informally, the 50%-attainment surface in the multi-objective domain is analogous to the median in the single-objective one. In a similar way, the 25%- and 75%-attainment surfaces can be used as the first and third quartile fronts and the region between them could be considered a kind of interquartile region. When the number of objectives is one, the 50%-attainment surface is the median and the interquartile region is the interquartile range.

EAs are stochastic algorithms; therefore the results have to be provided with statistical significance. The following statistical procedure has been used. First, 100 independent runs for each algorithm and each problem instance have been performed. The HV indicator and the attainment surfaces are then computed. In the case of HV, the Kruskal-Wallis test has been carried out to check if the differences between the algorithms are statistically significant [17]. Since more than two algorithms are involved in the study, a post-hoc testing phase which allows for a multiple comparison of samples has been performed. All the statistical tests are performed with a confidence level of 95%.

5.2 Parameterization
We have chosen a set of parameter values that guarantees a fair comparison among all the algorithms. All the GAs (NSGA-II, SPEA2, and MOCell), as well as GDE3, use an internal population size equal to 100; the size of the archive is also 100 in PAES, SPEA2, and MOCell. The stopping condition is always to compute 100,000 function evaluations for all of the algorithms.

Regarding the solution representation, the algorithms adopt a floating point encoding in which gene $i$ represents the dedication of employee $\lfloor i/T \rfloor$ to task $i \bmod T$, where $T$ is the number of tasks of the addressed instance. With this encoding, the typical operators from the multi-objective EAs field have been used. A detailed description of the parameter values adopted for our experiments is provided in Table 1.

5.3 SPS Instances
In order to perform a meaningful study we must analyze a number of instances of the scheduling problem instead of focusing on only one, which otherwise could bias the conclusions. We have used a total of 36 randomly generated instances that have been previously addressed in [1]. The instance set can be divided into two groups.

In the first group we find 18 instances, each one with a different software project. The number of employees can be 5, 10, or 15 and the number of tasks 10, 20, or 30. The total number of skills $S$ can be either 5 or 10 and the number of skills per employee ranges from 2 to 3. We denote the instances of this group with T-EgS. For example, the instance 10-5g5 has 10 tasks, 5 employees and 5 skills.

The second group of instances is composed of 18 instances in which the number of skills $S$ is 10. As in the previous group, the number of employees can be 5, 10, or 15 and the number of tasks takes values 10, 20, and 30. However, in this second group two ranges of values are considered separately for the number of skills of the employees: from 4 to 5, and from 6 to 7. The instances in this group are denoted with the name T-EpM, where T and E are the numbers of tasks and employees, respectively, and M is the maximum value in the range for the number of skills per employee. For example, the instance 30-15p7 has 30 tasks, 15 employees and the number of skills per employee varies from 6 to 7.

In the 36 instances the maximum dedication for all the employees is 1, which means that all the employees can have complete dedication to the project. All these instances are publicly available at http://mstar.lcc.uma.es.

5.4 Comparison of Algorithms
The first set of experiments is devoted to evaluating the five multi-objective metaheuristics on the set of 36 SPS instances. We have performed a pure multi-objective comparison among the algorithms based on the HV values. Because of space constraints, we just highlight the most interesting findings.
The first conclusion that can be drawn from these HV results is that the approximated Pareto fronts reached by PAES have the highest (best) HV values in most of the instances (25 out of 36). […], and […] to a lesser extent, have also addressed the SPS instances properly, since they have performed the best in nine and two instances, respectively, and have ranked second and third for many of the remaining ones. It is worth mentioning that NSGA-II and SPEA2, the most widely used solvers in the literature, have never ranked first. Looking at the average rank, they have scored 3.61 and 4.83, respectively (recall that 1 is the best position and 5 the worst).

An in-depth analysis also reveals that the more complex the instance (more tasks and more employees), the better PAES is, i.e., the larger the difference in the HV values with respect to those of the other four algorithms. However, the current settings of PAES also lead the algorithm to be outperformed by NSGA-II, […], and […] in the smaller SPS instances (i.e., those with 10 and 20 tasks and 5 employees) with a small number of skills. The instances matching these requirements are: 10-5g5, 10-5g10, 20-5g5, 10-5p5, 20-5p5, 10-5p7, and 20-5p7. PAES usually assigns a low dedication to the employees for each task in the scheduling, which is beneficial for larger instances with a higher number of tasks and employees, since it avoids constraint violations. However, for instances in which the dedication is expected to be high because few employees with few skills exist, NSGA-II, […] or […] have shown to better explore these regions of the search space.

The HV values have also shown that the search capabilities of GDE3 diminish as the number of tasks increases, that is, when the instances become larger. A careful tracking of the algorithm (not included due to space constraints) reveals that, for these instances, the differential crossover operator hardly computes feasible solutions. Therefore, since the HV indicator is computed only over feasible solutions, the resulting fronts are scarcely populated and lead to small HV values.
Let us now turn to the attainment surfaces of the different algorithms used in the experiments for some instances of the problem. For the sake of clarity, we only show the 50%-attainment surfaces related to the results of the algorithms. We have selected the most representative ones, that is, those with the most interesting features. In addition to the attainment surfaces, in Figures 2 to 6 we also show the best known approximated Pareto front (Reference Pareto Front).

We observed in the previous analysis that, in general, PAES has performed the best with respect to the HV indicator. If one looks at the 50%-attainment surfaces of the algorithms for the 30-15g5 instance (see Figure 2), the high HV value obtained by PAES is easily justified. Indeed, its approximated Pareto fronts are not only close to the Pareto front but also cover a larger region in the objective space. In this case, it is clear that the solutions proposed by PAES are better than the ones proposed by the other algorithms. This fact is also observed in the instances 30-10g5, 30-10g10, 30-15g10, 30-10p5, 30-15p5 and 30-15p7; all of the instances with 30 tasks and 10 or 15 employees, that is, the most complex ones. We conclude that PAES has the desirable property of scalability.

In the instance 10-5g5, the 50%-attainment surfaces of all the algorithms are very close to each other (see Figure 3). This fact can also be observed in instances 20-5p5, 30-5p5, and 20-10p7. In all the previous instances, […] is clearly the algorithm with the worst attainment surface.

Another interesting related fact appears in the 20-15g10 instance (see Figure 4), in which the attainment surfaces of MOCell, […] and NSGA-II cross that of PAES. This can also be seen in 20-15p5 and 30-10p7. The point is that, according to the HV indicator, PAES has reached the best (highest) HV value in these three instances and, according to the attainment surfaces, the reason is the large extension of the attainment line. However, the other algorithms propose solutions that dominate part of the attainment surface of PAES.
Some project managers would prefer the PAES solutions because the range of values for the objectives is wider, but some others would prefer the solutions from MOCell or […] because they have both lower cost and lower duration in their region of interest. The previous example illustrates that a scalar indicator like HV is not always the most suitable indicator to make a decision, since scalar values hide a lot of information that could help the project managers in their decisions.

In the 30-5g5 instance, […], […] and even NSGA-II are better than PAES in a specific region of the objective space (see Figure 5). If this region is interesting for the project manager, PAES is not a good algorithm for her/his purposes, even though the hypervolume of PAES is the highest one with statistical significance. The scenario shown in Figure 5 is the most frequent one in the instances solved: 18 out of the 36 instances share this feature. From a visual inspection of the 50%-attainment surfaces we conclude that […] and […] dominate the solution of PAES in a specific region, followed by NSGA-II.

Finally, the last observed sort of scenario can be found in the 10-5p7 instance (Figure 6). In this case, the 50%-attainment surface of PAES is clearly dominated by the ones of […], […] and NSGA-II. Other instances with the same behavior are 10-5p5, 10-10p7, and 10-15p7, all of them with 10 tasks. It can be concluded that in the case of simple instances, […], […] and, to a lesser extent, NSGA-II are the best algorithms for solving the multi-objective problem.

Figure 2: 50%-attainment surfaces of the algorithms in the 30-15g5 instance.

Figure 3: 50%-attainment surfaces of the algorithms in the 10-5g5 instance.

Figure 4: 50%-attainment surfaces of the algorithms in the 20-15g10 instance.

Figure 5: 50%-attainment surfaces of the algorithms in the 30-5g5 instance.

Figure 6: 50%-attainment surfaces of the algorithms in the 10-5p7 instance.
5.5 Analysis of the Problem Solutions
In this section we focus on the solutions obtained using the multi-objective algorithms. We want to analyze the features of these solutions, showing correlations between their features and the region of the objective space in which they can be found. In particular, we are interested in analyzing the participation of each employee in the project, $e_i^{par}$, and the amount of human resources spent on each task, $t_j^{ahr}$. We want to analyze how these values change as the solutions move in the objective space.

For each solution proposed by one algorithm for one single instance, $E + T$ values have to be analyzed. This means a large amount of data to process and show. In order to reduce this amount of data without losing interesting information, the following analysis has been performed. First, we focus on the results of PAES, since it is the algorithm covering, in general, the widest region in the objective space. For each instance, all the solutions of the approximated Pareto fronts obtained in the different independent runs of the algorithm are considered. The $e_i^{par}$ and $t_j^{ahr}$ values are then computed for each employee and each task in all the previous solutions. The Spearman rank correlation coefficients [17] between all the $e_i^{par}$, $t_j^{ahr}$, $p^{dur}$, and $p^{cost}$ are then calculated. In Figure 7 we show the correlation coefficients for the 20-15g5 instance. An arrow up means positive correlation and an arrow down means negative correlation. The absolute value of the correlation is shown in gray scale (the darker the higher).
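The correlation analysis described above can be reproduced with a short, dependency-free Python sketch. It computes the Spearman coefficient as the Pearson correlation of the ranks, using average ranks for ties. The function names and the sample data are hypothetical; in the study each input list would hold one value per nondominated solution (for example, the cost of each solution and the participation of one employee in it).

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        tie_end = pos
        while (tie_end + 1 < len(order)
               and values[order[tie_end + 1]] == values[order[pos]]):
            tie_end += 1
        shared = (pos + tie_end) / 2.0 + 1.0
        for k in order[pos:tie_end + 1]:
            ranks[k] = shared
        pos = tie_end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rank correlation coefficient of two equally long samples."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical nondominated solutions: as cost grows, duration shrinks.
cost = [100.0, 120.0, 150.0, 200.0]
duration = [40.0, 30.0, 25.0, 10.0]
rho = spearman(cost, duration)  # perfectly inverse ranking
```

Because it works on ranks, the coefficient only captures whether two features move monotonically together along the front, not how strongly in absolute terms, which suits objectives measured on very different scales such as cost and duration.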

Figure 7: Spearman rank correlation coefficients between $p^{cost}$, $p^{dur}$, $e_i^{par}$ and $t_j^{ahr}$ in the 20-15g5 instance for the solutions obtained with PAES.

The first observation we can highlight is the clear inverse correlation between project cost and duration. This is an expected result and it gives no relevant information since we are analyzing solutions belonging to sets of non-dominated solutions where an increase in cost implies a decrease in duration.

If we focus on the project cost and the participation of the employees we observe that for employees $e_2$, $e_3$, $e_5$, $e_6$, $e_{11}$, $e_{12}$, $e_{13}$, $e_{14}$, and $e_{15}$ the correlation is positive while for $e_7$, $e_8$, $e_9$ and $e_{10}$ it is negative. What does this mean? When the cost of the solutions proposed by the algorithm increases, the participation of the former set also increases; that is, these employees spend more and more time on the tasks of the project as we move in the objective space to solutions with higher cost and lower duration (observe the negative correlation of these employees with project duration). On the contrary, the participation of $e_7$, $e_8$, $e_9$ and $e_{10}$ is reduced. This does not mean that they spend less time on the project, since we defined participation as a normalized measure; it means that the fraction of the workload that they perform is reduced because the other employees increase their participation. But the interesting question is, why does this happen? The answer is that $e_7$, $e_8$, $e_9$ and $e_{10}$ are the cheapest employees, i.e., their salary is lower. The algorithm then looks for schedulings that assign most of the work of the project to them because they earn less money. However, when the project duration is reduced (and the cost increased) the other employees have to increase their participation.
We can also observe a negative correlation between $e_7$, $e_8$, $e_9$ and $e_{10}$ and the rest of the employees, as expected from the previous discussion.

Let us now turn to the tasks. We observe positive correlations between two initial tasks: $t_1$ and $t_8$. If we look at the TPG of the instance (Figure 8) we notice that these tasks have no precedence constraints. Indeed, it makes no sense to increase the work in one of them and not in the other, since the saved time does not allow a reduction in the total project duration and the project cost could increase. However, $t_2$ is also an initial task, and a negative correlation between $t_1$ and $t_2$ is identified. In this instance, task $t_2$ does not require too much effort and can be easily done: it does not belong to the critical path. In this situation the employees are shared between $t_2$ and $t_1$, which are parallel tasks.

Figure 8: TPG of the 20-15g5 instance.

The most interesting observation in the solutions proposed for this instance is perhaps the positive correlation between tasks $t_{14}$, $t_{16}$, $t_{20}$ and the project duration. This means that the solutions proposed by the algorithm are shorter when fewer people work on these tasks. This automatic finding by the algorithm seems counterintuitive and we had to carefully analyze the solutions to understand why this happens. In the end, the conclusion is that the solutions proposed by PAES in the short-duration region of the objective space are not optimal. In a better solution, more human resources would be assigned to tasks $t_{14}$, $t_{16}$ and $t_{20}$; the lack of these resources in the short-duration solutions is what shows up as a positive correlation with project duration. In fact, the 20-15g5 instance belongs to the class of instances in which PAES is outperformed in some regions; in particular, in the short-duration region it is outperformed by NSGA-II. After diving into the correlations computed for the approximated Pareto fronts proposed by NSGA-II, we conclude that the positive correlations between the project duration and the mentioned tasks do not appear, therefore supporting the previous explanation.
In summary, the correlation coefficients help us to analyze the kind of solutions proposed by the algorithms. We conclude that the low-cost solutions are obtained by assigning most of the project workload to the employee with the lowest salary. As the project duration is decreased, however, the other employees are assigned more and more workload. Concerning the tasks, we observe that the amount of human resources for tasks that must be performed in parallel is increased simultaneously when all of them are required to finish at the same time. If this requirement is not necessary, the parallel tasks compete and the resources are shared between them. When two tasks are consecutive in the TPG, the human resources can be distributed between the two tasks with a small influence on cost or time, leading to different ways of managing the project.

6. CONCLUSION AND FUTURE WORK

This paper has approached the SPS problem with its natural multi-objective formulation, in which both the project cost and its duration have to be minimized. Five multi-objective metaheuristics have been used, namely NSGA-II, [...], [...], [...], and [...]. They have all been evaluated over 36 SPS instances that cover different scenarios that might appear in software projects. The results of the experimental study allowed us to draw valuable conclusions, which are, however, limited by the fact that we only used a finite set of instances.

The results have been analyzed in two complementary ways. First, the HV indicator and the attainment surfaces have been used to evaluate the quality of the five MO algorithms. This has shown that [...] has reached the best approximated Pareto fronts. [...] has also performed well on the hypervolume indicator. The two most widely used algorithms in the literature, NSGA-II and [...], have reported the lowest (worst) HV values. The attainment surfaces have allowed us to distinguish the region of the objective space that is better explored by each MO algorithm. [...] has been shown to outperform the other algorithms on regions of the objective space with low project cost and long duration, whereas NSGA-II, [...], [...], and [...] have reached enhanced project schedulings with high project cost but short duration. Second, an analysis of the resulting project schedulings of the algorithms has been provided. It has been based on computing the Spearman rank correlation coefficients between project cost, project duration, tasks, and dedication of employees. This analysis has shown not only the common-sense inverse correlation between the project cost and its duration, but also many other informative positive and negative correlations that emerge from the features of the given instances.

As future work we plan to advance along several lines, summarized in the following. First, the formulation of the problem could be changed in order to include more realistic situations. Second, one important aspect in project scheduling in general, and in software projects in particular, is the robustness of the solution. Project managers prefer not only good schedulings but also schedulings that can accommodate small changes in the parameters of the problem without a large variation in their cost or makespan. These changes in the parameters of the problem can be a variation in the staff or in the task effort. In the case of software projects it is quite usual that the effort of the tasks is not well estimated; thus, a robust solution would be valuable for a project manager.
Third, according to the results of the multi-objective algorithms, we conclude that some of the algorithms are good for some instances or in some regions of the objective space, while others are better in other situations. Perhaps the best algorithm for the software project scheduling problem would be one that combines some of the features of the best performing algorithms. Thus, we plan to design hybrid algorithms combining these features.

Acknowledgements

This work has been partially funded by the Spanish Ministry of Science and Innovation and FEDER under contract TIN2008-06491-C04-01 (the M project). It has also been partially funded by the Andalusian Government under contract P07-TIC-03044 (DIRICOM project).