Phase Distribution of Software Development Effort




Ye Yang 1, Mei He 1,2, Mingshu Li 1, Qing Wang 1, Barry Boehm 3
1 Institute of Software, Chinese Academy of Sciences, China. 2 Graduate University of Chinese Academy of Sciences, China. 3 University of Southern California, USA.
{ye, hemei, mingshu, wq}@itechs.iscas.ac.cn, boehm@csse.usc.edu

ABSTRACT

Effort distribution by phase or activity is an important but often overlooked aspect of the cost estimation process. Poor effort allocation is among the major root causes of rework due to insufficiently resourced early activities. This paper provides the results of an empirical study of the phase effort distribution data of 75 industry projects from the China Software Benchmarking Standard Group (CSBSG) database. The phase effort distribution patterns and their sources of variation are presented. The analysis shows some consistency with the COCOMO model in the effects of software size and team size on code and test phase distribution variations, and some considerable deviations in the requirements, design, and transition phases. Finally, the paper discusses the major findings and threats to validity, and presents general guidelines for directing effort allocation. The empirical findings from this study are beneficial for stimulating discussion and debate toward improved cost estimation and benchmarking practices.

Categories and Subject Descriptors: D.2.8 [Metrics]: process metrics and product metrics; D.2.9 [Software Engineering]: Management - Cost estimation

General Terms: Management, Measurement

Keywords: Cost Estimation, Effort Distribution, Phase Distribution, Development Type, Estimation Accuracy, Effort Allocation

1. INTRODUCTION

Cost estimation methods and techniques have been intensively studied and have evolved over the past few decades. Researchers employ different methods, such as parametric modeling, knowledge-based modeling, fuzzy logic, dynamic modeling, neural networks, and case-based reasoning [1, 2], to increase general estimation accuracy and to improve estimation capability with respect to emergent development paradigms, such as COTS-based or open-source-based development [3, 4].

However, despite the increasing number and variety of estimation methods, it remains a great challenge for software projects to be completed successfully, as reflected in the 29% success rate for 2004 reported by the Standish Group [5]. In examining the assumptions and construction of typical cost estimation methods [6-12], two limitations stand out as major impediments to software projects meeting the success criteria of on time, within budget, and with originally specified functionality. On one hand, limited progress has been made toward fully understanding the characteristics of emergent development paradigms and their implications for cost estimation. This adds many complicating factors to process-oriented decisions in early project phases.
On the other hand, even with the classic waterfall development process, there has been very limited research effort toward analyzing and understanding the root causes of variation in effort distribution patterns. Effort distribution by phase or activity is an important but often overlooked aspect of the cost estimation process. Poor effort allocation is among the major root causes of rework due to insufficiently resourced early activities. The distribution of effort across the software engineering process has long been a basis for more reasonable software project planning, and is provided as one of the major functionalities in most estimating or planning methods. A general form of such a basis is a breakdown structure of the total estimate over the life cycle phases or activities, with a corresponding distribution percentage for each phase or activity. As a rapidly increasing number of estimation methods and tools become available, practitioners are often left with even greater difficulty and confusion in making cost estimation and distribution decisions, given the large variation in the life cycle phases and activities covered by each method. Many software practitioners face insufficient planning support when handling particular project conditions, and rely solely on rules of thumb or expert judgment for phase effort planning, as this is the only feasible approach available to them [13]. Compared with a total estimate, an accurate phase-based estimate is more important for facilitating appropriate and strategic resource planning in the software development process.

This paper presents empirical analysis results on the phased effort data of 75 industry projects from the China Software Benchmarking Standard Group (CSBSG) [26] database. The analysis investigates the general effort distribution profiles across development life cycle, development type, software size, and team size. The study also derives guidelines on how to take the most significant influencing factors into consideration when making effort distribution decisions. Such results are beneficial for guiding software engineers in the effort estimation process and for helping project managers allocate resources to meet specific project needs.

The remainder of the paper is structured as follows: Section 2 gives an overview of related work; Section 3 describes the objective, subject, and approach of the study; Section 4 presents the major results on the phase effort distribution profiles of the 75 industry projects; Section 5 discusses the results; and Section 6 concludes the paper with directions for future work.

2. OVERVIEW OF RELATED WORK

Empirical assessment of industrial software development project data, as an integral part of many effort distribution studies, has led to several well-known and widely adopted bases for guiding software estimation practices. In his seminal work in the cost estimation field, Norden [14] observed the Rayleigh curve to be a good staffing-level approximation for most hardware development projects. Though it has been criticized as inapplicable to software development projects, due to slow early build-up and long-tail effects, the Rayleigh distribution has influenced the effort distribution mechanisms of many later models, such as the COCOMO model [6, 11] and the SLIM model [15]. Based on further empirical studies of effort distribution data, Boehm presented the waterfall phase/activity breakdown quantities in the COCOMO model [6, 11], and Kruchten derived the life cycle effort/schedule distribution structure used by the Rational Unified Process (RUP) [12].

The COCOMO 81 model [11] handles phase distribution by providing a distribution scheme consisting of ratios for every development phase (i.e., product design, code, and integration and test for waterfall processes), approximating the Rayleigh curve. Furthermore, it defines three development modes, five levels of project scale, and phase-sensitive effort multipliers to help derive more accurate allocation decisions for specific project settings. For example, the COCOMO model suggests that more development focus should be placed on the integration and test phase as software size grows (i.e., 16% for small projects and 25% for large projects) [6], and less focus on the code phase as software size increases (i.e., 68% for small projects and 59% for large projects). Meanwhile, the phase-sensitive effort multipliers enable estimation on a more accurate, phase-wise basis by tracking the influence of different effort multiplier ratings on phase distribution variation. This offers a way to adapt the basic distribution scheme to better fit a particular project situation. However, applying the detailed COCOMO model is a rather complex procedure of producing forms and tables following strict steps, which requires intensive project knowledge up front.

The COCOMO II model [6] unifies the phase-sensitive effort multipliers to be the same for all development phases, mainly to simplify model usage and because detailed COCOMO calibration data was lacking. COCOMO II extends the COCOMO 81 model to include the plans and requirements and transition phases in its waterfall life cycle effort distribution scheme, and adds an MBASE/RUP phase/activity distribution scheme adopted from [12]. One drawback of this simplification is less flexibility in providing project-specific effort distribution guidance compared to the original COCOMO 81 model.
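To make the size sensitivity described above concrete, the following is a minimal illustrative sketch, not the COCOMO model itself: only the two pairs of figures quoted above (integration & test 16% to 25%, code 68% to 59%) are from the source, the "small" and "large" anchor sizes of 2 and 512 KSLOC are taken from the size groups used later in this paper, and the log-linear interpolation in between is a simplifying assumption.

```python
import math

# Assumed anchor sizes in KSLOC; percentages at the anchors are the
# figures quoted in the text above. Interpolation is an assumption.
ANCHORS_KSLOC = (2.0, 512.0)
PHASE_PCT = {
    "code": (68.0, 59.0),
    "integration_test": (16.0, 25.0),
}

def phase_percentage(phase: str, ksloc: float) -> float:
    """Interpolate a phase's effort percentage for a given size (KSLOC)."""
    lo, hi = ANCHORS_KSLOC
    p_lo, p_hi = PHASE_PCT[phase]
    ksloc = min(max(ksloc, lo), hi)  # clamp to the anchor range
    # Linear interpolation in log(size).
    t = (math.log(ksloc) - math.log(lo)) / (math.log(hi) - math.log(lo))
    return p_lo + t * (p_hi - p_lo)

if __name__ == "__main__":
    for size in (2, 32, 128, 512):
        print(size, "KSLOC:",
              round(phase_percentage("code", size), 1), "% code,",
              round(phase_percentage("integration_test", size), 1), "% I&T")
```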
One major cause of the lack of phase distribution data is the large variation in the life cycle phases and activities defined by the leading estimation methods. This complicates life cycle concept synthesis, model usage, data collection and analysis, and cost model calibration. Table 1 compares the phase coverage of several leading estimation models.

Table 1. Comparison of phase definitions in different models

Model | Phases (effort distribution percentages) covered
COCOMO 81 [11] | Plan and requirement, preliminary design, detailed design (25%), code (33%), integration & test (25%)
COCOMO II [6] | Waterfall distribution scheme: plan and requirement (7%), preliminary design (17%), detailed design (25%), code (33%), integration & test (25%), deployment & maintenance (12%). MBASE/RUP distribution scheme: Inception (6%), Elaboration (24%), Construction (76%), Transition (12%)
RUP [12] | Inception (5%), Elaboration (20%), Construction (65%), Transition (10%)
COCOTS [6] | Assessment, tailoring, glue code & integration; no unified distribution guideline
SLIM [15] | Concept definition, requirement & design, construct & test, perfective maintenance (distribution percentages not available)
SEER-SEM [17] | Early specification, design, development, delivery & maintenance (distribution percentages not available)

On the other hand, while traditional methods focus more on improving general estimation accuracy [6, 18], a number of recent studies have begun to investigate different aspects of effort distribution patterns. Heijstek and Chaudron [10] reported empirical data on effort distribution from projects employing model-based development practices, and confirmed the similarity between the reported effort distribution and the original RUP hump chart [12]. Milicic and Wohlin [13] studied characteristics affecting estimation accuracy and proposed a lean approach to improving estimation based on the distribution patterns of estimation errors. Yiftachel et al. [18] developed an economic model for the optimal allocation of resources among development phases. Other researchers have studied the determination of optimal effort distribution across life cycle phases based on defect-introduction and defect-slippage considerations [19, 20]. However, few of these studies have performed a comprehensive analysis of phase effort distribution patterns and their possible causes. Consequently, notable inconsistencies exist between different researchers' results regarding phase distribution patterns, even for the same development life cycle, RUP, as concluded in [10]. Meanwhile, communications and discussions with estimation experts and practitioners in the COCOMO community indicate an increasing interest in more phase-sensitive distribution facilities than the COCOMO II default schemes currently offer. All of this confirms the motivation for this study.

3. OBJECTIVE, SUBJECT AND APPROACH OF THE STUDY

3.1 Objective of the Study

The objective of this study is to perform extensive data analysis toward a more in-depth understanding of what factors affect the effort intensity of different development phases, and ultimately to facilitate strategic planning through more reasonable phase effort prediction. Drawing on previous studies of the ISBSG database [21, 22], we focus on four project factors: development life cycle model, development type, software size, and team size. The central research questions are formulated as:

1) What does the overall phase distribution profile of development effort look like?
2) Is there a causal relation between different development life cycles and phase effort distribution?
3) Is there a causal relation between phase distribution pattern and development type, e.g., new development or enhancement?
4) Is there a causal relation between phase distribution pattern and actual software size?
5) Is there a causal relation between phase distribution pattern and maximum development team size?

3.2 Subject of the Study

The empirical data used in this study comes from the China Software Benchmarking Standard Group (CSBSG) [16]. The CSBSG was established in January 2006, with the support of the Chinese government and software industry associations, to advocate and establish domestic benchmarking standards for system and software process improvement in the Chinese software industry. The CSBSG database is structured consistently with the ISBSG database [21], and each data point is defined by over 1,000 product and process metrics. The CSBSG dataset we used contains 1,012 software development project data points from 141 organizations in 15 geographical regions. The development phases defined in the CSBSG database roughly follow the waterfall model, with a comprehensive list of guidelines for each data-submitting organization on how to match its own process and transform its data appropriately. Table 2 lists the CSBSG development phases and the major activities included in each phase.

Table 2. Activities defined in the CSBSG phases

Phase | Activities included
Plan | Plan, Preliminary Requirement Analysis
Requirement | Requirement Analysis
Design | Product Design, Detailed Design
Code | Code, Unit Test, Integration
Test | System Test
Transition | Installation, Transition, Acceptance Test, User Training, Support

3.3 Data Collection and Cleaning

Data collection. First, the CSBSG staff send a questionnaire to member organizations. The electronic questionnaire can check for some errors automatically, such as spelling errors and data inconsistencies. After an organization sends its data back to the CSBSG, a special expert group checks the data quality again and confirms that it is eligible for inclusion in the CSBSG database. Throughout this procedure, specialized focal contact personnel liaise between the organization and the expert group; organization-identifying information is hidden in the data submitted to the expert group, and the contact person follows up when a data problem is found.

Data cleaning. Due to the tight schedule of the data collection process, many phase-level attributes went unreported, such as the predicted and actual effort of each development phase. For the purposes of our study, we therefore use only the subset of the CSBSG database with data on our focal attributes. The attributes and data cleaning steps are summarized in Table 3.
Next, we briefly introduce the data cleaning steps.

Table 3a. Summary of the minimum set of attributes considered in the study

Metric | Unit | Description
Size | SLOC | Total lines of code
Effort | Person-hours | Summary work effort
Plan phase effort | Person-hours | Work effort of the plan phase
Requirements phase effort | Person-hours | Work effort of the requirements phase
Design phase effort | Person-hours | Work effort of the design phase
Code phase effort | Person-hours | Work effort of the code phase
Test phase effort | Person-hours | Work effort of the test phase
Transition phase effort | Person-hours | Work effort of the transition phase
Development life cycle | Nominal | Waterfall, iterative, rapid prototyping
Team size | Persons | Maximum size of the development team
Development type | Nominal | New development, enhancement, re-development

Table 3b. The process of data selection

Step | Projects excluded | Projects remaining | Criterion / action
1 | - | 115 | Keep only projects containing phase-related effort records
2 | 14 | 101 | Exclude projects with two or more phases' effort data missing
3 | 1 | 100 | Exclude projects whose five recorded phase efforts sum to more than the total effort
4 | - | 100 | If exactly one of the five phase effort values is missing, fill it by subtracting the other four phases from the total effort
5 | 25 | 75 | Exclude projects with one or more phase efforts recorded as zero, considered not to cover the whole life cycle
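The selection steps in Table 3b amount to a small filtering pipeline. The following pandas sketch reproduces that logic under assumed column names (the phase columns and `total_effort` are hypothetical; the actual CSBSG schema is not public):

```python
import pandas as pd

# Hypothetical column names for the five phase effort attributes.
PHASES = ["plan_reqt", "design", "code", "test", "transition"]

def clean_csbsg(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the five selection steps of Table 3b to a raw project table."""
    df = df.copy()
    # Step 1: keep projects that contain phase-related effort records.
    df = df.dropna(subset=PHASES, how="all")
    # Step 2: drop projects with two or more phase efforts missing.
    df = df[df[PHASES].isna().sum(axis=1) < 2]
    # Step 3: drop projects whose recorded phase efforts exceed the total.
    df = df[df[PHASES].sum(axis=1) <= df["total_effort"]]
    # Step 4: if exactly one phase is missing, fill it from the total.
    one_missing = df[PHASES].isna().sum(axis=1) == 1
    fill_value = df["total_effort"] - df[PHASES].sum(axis=1)
    for phase in PHASES:
        mask = one_missing & df[phase].isna()
        df.loc[mask, phase] = fill_value[mask]
    # Step 5: drop projects with any phase effort recorded as zero,
    # considered not to cover the whole life cycle.
    df = df[(df[PHASES] > 0).all(axis=1)]
    return df
```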

At first, 115 data points containing phase-related effort records were extracted from the whole CSBSG dataset. Among these, 14 data points were excluded because two or more phases' effort data were missing. Furthermore, one project was identified as bad data and excluded, because the sum of its five recorded phase efforts was greater than its total effort. Additionally, 25 projects had one or more phase efforts recorded as zero; they were excluded since such records are considered not to cover the whole life cycle. Finally, 75 projects with complete phase effort values are included in this research.

4. RESULTS

The 75 projects come from 46 software organizations distributed across 12 regions of China; the largest number of projects from a single organization is 8. Table 4a below summarizes the size and effort data of our dataset. Note that the significant difference between the mean and the median of the size metric is mainly due to one very large project whose size is greater than 1,000 KSLOC.

Table 4a. Summary of the project data

Metric | Mean | Median | Min | Max
Size (KSLOC) | 136.4 | 45.7 | 0.77 | 2340
Effort (person-hours) | 8969 | 4076 | 568 | 134840

To illustrate the stability of the variation in the dataset, a scatter plot of the log-transformed size and effort is given in Figure 2. It indicates a significant linear relationship between the two (significance based on p-value < 0.01), and the R-square of the linear regression line is 0.5231.

[Figure 2. Scatter plot of total development effort against software size after log-transformation; R^2 = 0.5231.]

4.1 Overall Distribution Pattern

Table 4b summarizes the minimum, maximum, median, mean, and standard deviation of the effort distribution percentage for each of the five CSBSG development phases, after combining the original plan and requirements phases.

Table 4b. Overall phase distribution profile

 | Plan & Reqt | Design | Code | Test | Transition
Min | 1.82% | 0.62% | 6.99% | 4.24% | 0.06%
Max | 35% | 50.35% | 92.84% | 50.54% | 36.45%
Median | 15.94% | 14.21% | 36.36% | 19.88% | 4.51%
Mean | 16.14% | 14.88% | 40.36% | 21.57% | 7.06%
Stdev | 8.62% | 8.91% | 16.82% | 11.04% | 7.06%

Though it is generally believed that the major variations in both waterfall and RUP phase distribution quantities occur in the phases outside the core development phases [6], the CSBSG data shows that code and test are the two development phases with greater variation than the others. For example, the project with the smallest design phase share (0.62%) is also the project with the largest code phase share (92.84%).

To examine the differences in individual phase distribution between the CSBSG dataset and the COCOMO II recommendations, we compare the mean distribution profile of the CSBSG dataset with the COCOMO II waterfall distribution quantities.

[Figure 3. Comparison of overall waterfall distribution quantities between the CSBSG and COCOMO II datasets.]

As shown in Figure 3, distribution similarities appear only in the test and transition phases. Significant differences between the two datasets include:
1) a greater emphasis on the plans and requirements phase in the CSBSG dataset (16.14% of overall development effort, compared to 6% in COCOMO II); 2) a dramatically smaller design phase allocation in CSBSG projects (14.88%) than in COCOMO II projects (35% on average); and 3) a much higher average code phase share in CSBSG projects (40.36%) than in COCOMO II projects (27.8% on average). One possible explanation is that the large code phase share in the CSBSG database results from insufficiently resourced design phases in some projects, which may cause frequent rework during the code phase.
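Returning to Figure 2, the size-effort relationship reported there is an ordinary least-squares fit on log-transformed data. A minimal sketch of that computation (array contents are the reader's own data; nothing here is from the CSBSG dataset itself):

```python
import numpy as np

def loglog_fit(size_ksloc: np.ndarray, effort: np.ndarray):
    """Fit ln(effort) = a + b * ln(size) and report R^2, as in Figure 2."""
    x, y = np.log(size_ksloc), np.log(effort)
    b, a = np.polyfit(x, y, 1)            # slope, intercept
    y_hat = a + b * x
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot            # the paper reports R^2 = 0.5231
    return a, b, r2
```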

4.2 Development Life Cycle

Excluding one project with bad (vaguely described) development life cycle data, our dataset consists of projects employing three different development life cycles: waterfall (57 projects), iterative (15 projects), and rapid prototyping (2 projects). Due to the small sample of rapid prototyping projects, we exclude them from further analysis. Figure 4 compares the phase distribution profiles of the waterfall and iterative models using median values. A further examination checks the average software size of the projects employing each of the three life cycles: 103 KSLOC, 259 KSLOC, and 85 KSLOC for waterfall, iterative, and rapid prototyping, respectively. This indicates that iterative is the life cycle model most often deployed for ultra-large projects.

[Figure 4. Comparison of median phase distributions among different process models.]

Table 5. Phase distribution differences from the whole-dataset mean effort %, by development life cycle

Life cycle model | N | Plan & Reqt | Design | Code | Test | Transition
Waterfall | 57 | 0.0% | 1.85% | -0.88% | 0.17% | -0.99%
Iterative | 15 | -2.38% | -1.45% | 7.42% | -1.6% | 2.67%

The phase distribution deviations from the overall mean are listed in Table 5. Iterative processes, most suitable for the complex large-scale projects in the CSBSG dataset, spend a smaller share of effort in the plan and requirement, design, and test phases. This is due to the effect of iterative testing and integration over multiple iterations, which also explains the highest code phase share for iterative development: iterative integration effort is counted in the code phase, according to the CSBSG phase definitions discussed in Section 3.2. However, since the concept of a phase in iterative or rapid prototyping development is not as clear as in waterfall processes, the data collection process likely involves greater participant bias in matching and transforming data from each organization's own process breakdown structure to the CSBSG structure.

4.3 Development Type

Development type is another factor that might affect phase effort distribution. Three software development types are pre-defined in the CSBSG database, namely new development (New), re-development (ReDev), and enhancement (Enhance), according to the following definitions:

New development: a software product is developed and introduced for the first time to satisfy specific customer needs;

Re-development: new technologies are incorporated to replace or upgrade an existing software product in use;

Enhancement: new functionality requirements are designed and implemented to modify or extend an existing software system.

Each of the 75 data points in our study carries a development category identified by the project team based on the above definitions. The median effort ratios of each phase for the three development types are illustrated in Figure 5, and the phase distribution deviations from the overall mean are listed in Table 6, computed as in the sketch following the table. The numbers of projects in the three groups are 6 (ReDev), 53 (New), and 16 (Enhance).

[Figure 5. Comparison of median phase distributions among different development types.]

Table 6. Phase distribution differences from the whole-dataset mean effort %, by development type

Dev type | N | Plan & Reqt | Design | Code | Test | Transition
ReDev | 6 | -3.64% | -3.96% | 8.95% | -4.96% | 3.59%
New Dev | 53 | -0.44% | -0.12% | 2.2% | -1.07% | -0.58%
Enhance | 16 | 2.82% | 1.88% | -10.67% | 5.4% | 0.56%
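Deviation tables such as Tables 5 and 6 can be derived by comparing each group's mean phase percentages against the whole-dataset mean. A sketch under the same assumed schema as before (`dev_type` is a hypothetical grouping column):

```python
import pandas as pd

PHASES = ["plan_reqt", "design", "code", "test", "transition"]  # hypothetical names

def phase_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Convert absolute phase efforts into per-project percentage shares."""
    total = df[PHASES].sum(axis=1)
    return df[PHASES].div(total, axis=0) * 100.0

def deviation_from_mean(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Per-group mean phase share minus the whole-dataset mean (cf. Tables 5-6)."""
    shares = phase_shares(df)
    overall = shares.mean()
    grouped = shares.groupby(df[group_col]).mean()
    return grouped - overall
```

For example, `deviation_from_mean(projects, "dev_type")` would reproduce the layout of Table 6 for a cleaned project table.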
New development and enhancement projects have similar distributions in most development phases, except that new development projects spend more effort (roughly 13% more) in the code phase, while enhancement projects place greater focus on the test phase (about 6.5% more). This is because enhancement development involves thorough system testing even though the modifications happen on a smaller code base. Meanwhile, the re-development type has the greatest distribution emphasis on the code phase. A possible reason is that re-development projects involve migrating an existing system to a different technology, but, by definition, not new requirements or design.

A further examination of the estimation accuracy of phase effort distribution in the three development types shows that the transition phase was overestimated in all projects, as shown in Figure 6. The accuracy measure used is the relative error (RE) percentage, calculated for each individual development phase as RE = (Estimated phase effort - Actual phase effort) / Actual phase effort. Further investigation is needed to explain the overestimation in the transition phase in the CSBSG dataset.
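The RE measure behind Figure 6 is straightforward to compute per phase; a sketch with assumed column naming (the `est_` / `act_` prefixes are hypothetical):

```python
import pandas as pd

def relative_error(df: pd.DataFrame, phase: str) -> pd.Series:
    """RE = (estimated phase effort - actual phase effort) / actual effort."""
    return (df[f"est_{phase}"] - df[f"act_{phase}"]) / df[f"act_{phase}"]

# A positive mean RE for a phase indicates systematic overestimation,
# as observed for the transition phase across all development types.
```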

[Figure 6. Comparison of estimation accuracy (relative error by phase) among the ReDev, New, and Enhance groups.]

The data also indicates that new development projects have much lower estimation error, and hence are easier to estimate than the other two types, for almost all phases, the exception being an underestimation of the test phase. Moreover, the design and code phases were underestimated for the re-development projects.

4.4 Software Size

Note that the size metric used in our study is lines of code. In this dataset, only one project recorded its size in function points. However, three other projects from the same organization were found; they use the same primary programming language, Java, and have size records in both FP and LOC, with a ratio of 53 LOC per FP in all three cases, consistent with the transformation ratio reported in the SPR documentation [23]. Transforming the size of this project into the LOC metric is therefore considered reasonable.

First of all, all projects are divided into six groups, Small, Intermediate, Medium, Large, Ultra-Large, and Super-Ultra-Large, according to ranges adapted from the COCOMO model, as shown in Figure 7. About 69% of the projects in our dataset are medium- to large-scale, and 24% are ultra-large or bigger, while the share of small projects is as low as 7%.

[Figure 7. Distribution of software size: 28, 24, 12, 6, 4, and 1 projects in the <2, 2-8, 8-32, 32-128, 128-512, and >512 KSLOC groups, respectively.]

The data indicates that software size in the CSBSG dataset roughly follows a normal distribution after log-transformation, consistent with the distribution reported for the ISBSG dataset [21]. Figure 8a depicts the average phase distribution of each size group. The CSBSG data does not show a clearly consistent decreasing trend in the design and code phases, or an increasing trend in the integration and test phase, as software size increases from smaller to larger scales; such trends are only observed between the ultra-large and super-ultra-large groups. One possible reason is that the projects in the CSBSG dataset were mostly completed in the past three years with the support of advanced programming technologies, and the size data likely counts automatically generated code. This, however, again challenges the validity of using LOC as the unit of effective software size.

[Figure 8a. Comparison of average phase distributions among software size groups on the LOC scale (2 KLOC to >512 KLOC).]

We then considered using function points (FP) as an alternative software size metric. Following the backfiring ratios in [23] and the FP sizing scale in [25], we re-grouped all 75 projects into seven groups, from extra-extra-small (XXS) to extra-extra-extra-large (XXXL). The effort distribution comparison of the seven groups is illustrated in Figure 8b, and the phase distribution variations of each group under this categorization are listed in Table 7.

[Figure 8b. Comparison of average phase distributions among software size groups on the FP scale (XXS to XXXL).]
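The LOC-based grouping and the FP backfiring conversion can be expressed as simple binning rules. The sketch below uses the size boundaries visible in Figure 7 and the ratio of 53 LOC per FP mentioned above; treating one Java ratio as representative of the whole dataset is a simplification, since [23] tabulates ratios per language.

```python
import pandas as pd

# Boundaries (KSLOC) of the six groups adapted from the COCOMO model (Figure 7).
LOC_BINS = [0, 2, 8, 32, 128, 512, float("inf")]
LOC_LABELS = ["Small", "Intermediate", "Medium", "Large",
              "Ultra-Large", "Super-Ultra-Large"]

LOC_PER_FP = 53  # Java backfiring ratio observed in this dataset; language-specific in [23]

def loc_group(ksloc: pd.Series) -> pd.Series:
    """Assign each project to a COCOMO-style size group by KSLOC."""
    return pd.cut(ksloc, bins=LOC_BINS, labels=LOC_LABELS, right=False)

def backfire_to_fp(sloc: pd.Series) -> pd.Series:
    """Approximate function points from SLOC via the backfiring ratio."""
    return sloc / LOC_PER_FP
```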

Table 7. Phase distribution differences from the whole-dataset mean effort %, by software size on the FP scale

Size group | N | Plan & Reqt | Design | Code | Test | Transition
XXS | 1 | 7.72% | 9.42% | -18.88% | -2.56% | 4.3%
M1 | 11 | -1.18% | -0.66% | -3.29% | 6.55% | -1.42%
M2 | 27 | 2.4% | 0.74% | -4.25% | 2.74% | -1.63%
L | 17 | 1.47% | -0.01% | -0.08% | -2.64% | 1.27%
XL | 9 | -5.9% | -0.55% | 12.83% | -5.69% | -0.69%
XXL | 6 | -5.32% | -3.66% | 15.74% | -4.87% | -1.89%
XXXL | 4 | 0.16% | 1.21% | -9.69% | -4.56% | 12.87%

The data shows that as software size grows from small to medium, more development effort should be allocated to the code and test phases; as it grows from medium to large, more effort should go to the plan and requirement and code phases; and when the software size is extra large, more attention should be paid to the design, code, and test phases.

4.5 Team Size

Maximum team size is another metric collected in the CSBSG dataset; it is reported in [24] that the maximum number of members involved over the whole project life cycle is easier to measure than average team size. An analysis of its influence on phase distribution is performed to explore possible causality. Since one project has missing data on maximum team size, the other 74 projects are included in this analysis. The average phase distribution ratios are depicted in Figure 9, and the phase distribution differences from the overall mean for each group are listed in Table 8.

[Figure 9. Average phase distribution for different team sizes.]

Table 8. Phase distribution differences from the whole-dataset mean effort %, by maximum team size

Team size | N | Plan & Reqt | Design | Code | Test | Transition
Up to 5 | 31 | 2.22% | -0.9% | 0.16% | -1.85% | 0.36%
Up to 10 | 27 | -1.45% | 0.23% | 0.51% | 0.3% | 0.4%
Up to 15 | 13 | -2.26% | 0.66% | -0.15% | 3.16% | -1.42%
>15 | 3 | -2.76% | 1.27% | 0.64% | 3.52% | -2.67%

The data indicates one consistent relationship between maximum team size and phase distribution pattern: a distinguishably growing emphasis on the test phase as team size increases.

5. ANALYSIS

The discussion in the previous section contains some indications regarding the research questions. To provide more statistical evidence about the factors influencing phase distribution and their relation to the phase distribution patterns, we performed further statistical analysis on our dataset.

5.1 Factors Influencing Phase Distribution

ANOVA is used to examine to what degree the variance of each phase's effort distribution percentage is explained by the class variables, i.e., development type, software size, and team size. The development life cycle variable was excluded from this analysis because of possibly biased data from projects that could not easily map their non-waterfall data onto the CSBSG structure, as well as its correlation with software size, as discussed in Section 4.2. Table 9 summarizes the degree of variance explained by each influencing factor for the variation in each individual phase's effort distribution.

Table 9. Summary of ANOVA results (variance explained)

Distribution | Dev type | Software size (LOC scale) | Software size (FP scale) | Team size
P&R % | 3.96% | 14.8% | 13.65% | 5.5%
Design % | 2.57% | 3.67% | 3.36% | 0.62%
Code % | 12.22% | 7.93% | 20.56% | 0.02%
Test % | 7.47% | 7.16% | 14.59% | 3.05%
Trans. % | 2.72% | 9.36% | 22.45% | 1.52%

For example, the table shows that FP-based software size is the major metric to consider when making distribution decisions, with the highest explained variance for most phases. It also indicates that FP is a stronger predictor for measuring effective software size and predicting development effort. This is consistent with the software-scale-sensitive distribution scheme of the COCOMO model.
Development type is another major factor to consider when determining appropriate adjustments for the code and test phases. Team size, by contrast, is a less significant influence on effort distribution across all phases.

Through such analysis, a parametric model could be developed to systematically account for these influencing factors and to predict the appropriate adjustments to make against the nominal quantities offered by current methods, for estimation and benchmarking purposes.
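Variance-explained figures like those in Table 9 can be obtained from a one-way ANOVA per phase, taking the ratio of the between-group sum of squares to the total sum of squares (eta squared). The following statsmodels sketch shows that calculation; whether the paper used exactly this statistic is an assumption on our part, and the column names are hypothetical as before.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

def variance_explained(df: pd.DataFrame, phase_share: str, factor: str) -> float:
    """Eta^2 of `factor` for one phase's distribution percentage (cf. Table 9)."""
    model = ols(f"{phase_share} ~ C({factor})", data=df).fit()
    anova = sm.stats.anova_lm(model, typ=2)
    ss_factor = anova.loc[f"C({factor})", "sum_sq"]
    return ss_factor / anova["sum_sq"].sum()

# Example: variance_explained(projects, "code_share", "dev_type")
```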

5.2 Guidelines for Effort Allocation

Though it is infeasible to deploy one unified distribution scheme over different types of projects, analyzing empirical effort data helps to develop an understanding of the variations and possible causes of phase effort distribution, to identify the most frequently occurring effort distribution patterns, and to draw insightful suggestions that lead to improved project planning practices. Based on the observations and analysis results of this study, the following general guidelines are provided for further discussion, adoption, and validation by software estimation and management practitioners, especially in the Chinese software industry (a toy sketch applying guidelines 2 and 5 follows the list):

1) Analysis to determine reasonable phase effort distribution quantities should be performed as part of the cost estimation process.

2) CSBSG data analysis yields a waterfall-based phase distribution scheme of 16.14% for the plans and requirements phase, 14.88% for the design phase, 40.36% for the code phase, 21.57% for the test phase, and the remaining 7.06% for the transition phase.

3) CSBSG data shows that iterative development projects have higher allocation intensity in the code (+7.42%) and transition (+2.67%) phases, compared to the mean values of the overall dataset.

4) FP-based software size and development type are the two major factors to consider when adjusting effort allocation.

5) The CSBSG dataset shows that for enhancement projects, the share of development effort in the code phase decreases by 10.67%, while that of the test phase increases by 5.4%.

6) The CSBSG dataset indicates that as software size grows dramatically, the distribution shifts intensively toward both the code and test phases.

7) The CSBSG dataset indicates that team size is not a significant factor in phase effort distribution variation; however, it shows a distinguishable emphasis on the test phase as team size grows.
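As a usage illustration of guidelines 2 and 5, the following sketch applies the CSBSG baseline distribution and the development-type adjustments from Table 6 to a total effort estimate. It is a toy planner built directly from the numbers above, not a validated model.

```python
# CSBSG waterfall baseline from guideline 2 (percent of total effort).
BASELINE = {"plan_reqt": 16.14, "design": 14.88, "code": 40.36,
            "test": 21.57, "transition": 7.06}

# Development-type adjustments from Table 6 (percentage-point deltas).
# Note: deltas may not sum exactly to zero; renormalize if exact totals matter.
ADJUST = {
    "new":     {"plan_reqt": -0.44, "design": -0.12, "code": 2.2,
                "test": -1.07, "transition": -0.58},
    "enhance": {"plan_reqt": 2.82, "design": 1.88, "code": -10.67,
                "test": 5.4, "transition": 0.56},
}

def allocate(total_effort: float, dev_type: str = "new") -> dict:
    """Split a total effort estimate across phases: baseline + type adjustment."""
    delta = ADJUST.get(dev_type, {})
    return {phase: total_effort * (pct + delta.get(phase, 0.0)) / 100.0
            for phase, pct in BASELINE.items()}

if __name__ == "__main__":
    print(allocate(1000.0, "enhance"))  # e.g., 1,000 person-hours of total effort
```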
5.3 Threats to Validity

Two major threats to the external validity of our study are:

1) Data representativeness. The dataset used in our study was pre-collected by the CSBSG through its member organizations. Hence, the data may represent software organizations with above-average performance, because organizations are otherwise rarely willing to submit data even if they have collected such historical records.

2) Data quality. The data collection process is likely to introduce individual biases or participant errors, which may involve mis-reporting or double counting. Even though we applied strict data cleaning and exclusion criteria, this remains a significant risk to the trustworthiness of this study.

A further threat concerns the generalization of the empirical findings. Given that all projects are from mainland China, where many external and internal conditions may differ greatly from those of overseas projects, the findings are most reliable and appropriate for projects from similar settings.

In addition, regarding the sizing metric, one may question how each organization counted lines of code. We could not determine this in detail, and it remains a question to be explored in future work. However, since Function Point expertise is a scarce resource, most organizations still choose the more direct, original counting method of lines of code to measure software size.

Another issue has to do with the interpretations following the analysis of each influencing factor. Most interpretations are compared with COCOMO findings and only with respect to individual factors, i.e., development life cycle, development type, software size, and team size. Possible correlations among multiple factors are not studied and may lead to counter-intuitive findings. Moreover, analysis and exclusion of potential outliers is not included in the current study, which could also reduce its validity.

6. CONCLUSION AND FUTURE WORK

Poor effort estimation and allocation stem from a lack of recognition of process variations and a lack of understanding of process effort distribution patterns, yet few studies to date explore in depth what factors cause the variations in effort distribution. The work performed in this study examined the effort distribution profiles of 75 projects from a number of Chinese software organizations to gain additional understanding of the variations and possible causes of waterfall-based phase effort distribution. The phase effort distribution patterns were compared with those of the COCOMO model, and further analysis examined the relationships between the distribution patterns and several significant factors, including development life cycle, development type, software size, and team size. The study identifies distinguishable similarities and differences in phase effort distribution, and provides guidelines for improving future estimation and allocation practices. Such empirically based information offers beneficial input for better understanding development phase focus and for predicting cost estimation and phase allocation.

Future work includes investigating and extending the current study to address the following issues:

- Continuing similar analyses to search for other significant influencing factors, with requirements volatility and business area as possible candidates;
- Strengthening the findings through more thorough statistical analysis, including modeling the interrelationships among multiple factors and proper outlier identification and handling;
- Using linear regression to construct a parametric model over a consolidated set of predictive factors to calculate phase-sensitive effort distribution quantities for a given project setting;
- Extending the scope of the current analysis to cover phase schedule distribution.

7. ACKNOWLEDGMENTS

We would like to thank the Chinese Systems and Software Process Improvement Association for their data support of this study, and the anonymous reviewers for their constructive suggestions. This work is supported by the National Natural Science Foundation of China under Grant Nos. 60573082 and 90718042; the National Hi-Tech Research and Development Plan of China under Grant Nos. 2006AA01Z182 and 2007AA010303; and the National Key Technologies R&D Program under Grant No. 2007CB310802.

8. REFERENCES

[1] Boehm, B. and Abts, C.: Software Development Cost Estimation Approaches - A Survey. Annals of Software Engineering, 10(1-4) (2000) 177-205.
[2] Jorgensen, M. and Shepperd, M.: A Systematic Review of Software Development Cost Estimation Studies. IEEE Transactions on Software Engineering, 33(1) (2007) 33-53.
[3] Basili, V.R.: Software development: a paradigm for the future. Proceedings of the 13th Annual International Computer Software and Applications Conference (1989) 471-485.
[4] Feller, J. and Fitzgerald, B.: A Framework Analysis of the Open Source Software Development Paradigm. Proceedings of the 21st International Conference on Information Systems (ICIS 2000) (2000) 58-69.
[5] The Standish Group: 2004 Third Quarter Research Report (2004). http://www.standishgroup.com
[6] Boehm, B.W., et al.: Software Cost Estimation with COCOMO II. Prentice Hall, NY (2000).
[7] Reifer, D.: Industry software cost, quality and productivity benchmarks. Software Tech News, 7(2) (July 2004).
[8] Kroll, P.: Planning and estimating a RUP project using IBM Rational SUMMIT Ascendant. Technical report, IBM DeveloperWorks (May 2004).
[9] QSM Inc.: The QSM Software Almanac: Application Development Series (IT Metrics Edition). Quantitative Software Management Inc., McLean, Virginia, USA (2006).
[10] Heijstek, W. and Chaudron, M.R.V.: Effort distribution in model-based development. 2nd Workshop on Model Size Metrics (2007).
[11] Boehm, B.W.: Software Engineering Economics. Prentice Hall (1981).
[12] Kruchten, P.: The Rational Unified Process: An Introduction. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (2003).
[13] Milicic, D. and Wohlin, C.: Distribution Patterns of Effort Estimations. EUROMICRO Conference (2004).
[14] Norden, P.V.: Curve Fitting for a Model of Applied Research and Development Scheduling. IBM J. Research and Development, 3(2) (1958) 232-248.
[15] Putnam, L. and Myers, W.: Measures for Excellence. Yourdon Press Computing Series (1992).
[16] He, M., et al.: An Investigation of Software Development Productivity in China. ICSP 2008 (2008) 381-394.
[17] SEER-SEM. http://www.galorath.com/index.php
[18] Yiftachel, P., et al.: Resource Allocation among Development Phases: An Economic Approach. EDSER '06, May 27, 2006, Shanghai, China (2006) 43-48.
[19] Huang, L. and Boehm, B.: Determining how much software assurance is enough? A value-based approach. In: EDSER '05: Proceedings of the Seventh International Workshop on Economics-Driven Software Engineering Research, New York, NY, USA, ACM Press (2005) 1-5.
[20] Yiftachel, P., Peled, D., Hadar, I. and Goldwasser, D.: Resource allocation among development phases: an economic approach. In: EDSER '06: Proceedings of the 2006 International Workshop on Economics-Driven Software Engineering Research, New York, NY, USA, ACM Press (2006) 43-48.
[21] Jiang, Z., Naudé, P. and Comstock, C.: An investigation on the variation of software development productivity. International Journal of Computer, Information, and Systems Sciences, and Engineering, 1(2) (2007) 72-81.
[22] ISBSG Benchmark Release 8. http://www.isbsg.org
[23] SPR Programming Languages Table 2003, Software Productivity Research. http://www.spr.com
[24] Agrawal, M. and Chari, K.: Software Effort, Quality and Cycle Time: A Study of CMM Level 5 Projects. IEEE Transactions on Software Engineering, 33(3) (2007) 145-156.
[25] Software Measurement Services Ltd.: Small project, medium-size project and large project: what do these terms mean? http://www.measuresw.com/library/papers/rule/rulesrelativesizescale%20v1b.pdf
[26] China Software Benchmarking Standard Group. http://www.csbsg.org