More than functional insights from the comparison among functional methods of software measurement

Martellucci Margherita, Cavallo Anna, Stilo Francesco Maria
Sapienza Università di Roma, via del Castro Laurenziano 9, 00161 Rome, Italy

Abstract

This paper presents new findings from the comparison among the observed results of different functional evaluation methods. It discusses the additional information that can come from the joint use of several functional methods adopted to evaluate the software applications of the same Information System (IS). The ground of discussion comes from the counting results of the authors' direct work as Certified Function Point Specialists (CFPS). In accordance with widely accepted practice, they adopted a different evaluation method for each maturity stage of the software to be evaluated. The joint use of the different approaches led to results that generally confirm the consistency among the functional counts. The existence of a single, though considerable, exception led the authors to look for reasons that could justify it. The authors assume that this deviation can also give insights concerning software quality requirements. This paper describes the main characteristics of the adopted methods, the counts resulting from their application, the deviations and the hypotheses under discussion.
1 Introduction

Many researchers (Agresti 1981, Case 1986, Cooprider and Henderson 1989) call for separate measures for the products and the processes of Information Systems (IS) development. In particular, Case (1986) argues that it is important to use both process and product measures because there is a potential conflict between the efficiency of the process and the quality of the product. The Function Point (FP) metric is widely considered a product measure (Albrecht 1979, Cooprider and Henderson 1989, Buglione 2008). While the purpose of the FP metric is rather univocal, the same cannot be said of the FP evaluation methods, since quite different approaches exist. In this paper the authors refer to the FPA (IFPUG 2004), Early & Quick (E&Q) (Conte et al. 2004, Meli 1997 a, b) and Backfiring (BF) (Jones 1997) methods. The choice of the right method depends on the maturity level of the software to be evaluated. This paper compares the results observed by applying these methods during the different maturity phases of the same software application. The assumption is that this comparison can also give insight into the quality of the software production process. Several studies address the reliability of functional measurement methods (Kemerer 1991, Desharnais and Abran 2003, SPR 2006), but fewer attempt to interpret the additional information that a high deviation among the counting results of different FP counting practices, applied to the same software, can provide. The results shown in this paper come from the direct work of the authors as Certified FP Specialists (CFPS): they were engaged for the functional evaluation of the IS applications during the different phases of their life cycle. Consistently with the level of software maturity at which the evaluation was required, they chose among the evaluation methods mentioned above. The resulting counts generally showed consistency among the different methods. Only in one case did they notice a considerable exception. The authors assume that this deviation depends on a change in the quality of the software production process. This paper discusses the methods and the counting practice, and finds in the peculiarities of each of them the main reasons that lead to this hypothesis. The results of the estimation work are shown to allow a better understanding of the different evaluation methods and a comparison among them. The history of the case study adds further insight that corroborates the authors' findings.
2 Background: the evaluation approaches and methods

The results that a functional measurement practice can obtain during the life cycle phases of a software product can be described as follows. At an early stage there is a high-level description of the users' needs and intentions; these Initial User Requirements may be the only document available, and during this phase the counters can produce an early functional estimate. The measurement practice proper can be performed, instead of the estimation practice, when the Final Functional Requirements are available (IFPUG 2004). After the description of the original FPA method in 1979 by Albrecht (1979), variations of this method have been developed to improve it or to extend its domain of application (Symons 2001). The measurement methods adopted for this case study can be described as follows.

Function Point Analysis: in 1979 Albrecht (1979) designed the FP metric and a method for measuring software size, later improved by Albrecht and Gaffney (1983). FPA is based on the idea of measuring the size of the delivered software by quantifying the user requirements. The International FP Users Group (IFPUG) endorses FPA as its standard methodology for software sizing. In support of this, IFPUG maintains the FP Counting Practices Manual (CPM), the recognized standard for FPA (ISO-IEC 2007). The new releases of the CPM increase inter-counter consistency, since they contain minor modifications and provide more clarifications of the existing FPA rules. The counting practice performed by CFPSs has been measured to vary within about plus or minus 10% (Kemerer 1991). In addition, FPA entails a revision of the user requirements that improves their consolidation and sharing (Battistata 2006) as well as the reliability of the measurement results.

Backfiring: the BF technique (Jones 1997) relies on a table of conversion ratios organized by programming language. It is based on the assumption that the IFPUG functional size can be converted to SLOC (Source Lines Of Code) by multiplying the former by an average ratio derived from earlier project data. The accuracy of BF from logical statements is generally plus or minus 20% (Jones 2000). Basically, BF counts the software statements using an automated tool and then uses a table of factors to turn the statements into FPs. It is considered a useful method for applications already implemented (Jones 2000). The measurement of SLOC produces a software inventory and an additional revision of the programming style and quality (SPR 2006).
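To make the conversion step concrete, the following Python sketch shows the arithmetic behind backfiring. The SLOC-per-FP ratios, the language inventory and the function name are purely illustrative assumptions; they are not taken from the SPR table or from this case study.

```python
# Illustrative sketch of the Backfiring arithmetic: an automated tool has
# already produced SLOC totals per language; a ratio table turns them into FPs.
# The ratios below are hypothetical placeholders, not the SPR table values.

SLOC_PER_FP = {
    "COBOL": 105,   # assumed: roughly 105 logical statements per FP
    "Java": 55,
    "SQL": 15,
}

def backfire(sloc_by_language):
    """Approximate the IFPUG functional size from SLOC counts per language."""
    return sum(sloc / SLOC_PER_FP[lang] for lang, sloc in sloc_by_language.items())

if __name__ == "__main__":
    inventory = {"Java": 40000, "SQL": 4500}   # example code inventory
    fp = backfire(inventory)
    # Jones (2000) reports an accuracy of about plus or minus 20% for BF.
    print(f"Backfired size: {fp:.0f} FP (range {0.8 * fp:.0f}-{1.2 * fp:.0f} FP)")
```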
Early & Quick FP: size estimation is needed early, when there is not yet enough information for a standard measurement, and early estimation methods are used to obtain it (Meli et al. 1997 a, b, 2000); as a consequence, the estimation model should be reliable early in the life cycle, before the detailed requirements are elicited. The E&Q functional size estimation technique (Meli 1997 a, b, Conte et al. 2004) aims to support contracting and the advance planning of software estimates. It maintains the overall structure and the essential concepts of standard functional size measurement methods and permits the use of different levels of detail for different branches of the system (multilevel approach): the overall uncertainty level of the estimate (which is a range, i.e. a set of minimum, more likely and maximum values) is the weighted sum of the uncertainty levels of the individual components. Finally, the technique provides its estimates through statistically and analytically validated tables of values.

3 Research method and motivation

Case study research is the most common qualitative method used in Information Systems (Orlikowski and Baroudi 1991, Alavi and Carlson 1992). In particular, the research presented in this paper attempts to follow the guidelines suggested by Kitchenham et al. (2002), which include specifying as much information as possible about the organization, the participants and the experiment, in addition to the complete experimental results. The rationale for including so much information is to ensure that the study is easy to replicate and that the results are appropriately interpreted and transferred. The history of this case study is illustrated through quantitative evidence of the functional software dimension, collected during the different phases of the software life cycle. The empirical data were collected by applying different methods that measure functionality attributes. Case study research (as with experiments) relies on analytical generalization: in analytical generalization the investigator strives to generalize a particular set of results to some broader theory (Yin 2003). The procedure of selecting a new case for study will lead the authors to replicate the conditions experienced in this case study, in order to verify whether the same considerations hold.

4 The case study history

The functional evaluation of this case study concerned the development of a new IS for a Public Administration (PA) of large dimensions, with national competence and institutional functions.
The software project concerned only one PA department, characterized by high complexity and time lags in the service supply. The main challenge of the new IS was to improve these aspects. The Software Company (SC) operated as an independent contractor developing complete solutions. Despite its medium size (about 15 employees) it operates at a national level. The human resources were allocated to the following four business roles: Engagement Manager with client responsibility (EM), Project Manager (PM), User Analyst/Designer (UAD) and Technical Programmer (TP). The SC had no formal estimation procedure; its projects had short development cycles and the development processes involved were ad hoc. The SC relied on external experts for the functional evaluation of its software. The Team for the Functional Evaluation (TFE) operated as a third party. It was composed of a Manager of the evaluation team with client responsibility (M), two CFPSs and a Software Engineer (SE). This group had a direct responsibility towards both the PA and the SC; this position made the TFE independent from the interests of the two organizations. Since 2004 the TFE had evaluated four applications of the IS produced by the SC. The first one was commissioned in 2004; it was at an early stage, and the TFE had to work with a high-level description of the users' needs and intentions. The principal sources of the evaluation process were the Technical Offer (TO), the Preliminary Logical Model (PLM) of the Data Base and the Mock Up (MU) of the User Interfaces; other sources were informal contacts with the PM by telephone, e-mail and work meetings. The FPA was applied only to the functionalities documented by all the sources. Some other functionalities were described only by the TO and the PLM. This led the TFE to adopt the E&Q method as well. The final count was composed of three Total FP values, obtained by applying the three E&Q ranges: Min FP, Most Likely FP (MLFP) and Max FP. Every value was composed of a part calculated with the FPA and a part calculated with the E&Q. The following table describes, in percentage, the composition of the results; from the Min FP range to the Max FP range the E&Q FP count increases and, as a consequence, the percentage of E&Q FPs increases while the percentage of IFPUG FPs decreases.

Table 4.1. The composition of the results for the first evaluation process with the E&Q and FPA methods

APPL 1      Min FP   MLFP    Max FP
IFPUG FP    68%      61%     53%
E&Q FP      32%      39%     47%
Total FP    100%     100%    100%
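To make the composition in Table 4.1 concrete, the sketch below combines an FPA-counted part with an E&Q-estimated part, expressed as (minimum, most likely, maximum) ranges, into three totals. All figures are hypothetical and do not reproduce the actual project counts; the aggregation of component ranges is simplified to a plain sum.

```python
# Hypothetical sketch of how the three totals behind Table 4.1 are composed:
# one part of the application is measured with FPA, the remaining part is
# estimated with E&Q, which yields a (min, most likely, max) range per component.
# The numbers are invented for illustration only.

fpa_counted_fp = 420                 # assumed FPA count of the fully documented part

eq_components = [                    # assumed E&Q ranges (min, likely, max)
    (60, 75, 95),                    # e.g. a macro-function known only from the TO
    (110, 140, 180),                 # e.g. a group of functions known from the PLM
]

totals = {}
for index, label in enumerate(("Min FP", "MLFP", "Max FP")):
    eq_part = sum(component[index] for component in eq_components)
    total = fpa_counted_fp + eq_part
    totals[label] = (total, 100 * fpa_counted_fp / total, 100 * eq_part / total)

for label, (total, fpa_share, eq_share) in totals.items():
    # As in Table 4.1, the E&Q share grows from the Min FP to the Max FP range.
    print(f"{label}: {total} FP  (IFPUG {fpa_share:.0f}% / E&Q {eq_share:.0f}%)")
```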
In 2006 the measurement of the functionality actually implemented was required. The availability of the source code led the TFE to choose the BF method, which entailed a revision of the programming style and quality. The source code analysis produced totals of statements for several languages, and the resulting total of FPs confirmed the reliability of the previous estimate.

In 2008 three new applications were commissioned for functional evaluation. In the meantime there had been a change in the development group: the PM, UAD and TP were replaced by new resources. For all the applications the principal sources of the functional measurements were the Final Functional Requirements, the Logical Model and Design of the Data Base, the technical project of the Data Base and the user manual. The first of these applications was already implemented, which led the TFE to choose the BF method. According to the TFE, the first outcomes from BF clearly exceeded the expected functional dimension. The TFE also noticed a change in the programming style, and this was the first aspect to which the deviation was ascribed. The TFE therefore decided to evaluate this application with the FPA as well. The results confirmed a high deviation from the BF results. As a consequence, the applications that followed were also counted with the FPA. Table 4.2 shows the deviations, in percentage, between the FP counts resulting from BF and the FP counts calculated for Application 1 (with FPA and E&Q) and for Application 2 (with FPA). Every value of the table has been calculated as follows:

X1 = (BF1 - Min FP) * 100 / BF1    (4.1)

where Min FP is the Total FP count (IFPUG plus E&Q FP), in absolute value, obtained by applying the Min FP range; BF1 is the total of FPs from BF for Application 1; X1 is the deviation in percentage. This formula refers to the Min FP range of Application 1; it has been replicated for the MLFP and Max FP ranges shown in Table 4.1 for Application 1, and between the BF and FPA counts calculated for Application 2.

Table 4.2. The deviations between the BF counts and the FP counts, from E&Q and FPA, for Applications 1 and 2

          (BF1 - Min FP)/BF1 %   (BF1 - MLFP)/BF1 %   (BF1 - Max FP)/BF1 %
APPL 1    46%                    24%                  -7%

          (BF2 - IFPUG FP)/BF2 %
APPL 2    68%
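The following minimal sketch reproduces the arithmetic of formula (4.1) for the three ranges of Application 1. The absolute FP and BF figures are hypothetical, chosen only so that the resulting percentages come out close to those reported in Table 4.2.

```python
# Minimal sketch of formula (4.1): deviation (in %) of a functional count
# from the backfired total. The absolute figures are hypothetical and are
# chosen only so that the percentages approximate those in Table 4.2.

def deviation(bf_total, fp_count):
    """Return (bf_total - fp_count) * 100 / bf_total, as in formula (4.1)."""
    return (bf_total - fp_count) * 100 / bf_total

bf_1 = 1000                                              # assumed BF total for Application 1
ranges = {"Min FP": 540, "MLFP": 760, "Max FP": 1070}    # assumed FPA + E&Q totals

for label, fp in ranges.items():
    print(f"APPL 1, {label}: {deviation(bf_1, fp):+.0f}%")   # approx. +46%, +24%, -7%
```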
For Application 1 the data suggest that the best FP estimate lies between the MLFP and Max FP ranges, since the deviation between these two values is minimal. Moreover, the average of the three deviations is 21%, very close to the 20% accuracy reported by Jones (2000). For Application 2, on the contrary, the data show that BF overestimated the FPA count by 68%, i.e. an error 48 percentage points beyond the 20% indicated by Jones (2000).

5 Conclusions and perspectives

This case study has shown a joint use of several dimensional values with the same explicit objective, i.e. the FP evaluation. The authors' contribution concerns the existence of an implicit objective that emerges from the comparison among the counts supplied by the different functional evaluation methods. The relevant deviation of this case study is between the BF count and the FPA count for Application 2. Both methods calculate FPs, but the first does so through the analysis of SLOC (a physical measure) while the second does so through the analysis of the required functionalities (a logical measure). As a consequence, programming style and coherence of requirements are respectively, even though indirectly, under analysis. If the manual counting practice of FPA leads to a revision and improvement of the requirements, the same is not true for BF, since it is performed on software usually already implemented and tested and, moreover, it makes use of automated tools. Since the BF conversion values have been calculated for projects that respond to certain standards of coding style, the authors assume that when the BF count deviates from the FPA count by more than what is expected, i.e. plus or minus 20% according to Jones (2000), this can be explained by a departure from coding style standards in the software production. The awareness of this additional information could lead, at least, to the following advantages: the need to evaluate at least two software aspects, the first concerning quantity (the functional dimension) and the second concerning quality (the coding style), could be covered; in addition, it could lead customer and supplier of software to pay attention to the software coding style even when product objectives are particularly stressed by contractual obligations.

References

Agresti W (1981) Applying Industrial Engineering to the Software Development Process. In: Proc. IEEE Fall COMPCON, IEEE Computer Society Press, Washington DC, pp 264-270
Alavi M, Carlson P (1992) A review of MIS research and disciplinary development. Journal of Management Information Systems 8(4):45-62
Albrecht AJ (1979) Measuring Application Development Productivity. In: Proc. Joint SHARE/GUIDE/IBM Application Development Symposium, pp 83-92
Albrecht AJ, Gaffney JE (1983) Software Function, Source Lines of Code, and Development Effort Prediction: A Software Science Validation. IEEE Transactions on Software Engineering SE-9(6):639-648
Battistata A (2006) Esperienza aziendale di introduzione di una metrica dimensionale del software. In: Metriche del software - Esperienze e ricerche. Franco Angeli, Milano
Buglione L (2008) Some thoughts on Productivity in ICT projects. WP-2008-2, White paper, 1.2 edn
Case A (1986) Computer-Aided Software Engineering. Database 17(1):35-43
Conte M, Iorio T, Meli R, Santillo L (2004) E&Q: An Early & Quick Approach to Functional Size Measurement Methods. In: Proc. of Software Measurement European Forum (SMEF), Rome
Cooprider J, Henderson J (1989) A Multi-Dimensional Approach to Performance Evaluation for IS Development. MIT Libraries, CISR WP 197, Sloan WP 3103-89, Cambridge
Desharnais JM, Abran A (2003) Approximation Techniques for Measuring Function Points. In: Proc. of the 13th International Workshop on Software Measurement (IWSM 2003), Springer-Verlag, Montréal, pp 270-286
IFPUG (2004) Function Point Counting Practices Manual - Release 4.2. Westerville, Ohio
ISO-IEC (2007) ISO/IEC 14143-1: Information technology - Software measurement - Functional size measurement - Definition of concepts
Jones C (1997) Applied Software Measurement: Assuring Productivity and Quality, 2nd edn. McGraw-Hill, New York
Jones C (2000) Software Assessments, Benchmarks and Best Practices. Addison-Wesley, Canada
Kemerer CF (1991) Reliability of Function Point Measurement: A Field Experiment. MIT Sloan School Working Paper 3192-90-MSA, vol 3
Kitchenham B, Pfleeger SL, Pickard LM, Jones PW, Hoaglin DC, El Emam K, Rosenberg J (2002) Preliminary guidelines for empirical research in software engineering. IEEE Transactions on Software Engineering 28(8):721-734
Meli R (1997 a) Early and Extended Function Point: A New Method for Function Points Estimation. In: IFPUG Fall Conference, Scottsdale, Arizona, USA
Meli R (1997 b) Early Function Points: A New Estimation Method for Software Projects. In: Proc. of ESCOM 97, Berlin
Meli R, Abran A, Ho VT, Oligny S (2000) On the Applicability of COSMIC-FFP for Measuring Software Throughout Its Life Cycle. In: Proc. of ESCOM-SCOPE, Germany
Orlikowski WJ, Baroudi JJ (1991) Studying Information Technology in Organizations: Research Approaches and Assumptions. Information Systems Research 2(1):1-28
SPR (2006) SPR Programming Languages Table, Version PLT2006b. Software Productivity Research LLC
Symons C (2001) Come Back Function Point Analysis (Modernized) - All is Forgiven! In: Proc. of the 4th European Conf. on Software Measurement and ICT Control, FESMA-DASMA, Germany, pp 413-426