Effort Estimation of Software Development Projects with Neural Networks



Similar documents
Quantifying the Influence of Volatile Renewable Electricity Generation on EEX Spotmarket Prices using Artificial Neural Networks.

Implementation requirements for knowledge management components into ERP Systems: Comparison of software producers and companies

A Decision Support System for the Modelling of Asset Prices, Option Prices, and Volatility: An Application of Artificial Neural Networks

Thema: Hybrid IT-Project Management: Design, Application and Analysis

Evaluation of Selection Methods for Global Mobility Management Software

Boom and Bust Cycles in Scientific Literature A Toolbased Big-Data Analysis

A Process Model for Data Warehouses Integration to Enable Business Intelligence: An Applicability Check for the Airline Sector.

Cost-Benefit Analysis of Videoconferencing and Telepresence Systems in Virtual Project Environments: a Holistic Approach. D i p l o m a r b e i t

Development of a Half-Automated Product Database for Search Engine Optimization. Masterarbeit

Development of a Portal for HR Executives to Enable Digital Personnel Files

Empirical Analysis of Leadership and Social Learning Effects on Employees' Information Security Behaviour. Masterarbeit

Affiliate Marketing Technology, Opportunities and Challenges

Comparing Social Media Sites: A Facebook Case Study about Employer Branding. Bachelorarbeit

Requirements and Challenges for the Migration from EDIFACT-Invoices to XML-Based Invoices. Master Thesis

E-Commerce Opportunities for a Commercial Vehicle Industry System Supplier. Bachelorarbeit

Business Models for Cellular Phones with RFID-Modules

Application Outsourcing: The management challenge

Risk Management in Company Pension Schemes

Project Management in Marketing Senior Examiner Assessment Report March 2013

9 Keys to Effectively Managing Software Projects

Customer Intimacy Analytics

CHAPTER 1 INTRODUCTION

Document management concerns the whole board. Implementing document management - recommended practices and lessons learned

PG Diploma Business and Management

Towards a Blended Workforce - the Evolution of Recruitment Process Outsourcing (RPO) Models

Client Relationship Management When does an organisation need to formalise its processes?

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Quick Guide: Meeting ISO Requirements for Asset Management

ETNO Reflection Document in reply to the EC consultation on Future networks and the Internet early challenges regarding the Internet of things

Related guides: 'Planning and Conducting a Dissertation Research Project'.

Mining Network Relationships in the Internet of Things

The Cyber Threat Profiler

Cloud Infrastructure Security Management

Application Lifecycle Management

Project Planning and Project Estimation Techniques. Naveen Aggarwal

Value, Flow, Quality BCS PRACTITIONER CERTIFICATE IN AGILE SYLLABUS

Customer Engagement & The Cloud

How to introduce maturity in software change management $

Change Management Overview

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

TERRITORIAL PLANNING FOR THE MANAGEMENT OF RISK IN EUROPE

Basel Committee on Banking Supervision. Working Paper No. 17

MASTER OF SCIENCE (MSc) IN ENGINEERING (SOFTWARE ENGINEERING) (Civilingeniør, Cand. Polyt. i Software Engineering)

Gerard Mc Nulty Systems Optimisation Ltd BA.,B.A.I.,C.Eng.,F.I.E.I

INEUM Kurt Salmon INE_06_0409_Logo_CMYK 14/12/2010. Ce fichier est un document d exécution créé sur Illustrator version CS3. K100

Advantage HCM for Oil and Gas An affordable workforce management solution for improved corporate performance

TEACHING OF STATISTICS IN NEWLY INDEPENDENT STATES: THE CASE OF KAZAKSTAN

DESIGNING A POSTGRADUATE CURRICULUM IN INFORMATION SYSTEMS: A GREEK CASE

Procurement Programmes & Projects P3M3 v2.1 Self-Assessment Instructions and Questionnaire. P3M3 Project Management Self-Assessment

Analysis of Business Models for Electric Vehicles Usage

Data Quality and The Decision Model: Advice from Practitioners

Compensation accounts for nearly 70 percent

A Fragmented Approach To CRM: An Oxymoron? By Glen S. Petersen

The Massachusetts Open Cloud (MOC)

5.5 QUALITY ASSURANCE AND QUALITY CONTROL

Position of leading German business organisations

Software project cost estimation using AI techniques

CHAPTER - 4: QUALITY STANDARDS: THE MISSED OPPORTUNITIES

Bringing wisdom to ITSM with the Service Knowledge Management System

Energy Efficient Systems

INTELLIGENT ENERGY MANAGEMENT OF ELECTRICAL POWER SYSTEMS WITH DISTRIBUTED FEEDING ON THE BASIS OF FORECASTS OF DEMAND AND GENERATION Chr.

A Recommendation Framework Based on the Analytic Network Process and its Application in the Semantic Technology Domain

Business Performance Management

Agile Project Portfolio Management

SIX (AVOIDABLE) BLIND SPOTS THAT CRIPPLE TALENT MANAGEMENT ANALYTICS INITIATIVES

GPS Fleet Tracking Solutions - How to Improve Performance and Efficiency

WHITE PAPER. Portland. Inventory Management. An Approach to Right-sizing your Inventory. By Andrew Dobosz & Andrew Dougal January 2012

Enabling Self Organising Logistics on the Web of Things

Executive Summary. Summary - 1

Introduction. Chapter 1

Preparing for the Goal

Enterprise Performance Management. Getting the value from your digital investments

Performance Management Systems: Conceptual Modeling

All available Global Online MBA routes have a set of core modules required to be completed in order to achieve an MBA.

The Framework for Strategic Plans and Annual Performance Plans is also available on

How To Find Local Affinity Patterns In Big Data

The Scientific Data Mining Process

!!!!!! "#$%&'&()%*+,-))!.'',(+-$(/#!0%,%-)%!.1$/2-$(/#!.!3%)$!4&-+$(+%!!!!

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

Measurable Software Quality Improvement through Innovative Software Inspection Technologies at Allianz Life Assurance

INTEGRATED SERVICE ARCHITECTURE FRAMEWORK (ISAF) FOR ITIL V3

Data Migration for Legacy System Retirement

CHAPTER 4 DESIGN AND DEVELOPMENT OF RESPONSIVE SUPPLY CHAIN RISK MANAGEMENT MODEL

Multiple Kernel Learning on the Limit Order Book

37 Marketing Automation Best Practices David M. Raab Raab Associates Inc.

Blue Saffron Managed Services : The Customer Experience

The Journey into Speech Analytics

Regulations of Studies of the International Master Degree. "Electrical Power Engineering (EPE)" Darmstadt University of Technology

Marketing (MSc) Vrije Universiteit Amsterdam - Fac. der Economische Wet. en Bedrijfsk. - M Marketing

Marketing (MSc) Vrije Universiteit Amsterdam - Fac. der Economische Wet. en Bedrijfsk. - M Marketing

THE JOINT FUNDING BODIES REVIEW OF RESEARCH ASSESSMENT RESPONSE FROM THE ROYAL COLLEGE OF PATHOLOGISTS

Intelligent Systems: Unlocking hidden business value with data Microsoft Corporation. All Right Reserved

Agile for Project and Programme Managers

Business Intelligence for Banking

Meeting the Challenges of Database Management. White Paper. Page 1

Scholarship Programme

Content. Management Summary... 3

A Short review of steel demand forecasting methods

STAGE 1 COMPETENCY STANDARD FOR PROFESSIONAL ENGINEER

Statistical Data Quality in the UNECE

Transcription:

Effort Estimation of Software Development Projects with Neural Networks Diplomarbeit zur Erlangung des Grades eines Diplom-Ökonomen der Wirtschaftswissenschaftlichen Fakultät der Leibniz Universität Hannover vorgelegt von Name: Petersen Vorname: Marten Geb. am: 01.11.1981 in: Husum (Nordsee) Erstprüfer: Prof. Dr. Michael H. Breitner Hannover, den 28.09.2007

Table of Contents Table of Figures... 3 List of Tables... 4 Abbreviations... 6 1. Introduction: Business Value of Software Metrics... 9 2. Effort Drivers of Software Projects... 12 2.1. Definitions and Characterisation of a Software Development Project... 12 2.2. Software Size Metrics... 18 2.2.1. Lines of Code... 20 2.2.2. Function Point Metrics... 21 2.3. Project-specific Factors... 38 2.4. Organisation-specific Factors... 42 2.5. Human Factors... 44 3. Methods of Effort Estimation... 47 3.1. Categorisation of Methodologies... 47 3.2. Heuristic Methods... 51 3.2.1. Expert Estimation... 51 3.2.2. Delphi Method... 52 3.2.3. Analogy Method... 53 3.2.4. Other Methods... 54 3.3. Parametric Models... 55 3.3.1. The Putnam Model... 55 3.3.2. The Cost Constructive Model... 58 3.3.3. Expert Systems... 63 3.4. Data Mining Methods... 65 3.4.1. Statistical Regression... 65 3.4.2. Artificial Neural Networks... 68 3.5. Predictive Power Comparison of Estimation Methodologies... 72 4. Case study: Analysing Development Project Data using Neural Networks... 78 4.1. Data... 78 4.2. Methods... 83 4.3. Analysis... 85 4.4. Findings... 98 1

5. Conclusions and Outlook: Effort Estimation Methodologies as a Key Factor in Successful Software Development... 102 Bibliography... 105 A. Appendix... 112 A1. Descriptive Statistics of the BMRS DB... 112 2

1. Introduction: Business Value of Software Metrics Today s business world is increasingly shaped by globalisation, virtualisation and business networks. Whereas in 2007 only 2% of the Gross Domestic Product [GDP] are to be generated in the project economy this figure is expected to rise to 15% by 2020 1. The ubiquitous presence of high-speed networks and technological sophistication enables dynamism and flexibility and demands adaptable and highly interconnected software applications. Software due to its cross sectional character becomes a means to an end as opposed to an end in itself and therefore an innovation enabler. In fact software being part of products in the service consumption and investment economy has become an invisible technology to the consumer. In this highly dynamic and fast moving business world the reliance of business processes on information technology [IT] increases. IT itself enables interconnectivity and seamless processes between and within organisational units. The need for new or adapted software as the facilitator of information collection, processing and transfer is therefore greater than ever, whilst the underlying complexity of systems increases. One of the challenges of the software industry is the efficient allocation of resources in an effective manner. As the software industry works merely project based, the challenge is to complete projects within time and budget as well as meet the functionality and quality demanded by the customer. There is an apparent trade-off but also a correlation between these dimensions which practitioners usually call the magic triangle of project management 2, i.e. a trade-off between time to deployment and development costs. Some call the creation of software an art in itself, others a rather mechanical task. Whichever view one takes, conducting a software development project [SDP] and producing an immaterial good cannot or at least only to a very limited extent be automated. Mental work is required and consequently makes up for the majority of incurred costs. Obtaining knowledge of the required resources as early on as possible in the project lifecycle is critical to success. Therefore effort estimates and consequently cost estimates of SDP s gain a strategic role in a range of situations for the organisation or the project itself. For the organisation, the allocation of the resource pool is a crucial task as a high utilisation enables high revenue. Estimates can furthermore act as a catalyst for innovation, as budgets can be allocated towards the more productive or more profitable project teams, profit centres or the like. Projects benefit of the estimates as a tool to justify the use of those resources in the 1 Cf. Jan Hofmann (2007). 2 Cf. Poensgen (2005) p. 675. 9

course of the project and as a measuring stick for success. An estimate based on software metrics as explored in this thesis can furthermore point out the trade-off between time, cost, quality and scope also known as the magic square of project management. In a bid situation, estimates act as a risk reduction for the bidding strategy as an overpriced bid results in a loss of the contract and bidding too low means losing profitability for the organisation. Estimates also have psychological implications. Following Brooks Law which was stated by Fred Brooks in his book The Mythical Man-Month 3 additional manpower in a project that is likely to run out of schedule will itself have a negative impact on the timeliness. The reasoning behind this hypothesis is that new workers have to familiarise themselves with the project and that the need and allocated time for communication has to increase. Overestimating effort has equally negative effects on the performance of the project. Parkinson s Law states that work expands so as to fill the time available for its completion 4. People involved in a project will stretch their work by reducing their productivity in order to deliver in time. Therefore the estimate becomes a self fulfilling prophecy. Another effect of overestimates is called gold-plating, i.e. adding unwanted or unnecessary functionality to the product. Estimates can be further complicated by what is referred to as the estimation paradox. In the early phases of a project where an estimate is needed most, the prerequisites, i.e. underlying documents and specifications are lacking. Once it has been completed, the estimate can be made with absolute accuracy, i.e. effort figures are available and estimates become obsolete 5 6. Despite the crucial nature of estimates it is a field where the software industry has a huge potential for improvement. Surveys show that 60-80% of SDP are exceeding their budgets by 30-40%, mostly due to overly optimistic estimates 7. There are a variety of methods to estimate effort, although the one consistent and appropriate method is yet to be found despite significant research over the last 30 years. Most organisations rely on expert based estimates and do not employ statistically founded and more objective methods. Methods are not 3 Cf. Brooks (2006). 4 Cf. Parkinson (1955). 5 Cf. Meli in Jones (2002) Early and Quick Function Point Analysis from Summary User Requirements to Project Management p. 427. 6 Cf. Ebert (1996). 7 Cf. Molokken, et al. (2003). 10

typically chosen from of a scientific rationale and therefore the business potential is not levied 8. This thesis aims to outline the potential of a company specific effort algorithm using the neurosimulator FAUN 9 and data supplied by IBM Global Services [IBM GS]. In the second chapter, the author seeks to give a structured overview over the defining characteristic of a SDP and of what factors influence the effort of SDP s, with an emphasis on software metrics. Next, the most common methods of effort estimation are outlined including traditional statistical methods as well as especially Artificial Neural Networks [ANN]. Their suitability for different environments is evaluated on the basis of existing research. In the fourth chapter the project data from IBM GS is examined and an effort prediction algorithm is computed employing the software tool FAUN. Effort drivers are examined and quantified and related to existing research. Finally the author concludes his findings and shows directions of further research and a broader application of ANN s in Software Engineering [SE]. 8 Cf. Moloekken-Oestvold, et al. (2004). 9 Cf. Breitner, et al. (2003). 11

5. Conclusions and Outlook: Effort Estimation Methodologies as a Key Factor in Successful Software Development Software development continuous to be a vital productivity driver and enabler of cost reduction, efficiency and automation in the business world. In recent times the economy has fundamentally evolved in character from a traditionally structured economy to a dynamic network of flexibly interlinked actors. It is the case today that software is needed to support and facilitate this flexibility, and thus SDP s gain strategic importance in the process of a prospering economy. Remarkably, SE is considered a science that has been compared to other engineering disciplines, yet has only recently received the attention needed to develop the best practices and examine the processes in software development. Whilst some still believe that programming is an art, others regard it as fully-fledged engineering discipline. However SDP s are a phenomenon that have the tendency to deliver less than promised, later than planned, using more budget than intended at a disappointingly low quality. One might blame the difficulty of translating business requirements into software concepts on the expectationdelivery gap, though the driver of effort and therefore costs lie in the interaction of a plurality of factors of project, organisation and people and can succinctly be summarised as generic complexity. This thesis has given a structured overview of characteristics that altogether make up the aforementioned complexity. When examining the drivers of effort, it is inevitable to have a size measure to relate the effort to. Special attention is therefore paid to software metrics that prove to be structured and relatively reliable methods to size the intended product of an SDP. The three outlined methods are all ISO certified and are today broadly applied across the industry. However all of the described methods are limited in their applicability to information based software and none of them can be regarded as a valid measure for real-time or algorithm intensive systems. Moreover, not only are they not convertible into one another, but they all require interpretation of the counting rules to judge upon the diversity of software applications. There is no silver bullet in software size metrics, though the existing approaches are better than a blind flight. Whereas software size metrics give an indication of how large the end product will be, the effort spent on the way to get there is influenced by innumerable factors as the majority of a project is human brain work. Project-specific, organisationspecific and human factors are examined and related research is presented quantifying huge ranges of possible impacts on effort. These studies can only give indications of the relative importance of effort drivers and not the desired overarching picture. 102

Heuristic, parametric and data mining methods to estimate the effort of a project are then described and evaluated, revealing that there are considerable differences in the degree of detail, objectivity, required data and degree of sophistication of the algorithm. Heuristic methods have their strengths in the relatively low effort spent on the estimation and the high degree of human creativeness and experience involved. The parametric models try to reflect the complex reality of SDP s but are limited in their applicability to the environment and the software processes they are developed under. Furthermore the multiplicity of characteristics in these models can give the false impression of accuracy. Parametric models are a suitable supplement to heuristic methods if no larger project database is available and can be used for plausibility checks. Once a sufficient project database of 50-200 SDP s, depending on the variability of the environment, has been collected, statistical analysis can lead to more accurate organisation specific solutions. In particular, LSR is widely available and integrated in standard software packages and has the advantage of being highly transparent. Nevertheless, the implications and limitations of multivariate LSR need to be understood. ANN s as the most important subsection of AI depict structures that are adapted by the human brain. The topology of an ANN allows for high complexity of the underlying phenomenon. Through optimisation or training, ANN s are able to learn the relationship of inputs and outputs and are able to cope with a high level of noise in the data. However a relatively large amount of data is necessary for the training and validation of ANN s which only very few companies are able to obtain/ acquire. Furthermore the knowledge inherent in an ANN cannot easily be extracted, which raises problems when applying them to a business environment. The comparison of the outlined methods on the basis of existing research shows that there is no single best methodology for effort estimation. The development of company specific algorithms is however favourable over all other methods. The author was delighted to gain access to the BMRS of IBM GS, a database containing over 1500 SDP s with a large amount of project characteristics describing effort, duration, topology, quality and other factors of the project. The effort drivers not represented in this database are the human factors. Despite the relevance of experience, skill as well as other human related characteristics, the analysis is conducted without these. A structured analysis of the data with over 2000 training runs reveals that despite the large datasets, the prediction accuracy of the best ANN is comparatively low and it is advised to be used with caution. An in depth analysis shows that the ANN estimates projects with certain characteristics more accurately than others. The ANN can be used to quantify trade offs with respect to the choice 103

of one of the characteristics, i.e. the choice of the primary database, and can thereby deliver business value. However the overall results are limited by the available characteristics and further research including human factors would be likely to increase the validity and integrity of the analysis. This thesis has shown that the complexity of an SDP can only be accounted for through a similarly complex estimation methodology. The availability of project data limits the variety of methods that can be applied; although company specific estimation algorithms have the potential to outperform any other parametric method. The panacea of effort estimation has not been found; neither the universal remedy. The scientific interest and diametrical findings in this field show that there is a lot of movement in this discipline. Nonetheless the author does not expect a breakthrough in research as company specific data repositories reflect confidential information about productivity and cost structures. It is therefore unlikely that other quality assured data repositories will be opened up for scientific research in the near future. It seems that for an individual company, the effort to build up and maintain a data repository for effort estimation is unreasonably high in comparison to using Expert estimation or Expert Systems. A repository including software size metrics can be used in other business functions, such as planning and controlling as well as project controlling. Notably in project controlling, it enables methodologies that track and compare project and cost progression, i.e. the Earned Value Method. For ANN s there is a wide spectrum of potential application and as information becomes instantly and available in overwhelming amounts, ANN s offer a suitable way of processing the information and extracting knowledge. Be it for example the prediction of share prices and their derivatives or the evaluation of medical studies, the applications are endless. Vital prerequisites for good results are a high quality data repository that includes all relevant factors of influence. Despite considerable attention in the academic society, training ANN s is still a try and error method and universally applicable methods still need to be developed. 104