
Efficiency in Software Development Projects

Aneesh Chinubhai
Dharmsinh Desai University
aneeshchinubhai@gmail.com

Abstract

A number of different factors are thought to influence the efficiency of the software development process, including programming languages, the use of formal methodologies, CASE tools, etc. Moreover, efficiency in the context of software development has traditionally been measured as the ratio of functionality, expressed either in lines of code or function points, to the effort expended. This is a unidimensional measure and ignores factors such as quality and elapsed time. This study uses Data Envelopment Analysis (DEA) to develop a multidimensional measure of efficiency in the context of software development and then examines, using a decision tree based model, how efficiency varies with other influencing factors such as the use of CASE tools, use of methodologies, team size, and application type.

Keywords: Efficiency, DEA, CASE Tools, Formal Methodology, Team Size

1. Introduction

A number of different factors are thought to influence software development productivity. Programming languages have evolved over the years, and the progression through successive generations of languages (2GL, 3GL, 4GL, etc.) has led to an increase in development productivity. Studies have examined the lines of code required per function point for various languages. This, however, is a unidimensional view of productivity. Other factors also influence productivity, such as the use of modern development tools like those used for Computer Aided Software Engineering (CASE), the use of a formal development methodology, the development type (new development, re-development, etc.), and the number of people on the development team.

Premraj and others [17] analyzed around 600 projects from Finland over the period from 1978 to 2003. The analysis was based on the traditional definition of productivity. The authors acknowledged that it is easier to be productive if quality is disregarded, and that inclusion of a quality metric would lead to a better measure of productivity. While the sample size is large, all the projects analyzed are from Finland. Another study [3] examined the relationship between project complexity, team size and productivity. It defined productivity as function points per man-month and used a log-linear regression model, finding a strong negative relationship between team size and productivity; however, the authors note that a variable indicating quality is missing from the model. Another study [4] of 98 European firms found a negative correlation between team size and productivity. An analysis of 79 software development projects from the European Space Agency also found a negative relationship between team size and productivity [15]. In other studies, Case [6] and Kriebel [14] conclude that a high level of productivity requires quality to be sacrificed and vice versa. However, they could not examine some of the other factors and their possible interplay in detail with the limited amount of data available to them.

All of the studies mentioned above use the traditional function points per man-hour (or equivalent) definition of productivity. They also do not consider a number of other contributing factors such as the use of methodologies, the use of CASE tools, the type of application being developed, etc. Moreover, the data sets are either limited in size or quite dated by today's standards.

The analysis conducted in this paper considers that the software development process, which consumes development effort as an input, produces multiple outputs: functionality, product defects and elapsed time. The objective of this analysis, therefore, is to measure efficiency using multiple measures of output and then to examine how efficiency varies with factors such as team size, use of CASE tools, use of formal methodologies, type of project, etc. The approach used in this study first measures efficiency scores using a Data Envelopment Analysis (DEA) model. The DEA scores are subsequently analyzed using a decision tree model in order to examine the relationship between efficiency and the other contributing factors mentioned above.

2. Data Envelopment Analysis

Data Envelopment Analysis (DEA), developed by Charnes, Cooper and Rhodes (CCR) [8], is a non-parametric technique used for efficiency assessment. Each unit that is analyzed is referred to as a Decision Making Unit (DMU). DEA uses linear programming to develop an efficient frontier that envelops the set of DMUs under consideration. Traditionally, efficiency has been measured as a ratio:

Efficiency = Output / Input    (1)

When multiple inputs and outputs are considered, appropriate weights are assigned to each input and output. However, a single set of weights may not be appropriate across the board. Therefore, DEA attempts to portray each DMU in the best possible light by using different weights for each input and output. The weights are determined by solving a series of linear programs, one for each DMU. The formulation is as follows:

min Θ    (2)
s.t.  Σ_i λ_i X_i ≤ Θ X_0    (3)
      Σ_i λ_i Y_i ≥ Y_0    (4)
      λ_i ≥ 0    (5)

In the formulation above, known as the envelopment form, λ_i represents the weight that is given to DMU i in its effort to dominate DMU 0, Θ is the efficiency, X_i are the inputs and Y_i are the outputs. An alternative formulation, known as the multiplier form, is as follows:

max Σ_{r=1..s} u_r y_r0    (6)
s.t.  Σ_{r=1..s} u_r y_rj - Σ_{i=1..m} v_i x_ij ≤ 0    (7)
      Σ_{i=1..m} v_i x_i0 = 1    (8)
      u_r ≥ ε    (9)
      v_i ≥ ε    (10)

In the formulation above, u_r and v_i represent the output and input weights respectively, while x_ij and y_rj represent the inputs and outputs. The two formulations shown above are dual linear programs.

The CCR model discussed above assumes constant returns to scale (CRS), i.e., that the DMU under consideration is operating at the optimal scale. Banker, Charnes and Cooper [2] modified the CCR model to account for variable returns to scale; the result is known as the Banker-Charnes-Cooper (BCC) model. The BCC model allows DMUs to be classified as efficient even if they are not operating at the optimal scale size. It addresses variable returns to scale by enveloping the production possibility set with a convex hull, as opposed to the conical hull of the CCR model, resulting in a piecewise linear efficient frontier. The following convexity constraint is added to the envelopment formulation:

Σ_{i=1..N} λ_i = 1    (11)

The resulting envelopment formulation is shown in equations 12-16:

min Θ    (12)
s.t.  Σ_i λ_i X_i ≤ Θ X_0    (13)
      Σ_i λ_i Y_i ≥ Y_0    (14)
      Σ_i λ_i = 1    (15)
      λ_i ≥ 0    (16)

The multiplier version of the formulation is shown below:

max u y_0 - u_0    (17)
s.t.  u y_j - v x_j - u_0 ≤ 0    (18)
      v x_0 = 1    (19)
      u ≥ ε    (20)
      v ≥ ε    (21)

The variable u_0 is free in sign and is associated with the constraint Σ_i λ_i = 1 in the envelopment formulation. The sign of u_0 also indicates the nature of returns to scale for the DMU being evaluated:

u_0 < 0 indicates increasing returns to scale.
u_0 > 0 indicates decreasing returns to scale.
u_0 = 0 indicates constant returns to scale.
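To make the envelopment formulation above concrete, the following sketch solves the input-oriented BCC program of equations 12-16 for each DMU using SciPy's linear programming routine. It is a minimal illustration under assumed toy data (one input, two outputs, five DMUs), not the software actually used in this study.

# Minimal sketch of the BCC (VRS) envelopment model, equations (12)-(16).
# The data below are invented for illustration only.
import numpy as np
from scipy.optimize import linprog

X = np.array([[100.0, 150.0, 200.0, 120.0, 180.0]])        # inputs,  shape (m, n)
Y = np.array([[ 80.0, 100.0, 140.0,  90.0, 110.0],         # outputs, shape (s, n)
              [ 10.0,  12.0,  20.0,  15.0,  11.0]])

def bcc_efficiency(X, Y, k):
    """Return Θ for DMU k under the input-oriented BCC envelopment model."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.r_[1.0, np.zeros(n)]                  # variables [Θ, λ_1..λ_n]; minimize Θ
    A_in  = np.hstack([-X[:, [k]], X])           # Σ λ_j x_ij - Θ x_ik ≤ 0   (13)
    A_out = np.hstack([np.zeros((s, 1)), -Y])    # -Σ λ_j y_rj ≤ -y_rk       (14)
    A_ub  = np.vstack([A_in, A_out])
    b_ub  = np.r_[np.zeros(m), -Y[:, k]]
    A_eq  = np.r_[0.0, np.ones(n)][None, :]      # Σ λ_j = 1                 (15)
    bounds = [(None, None)] + [(0.0, None)] * n  # Θ free, λ_j ≥ 0           (16)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.fun

print([round(bcc_efficiency(X, Y, k), 3) for k in range(X.shape[1])])

Each score lies between 0 and 1 here because every DMU is allowed to act as its own peer, which is exactly the limitation the super-efficiency model discussed next removes.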

DEA scores are restricted to a scale of 0 to 1. This may lead to a situation where a number of DMUs share a score of 1. If ordinary least squares (OLS) regression models are used for post-DEA analysis of the scores, they may predict values greater than one, since OLS models cannot be restricted to a scale of 0 to 1. To overcome this restriction, Andersen and Petersen [1] developed the super-efficiency DEA model, which yields scores that are not restricted to being less than 1. The linear programming formulation for the super-efficiency model is the same, except that it excludes the data for the DMU under consideration from the reference set; DMUs can obtain efficiencies greater than 1 because each DMU is not allowed to use itself as a peer. A number of books give a detailed exposition of DEA and its applications ([7], [10], [16], [18] and [19]).

3. Factors Influencing Efficiency in Software Projects

3.1 Sources of Data

Empirical data on software development projects are obtained from a repository [11] maintained by the International Software Benchmarking Standards Group (ISBSG). The repository contains data on more than 4,000 projects with over 100 attributes per project. However, a significant number of projects in the database have missing attribute data, which limits the number of usable records in the repository.

3.2 Estimating Efficiencies: The DEA Model

Since software development is still largely a labor-driven process, the DEA model was developed using Work Effort, measured in man-hours, as the sole input. Three parameters of product development performance [9] were chosen as outputs: product functionality, measured as the number of function points developed; quality, measured as the number of defects per function point; and lead time, measured as the elapsed time per function point from project start to finish. This is shown in figure 1.

Figure 1: DEA Model: Inputs and Outputs

Defects per function point and elapsed time per function point are undesirable outputs and need to be minimized while the number of function points delivered is maximized. The undesirable outputs are transformed using the additive inverse approach suggested by Koopmans [13], where an undesirable output Q is transformed as f(Q) = -Q and included directly in the model as an output. The BCC super-efficiency model is used to determine the DEA efficiency scores of the software development projects in the data set.
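As an illustration of how the undesirable outputs and the super-efficiency variant fit together, the sketch below negates the defect and elapsed-time measures before they enter the output matrix and removes the evaluated DMU from its own reference set, so that scores above 1 become possible. The variable names and numbers are assumptions made for the example, not the study's data, and under variable returns to scale the excluded-DMU program can be infeasible for some projects.

# Sketch of the BCC super-efficiency model with undesirable outputs handled
# by the additive inverse (Koopmans) transformation. Data are invented.
import numpy as np
from scipy.optimize import linprog

effort  = np.array([3200.0, 1500.0, 5400.0, 800.0])   # work effort in man-hours (input)
fp      = np.array([ 410.0,  220.0,  610.0, 150.0])   # function points delivered
defects = np.array([0.020, 0.034, 0.015, 0.050])       # defects per function point
lead    = np.array([0.110, 0.090, 0.160, 0.070])       # elapsed time per function point

X = effort[None, :]                          # single input, shape (1, n)
Y = np.vstack([fp, -defects, -lead])         # undesirable outputs enter as f(Q) = -Q

def bcc_super(X, Y, k):
    """Super-efficiency: DMU k is excluded from the reference set."""
    peers = [j for j in range(X.shape[1]) if j != k]
    Xp, Yp = X[:, peers], Y[:, peers]
    m, n = Xp.shape
    s = Yp.shape[0]
    c = np.r_[1.0, np.zeros(n)]                                  # minimize Θ
    A_ub = np.vstack([np.hstack([-X[:, [k]], Xp]),               # input constraints
                      np.hstack([np.zeros((s, 1)), -Yp])])       # output constraints
    b_ub = np.r_[np.zeros(m), -Y[:, k]]
    A_eq = np.r_[0.0, np.ones(n)][None, :]                       # convexity (VRS)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0.0, None)] * n)
    return res.fun if res.success else float("nan")              # may exceed 1, or be infeasible

print([round(bcc_super(X, Y, k), 3) for k in range(X.shape[1])])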

3.3 Analysis of DEA Results

Super-efficiency scores were calculated for 616 software development projects. Descriptive statistics of these scores are shown in table 1.

Table 1: Descriptive Statistics of Super-efficiency Scores

                              N     Min.      Max.      Mean      Median    Std. Dev.
Super-efficiency (Θ_SUPER)    616   0.00576   3.41176   0.17541   0.09035   0.28524

The DEA scores obtained using the super-efficiency model are analyzed further using tree-based models.

3.3.1 Analysis Using a Decision Tree

Decision trees are a commonly used tool for prediction (numeric dependent variable) as well as classification (nominal dependent variable) when the independent variables are nominal, numeric, or a combination of both. Two algorithms are popularly used for generating decision trees where the response variable is numeric: Classification and Regression Trees (CART) and the Chi-Square Automatic Interaction Detector (CHAID). The CART algorithm, originally developed by Breiman et al. [5], works by recursively partitioning the data into nodes using binary splits. The CHAID algorithm, developed by Kass [12], differs from CART by partitioning the data using multi-way splits. This leads to a wider tree with a smaller depth, which makes it easier to interpret. For problems with a numeric response variable, the CHAID algorithm computes F-tests to determine the difference in mean values between split groups and selects the split variable with the smallest p-value. If the smallest p-value is larger than a predefined split threshold α, no further splits are performed and the algorithm stops. In order to avoid overfitting, some of the nodes of the tree are removed through a process known as pruning: the decrease in predictive accuracy due to pruning each node is compared to the cost of having a more complex tree. A more detailed description can be found in [20].

Decision trees are a particularly attractive tool because they can be easily interpreted. The effect of each decision is easily seen at each split in the tree, and a set of if-then rules can be developed by traversing the tree from top to bottom. Using tree-based models is similar to piecewise linear regression: a non-linear relationship can therefore be approximated more accurately with a tree-based model than with a linear regression model.

The data are split into two approximately equal parts, with observations randomly assigned to each part. This resulted in a training set with 282 observations and a test set with 287 observations. The tree induction algorithm was applied to the training set to develop a model, which was subsequently tested by applying it to the test set. A tree-based model was developed using the CHAID algorithm, keeping the same dependent and independent variables as the statistical model; i.e., super-efficiency calculated using the DEA model was the dependent variable, and the independent variables are as follows:

Used Methodology: A binary variable indicating whether a formal development methodology has been used or not.

CASE Tool Used: A binary variable indicating whether a Computer Aided Software Engineering (CASE) tool has been used.

AppType: A categorical variable with multiple categories indicating the type of application that has been developed (a proxy variable for the complexity of the application being developed; see the discussion of project complexity below).

DevType: A categorical variable indicating whether the project represents a new development, an enhancement or a re-development.

LangType: A categorical variable indicating the type of language used for development.

Max. Team Size: A metric variable indicating the maximum size of the development team.

3.4 Result and Interpretation

The tree model developed is shown in figure 2. Each node in the tree shows the number of observations (n) falling in that particular group, the number of observations as a percentage of the total sample size, and the mean predicted efficiency of all the observations falling in that group. Summary statistics of the model errors are shown in table 2.

Table 2: Decision Tree Model Performance

                               Training Data   Test Data
Minimum Error                     -0.637         -0.790
Maximum Error                      2.325          1.397
Mean Error                         0.000         -0.010
Mean Absolute Error                0.119          0.145
Standard Deviation of Error        0.240          0.255

From figure 2 it is seen that the first split in the tree is made using the Application Type variable, with web development projects showing the highest efficiency (0.719), followed by MIS/Other (0.223), and DSS/TPS/WORKFLOW, etc. showing the lowest efficiency (0.140). Within the WEB category, no further splits are made, perhaps due to the small number of observations (7) falling in that category. For both the other categories, further splits are made using the Team Size variable. For the DSS/TPS/WORKFLOW category (node 1), development projects are divided into two groups: those with a team size of five or less and those with a team size of more than five people. Projects with smaller teams have higher efficiency (0.217) than projects with larger teams (0.076). For projects where the team size is not mentioned (missing value), a further split is made using the Language Type variable; here, the use of fourth generation languages yields higher efficiency (0.189) than the use of third generation languages (0.126). The MIS/Other category (node 2) is split further using the Team Size variable. Three categories are created: team size of 3 or less, team size of more than 3, and a category where the team size is not reported.

Smaller teams of three people or less show a higher efficiency (0.446) than teams of more than three people (0.121). Projects with teams of three people or less (node 7) are further split based on their usage of CASE tools: projects where CASE tools have been used show a lower efficiency (0.284) than those where CASE tools have not been used (0.849). There is no indication that the development type (new, enhancement, etc.) or the use of a formal methodology has any influence on development efficiency.

Figure 2: Decision Tree Model
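To show the shape of the post-DEA tree analysis described in section 3.3.1, the sketch below splits a set of projects into roughly equal training and test halves and fits a regression tree on the super-efficiency scores. scikit-learn's CART-style DecisionTreeRegressor is used as a stand-in because CHAID is not available there, and the DataFrame with its column names is an assumption, not the ISBSG data or the model behind figure 2.

# Sketch only: CART regression tree in place of the paper's CHAID model.
# The `projects` frame and its values are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text
from sklearn.metrics import mean_absolute_error

projects = pd.DataFrame({
    "super_eff":   [0.72, 0.22, 0.14, 0.45, 0.08, 0.19],    # DEA super-efficiency scores
    "used_method": [1, 0, 1, 1, 0, 1],                       # formal methodology used
    "case_tool":   [0, 1, 0, 0, 1, 1],                       # CASE tool used
    "app_type":    ["WEB", "MIS", "DSS", "MIS", "DSS", "WEB"],
    "dev_type":    ["New", "Enh", "New", "Redev", "Enh", "New"],
    "lang_type":   ["4GL", "3GL", "3GL", "4GL", "3GL", "4GL"],
    "max_team":    [3, 6, 12, 2, 8, 4],
})

X = pd.get_dummies(projects.drop(columns="super_eff"))   # one-hot encode the categoricals
y = projects["super_eff"]

# Two approximately equal random halves, mirroring the paper's 282/287 split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=1, random_state=0)
tree.fit(X_train, y_train)

print(export_text(tree, feature_names=list(X.columns)))
print("test MAE:", round(mean_absolute_error(y_test, tree.predict(X_test)), 3))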

4. Conclusion

It can be seen from the decision tree analysis that the primary factor differentiating development projects in terms of efficiency is the type of application being developed. It is conceivable that the Application Type variable is merely a proxy for the complexity of the work to be done. Web development projects, due to the simplicity of the work to be carried out, are seen to have the highest efficiency.

One can also conclude that the use of smaller development teams leads to higher efficiency. In the tree model in figure 2, each time the Team Size variable is used as a splitting variable, the node with the lower team size has the higher mean efficiency. One reason for this is that the effort required for coordination and communication between team members increases rapidly with team size, leading to decreased efficiency.

One can also conclude that the use of CASE tools actually leads to decreased efficiency, especially when used by small teams. Considering that small teams are usually associated with smaller projects, and smaller projects have lower complexity, the use of CASE tools adds complexity to a project that would otherwise have been simple to execute. The consequence of this added complexity is lower efficiency.

It is also seen that the use of fourth generation languages leads to higher efficiency. This is due to the higher level of abstraction that fourth generation languages offer over third generation languages: one does not have to deal with many of the low-level details of writing code that older languages require.

Finally, the use of formal methodologies and the type of development are not considered significant in the tree model that has been developed. While the use of a methodology, which is normally associated with a more disciplined and systematic approach to software development, could lead to fewer defects, it could also result in a longer development process. Since the DEA scores are a composite measure of efficiency, the benefits of better quality could be offset by the longer development times, thereby indicating that the use of a methodology has no net effect on efficiency.

References

[1] ANDERSEN, P., AND PETERSEN, N. A procedure for ranking efficient units in data envelopment analysis. Management Science 39 (1993), 1261-1264.
[2] BANKER, R. D., CHARNES, A., AND COOPER, W. W. Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science 30 (1984), 1078-1092.
[3] BLACKBURN, J. D., LAPRÉ, M. A., AND WASSENHOVE, L. N. V. Brooks' law revisited: Improving software productivity by managing complexity, May 2006.
[4] BLACKBURN, J. D., SCUDDER, G. D., AND WASSENHOVE, L. N. V. Improving speed and productivity of software development: A global survey of software developers. IEEE Transactions on Software Engineering 22, 12 (1996), 875-885.
[5] BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R., AND STONE, C. J. Classification and Regression Trees. Chapman & Hall, 1984.
[6] CASE, A. F. Computer-aided software engineering. Database 17, 1 (1985), 35-43.
[7] CHARNES, A., COOPER, W. W., LEWIN, A. Y., AND SEIFORD, L. M. Data Envelopment Analysis: Theory, Methodology and Application. Kluwer Academic Publishers, 1994.
[8] CHARNES, A., COOPER, W. W., AND RHODES, E. Measuring the efficiency of decision making units. European Journal of Operational Research 2 (1978), 429-444.
[9] CLARK, K. B., AND FUJIMOTO, T. Product Development Performance: Strategy, Organization and Management in the World Auto Industry. Harvard Business School Publishing, 1991.
[10] COOPER, W. W., SEIFORD, L. M., AND TONE, K. Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software, 2nd ed. Springer, 2005.

[11] ISBSG. Data CD R10 Field Descriptions. International Software Benchmarking Standards Group, April 2007. http://www.isbsg.org.
[12] KASS, G. An exploratory technique for investigating large quantities of categorical data. Applied Statistics 29, 2 (1980), 119-127.
[13] KOOPMANS, T. C. Analysis of production as an efficient combination of activities. In Activity Analysis of Production and Allocation, T. C. Koopmans, Ed. Cowles Commission, Wiley, New York, 1951.
[14] KRIEBEL, C. H. Evaluating the quality of information systems. In Design and Implementation of Computer Based Information Systems, N. Szyperski and E. Grochla, Eds. Sijthoff and Noordhoff, 1979, pp. 29-43.
[15] MAXWELL, K., WASSENHOVE, L. N. V., AND DUTTA, S. Performance evaluation of general and company specific models in software development estimation. Management Science 45, 6 (1999), 787-803.
[16] NORMAN, M., AND STOKER, B. Data Envelopment Analysis: The Assessment of Performance. John Wiley & Sons, 1991.
[17] PREMRAJ, R., SHEPPERD, M., KITCHENHAM, B., AND FORSELIUS, P. An empirical analysis of software productivity over time, July 2005.
[18] RAMANATHAN, R. An Introduction to Data Envelopment Analysis: A Tool for Performance Management. Sage Publications, 2003.
[19] RAY, S. C. Data Envelopment Analysis: Theory and Techniques for Economics and Operations Research. Cambridge University Press, 2004.
[20] WITTEN, I. H., AND FRANK, E. Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. The Morgan Kaufmann Series in Data Management Systems. Elsevier, 2005.
