Quantitative Methods in Regulation (DEA)

Data envelopment analysis (DEA) is one of the methods commonly used in assessing efficiency for regulatory purposes, as an alternative to regression. The theoretical development of DEA is usually attributed to the economist M.J. Farrell (1957), but the technique became operational much later, following work by the OR specialists Charnes, Cooper and Rhodes (1978) (CCR). Consequently, DEA is more associated with the operations research and management science literature, although applications in the economics literature are becoming fairly common. Two orientations are possible, corresponding to the cost and output approaches respectively. Figure 1 shows the standard regression-based approach. Figure 2 is a version of the original Farrell diagram for the cost (strictly, input) orientation. There are five companies (A to E), each producing a unit of a single output (y) using two inputs (x1 and x2). Companies C, D and E are technically efficient. For example, C uses more of x1 and less of x2 compared with D, while company B is inefficient compared with D since it uses more of both x1 and x2. The efficient counterpart of B, i.e. D, is called B's reference group.

Figure 1: Least-squares regression. [Diagram: cost C plotted against size X; the OLS line C = b0 + b1X, and a corrected OLS line shifted to pass through the most efficient observation.]

The term data envelopment analysis arises because DEA can be thought of as fitting a frontier which envelops the data. In Figure 2 the frontier is defined by CDE. Points C, D and E of the frontier represent real companies, while points on the line segments linking the real companies represent hypothetical ones. The technical efficiency of a company such as A is measured by comparing it with its corresponding hypothetical benchmark A′ on the frontier; the ratio OA′/OA is a measure of the efficiency of company A. The general version of DEA allows for many inputs in order to calculate technical efficiency.
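The Farrell measure can be sketched numerically. The input coordinates below for companies A, C and D are hypothetical (the figure's values are not given); the benchmark A′ is found where the ray OA crosses the facet CD of the frontier:

```python
import numpy as np

# Hypothetical inputs per unit of output (the figure's values are not given):
C = np.array([1.0, 4.0])   # efficient company C
D = np.array([3.0, 2.0])   # efficient company D
A = np.array([4.0, 4.0])   # inefficient company A

# The benchmark A' lies where the ray OA crosses the facet CD:
#   theta * A = C + s * (D - C)  ->  a 2x2 linear system in (theta, s)
M = np.column_stack([A, C - D])
theta, s = np.linalg.solve(M, C)
# theta = OA'/OA is the Farrell technical efficiency of company A
```

With these numbers A′ = (2.5, 2.5), so company A's efficiency is OA′/OA = 0.625; s lies in [0, 1], confirming that A′ is on the segment CD rather than its extension.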
In the management science literature DEA is typically represented as a generalised ratio:

City University 1
efficiency ratio = weighted sum of outputs / weighted sum of inputs = Σ_i q_i y_i / Σ_j p_j x_j   (3)

where the y's and x's are outputs and inputs respectively, while the q's and p's are firm-specific weights to be calculated by the DEA technique. (These are the counterparts of the regression coefficients.) Each company is given a single efficiency measure from zero to one. The primal LP problem, as defined by CCR, is essentially to choose these q's and p's to maximise the efficiency score, subject to the constraint that, with these weights, no company gets a score higher than 1.00. The closer the score to one, the more efficient the firm. DEA allocates specific weights (q's and p's) for each company on the basis of giving it the highest possible score. This is sometimes expressed as putting a company in the best possible light.

Here we consider the special case where inputs are aggregated by their prices, with the aim of deriving overall cost efficiency for (potentially) multiple outputs. Furthermore, the following description is the dual of the approach described by Charnes and Cooper and much of the management science literature, since it corresponds more closely to the economic interpretation of the frontier. As with COLS, DEA assumes that there is at least one efficient observation. As in the Farrell diagram, the cost frontier is a convex hull formed by joining adjacent efficient points together by hyperplanes. (Where there is only one x, one y and no z's, these can be represented by straight lines, as in the diagram above.) A separate analysis is carried out for each observation. In the special case here, where the inputs are aggregated into a single cost measure, we can express the production correspondence as an explicit function c(y, z):

c_min = c(y_0, z_0)

A technically efficient company in an environment z_0 would produce the given output y_0 at the minimum cost c_min, while an inefficient company would, under the same conditions, incur a cost greater than this minimum (c > c_min).
A measure of the company's technical efficiency could therefore be the ratio c_min/c. The closer this ratio is to one, the higher the company's efficiency. In the dual of the CCR approach, for each observation the algorithm searches for an efficient set E_0 which minimises the efficiency score K. We use the efficient set (or reference group) to construct an artificial observation which is a linear combination of the efficient set. Thus the cost of the artificial observation, c_E, is formed from

c_E = Σ_{i∈E} λ_i c_i

The efficient set must produce at least as much of every output as observation 0:

y_Ej = Σ_{i∈E} λ_i y_ij ≥ y_0j   for each output j
Where there are additional noncontrollable factors z, these must also meet the weak inequality

z_Ej = Σ_{i∈E} λ_i z_ij ≥ z_0j   for each noncontrollable j

The efficiency score is then K = c_E/c_0. (There are no noncontrollables in LAB7DEA Model1.) An additional constraint is that the weight on any observation, λ_i, is non-negative. In the original, constant returns to scale formulation the λ_i's are otherwise unconstrained. In the variable returns to scale approach, due to Banker, Charnes and Cooper (1984), the sum of the λ_i's is constrained to 1, which ensures that the artificial observation is not just a linear but a convex combination of the efficient set. The shadow prices on each constraint represent the weights (the p's and q's) in the normal, primal analysis. Roughly, these correspond to the regression coefficients, except that each observation has its own weights, depending on which facet of the convex hull the reference group defines.

DEA also allows the inclusion not only of inputs and outputs but also of other variables describing a company's operating environment (often called non-controllable or environmental variables), thus enabling like-for-like comparisons. In DEA one has to decide about the relative importance of competing explanatory factors prior to the analysis. The inputs and outputs are entered into the DEA optimisation algorithm, but there is no built-in test of their appropriateness. With DEA one also has to decide about the sign of these explanatory factors before running the DEA programme, while with RA the signs of the explanatory variables are calculated by the OLS algorithm. Without any means of determining the appropriate specification, DEA should not be used as the primary approach in comparative efficiency analysis, especially when RA is possible. DEA operates in one of two modes: input shrinkage (or minimisation) and output expansion (or maximisation).
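The dual formulation above amounts to solving one small linear programme per observation. A minimal sketch in Python, using scipy.optimize.linprog (the function name, data layout and example numbers are assumptions, not part of the original formulation):

```python
import numpy as np
from scipy.optimize import linprog

def dea_cost_efficiency(costs, outputs, vrs=False):
    """Cost-efficiency scores K = c_E / c_0 from the dual LP sketched above.

    costs   : length-n list of total costs c_i (the aggregated input)
    outputs : n x m array of output quantities y_ij
    vrs     : if True, add sum(lambda) == 1 (Banker-Charnes-Cooper VRS)
    """
    costs = np.asarray(costs, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    n, m = outputs.shape
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: lambda_1 .. lambda_n, then K.
        c = np.zeros(n + 1)
        c[-1] = 1.0                                  # minimise K
        # Cost of the artificial observation: sum(lam_i c_i) <= K * c_0
        A_ub = [np.append(costs, -costs[o])]
        b_ub = [0.0]
        # Outputs: sum(lam_i y_ij) >= y_0j for every output j
        for j in range(m):
            A_ub.append(np.append(-outputs[:, j], 0.0))
            b_ub.append(-outputs[o, j])
        A_eq = [np.append(np.ones(n), 0.0)] if vrs else None
        b_eq = [1.0] if vrs else None
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * (n + 1))
        scores[o] = res.x[-1]
    return scores

# Two units with identical output but costs 10 and 20: under CRS the
# cheaper unit scores 1.0 and the dearer one 0.5.
scores = dea_cost_efficiency([10.0, 20.0], [[1.0], [1.0]])
```

The non-negativity bounds on the λ's and the optional convexity row correspond directly to the constraints described in the text.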
Figure 2 shows how, in the input minimisation mode, DEA can use data on several inputs to produce a performance score that is independent of any imposed weighting system for the inputs. The unit under consideration, D, is producing the same amount (or less) of every output as units A, B, and C.

Figure 2: Input Efficiency: A Comparison of Units Producing the Same Output. [Diagram: input 2 against input 1; A, B and C are all technically efficient; input efficiency for D = OE/OD; B and C are the 'reference group' for unit D.]

The fundamental assumption of DEA is that, if B and C are feasible, then a linear combination such as E is also feasible. E represents a radial contraction of D, using proportionately less of every input. The ratio OE/OD is the Farrell measure of technical efficiency. B and C are said to be the reference group for unit D. Figure 3 shows the equivalent measure in the output expansion mode. Units I, J and K are using the same (or less) of every input as unit G. I, J and K are regarded as technically efficient relative to the other points in the data set.
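The radial contraction can be computed directly as a linear programme. A sketch under CRS, again with scipy.optimize.linprog; the function name and the illustrative coordinates for B, C and D are assumptions, since the figure's values are not given:

```python
import numpy as np
from scipy.optimize import linprog

def farrell_input_efficiency(X, Y, unit):
    """Radial (Farrell) input efficiency OE/OD of one unit under CRS.

    X : n x k input quantities, Y : n x m output quantities.
    Solves  min theta  s.t.  sum_i lam_i X[i] <= theta * X[unit],
                             sum_i lam_i Y[i] >= Y[unit],  lam >= 0.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, k = X.shape
    m = Y.shape[1]
    c = np.zeros(n + 1)
    c[-1] = 1.0                        # variables: lam_1..lam_n, theta
    A_ub, b_ub = [], []
    for j in range(k):                 # every input is contracted by theta
        A_ub.append(np.append(X[:, j], -X[unit, j]))
        b_ub.append(0.0)
    for r in range(m):                 # the combination matches every output
        A_ub.append(np.append(-Y[:, r], 0.0))
        b_ub.append(-Y[unit, r])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
    return res.x[-1]

# B = (2, 4), C = (4, 2), D = (4, 4), all producing one unit of output:
# E = 0.5*B + 0.5*C = (3, 3) dominates D, so OE/OD = 0.75.
theta = farrell_input_efficiency([[2, 4], [4, 2], [4, 4]],
                                 [[1], [1], [1]], unit=2)
```

The optimal λ's identify the reference group: here only B and C receive positive weight, exactly as in the figure.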
Figure 3: Output Efficiency: A Comparison of Units Using the Same Inputs. [Diagram: output 2 against output 1; I, J and K are technically efficient; efficiency score of G = OG/OH; I and J are the reference group for G.]

Constant and Variable Returns in DEA

The difference between the constant and variable returns to scale cases is illustrated in Figure 4 using a single-input, single-output example. The figure plots a single output on the vertical axis against a single input on the horizontal axis. Points A, B, C, D, E and F represent actual companies with varying output-input ratios. Company C has the maximum output-input ratio. Under CRS (constant returns) the fitted DEA frontier is the ray OC; under VRS (variable returns) it is the envelope line ABCDE.

Figure 4: Constant and varying returns to scale in DEA. [Diagram: output against input; the CRS frontier is the ray OC; the VRS frontier is the envelope ABCDE.]

The companies below and to the left of company C are subject to increasing returns, while those above and to the right are subject to decreasing returns. Therefore, if a CRS frontier like OC is fitted, the companies on ABCDE which are not on the ray OC will be
classified as inefficient, partly due to scale inefficiency, so that more companies are likely to be labelled as efficient under VRS than under CRS.

3. RA versus DEA

RA and DEA are widely regarded as equivalent alternative techniques for estimating or fitting an efficiency frontier. Both are the result of a minimisation process: RA uses the least squares algorithm to fit an average line, while DEA uses linear programming to fit a convex hull. However, the two techniques have fundamental differences:

1) RA calculates a fixed number of parameters, defined by the number of regressors k. The number of parameters calculated by DEA depends on the data set and on the number of factors used in the reference sets; the upper limit is N × k, where N is the number of observations and k the number of variables. By way of comparison, a third approach (known as parametric programming) combines the LP approach to minimisation with a fixed number of parameters, k.

2) RA makes assumptions about the stochastic properties of the observed data. Under RA the observed data points are assumed to be realisations of random variables following certain distributions, usually normal, enabling hypothesis testing. This is an important advantage of RA over DEA as it is normally practised, since it enables one to check the statistical significance of competing explanatory variables as well as the appropriateness of the estimated functional form. Since DEA has been developed mainly in a non-statistical framework, hypothesis testing is more problematic with DEA, and without hypothesis testing model selection is problematic. Consider, for example, the case of fitting a cost frontier. Economic theory predicts that the quantity of output and factor prices should be among the exogenous determinants of costs, but there are also other important factors influencing costs in the real world. RA provides an empirical test, or decision rule, for identifying the important ones.
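The basic contrast, an average line versus an enveloping frontier, can be sketched for the single-input, single-output cost case, comparing the corrected OLS line of Figure 1 with the CRS DEA ray of Figure 4 (the data are hypothetical, and numpy's polyfit stands in for the OLS algorithm):

```python
import numpy as np

# Hypothetical data: X = a single size/output variable, C = total cost
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
C = np.array([2.0, 3.0, 6.0, 6.0, 9.0])

# CRS DEA cost frontier: the ray through the unit(s) with the lowest
# cost per unit of X; every unit on that ray scores 1.0
best = (C / X).min()
dea_eff = best * X / C

# COLS: fit the OLS average line, then shift it down so that it passes
# through the most efficient observation (the largest negative residual)
b1, b0 = np.polyfit(X, C, 1)
resid = C - (b0 + b1 * X)
cols_frontier = (b0 + resid.min()) + b1 * X
```

With these numbers the DEA ray passes through two observations (both score 1.0), while the COLS frontier, by construction, touches exactly one: an illustration of the point below that DEA tends to find many observations on the frontier where the econometric approach finds one.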
The implied production possibility frontier for the two approaches is clearly different. In particular, whilst there is only one efficient observation in the econometric approach, DEA will tend to find many observations on the frontier. In order to carry out either of these methods, an essential first step is to find out which factors affect the raw performance as reflected in the indicators. The econometric approach, with its statistical tests, is most useful in this process, and we used the results of our econometric analysis to inform our specification of the DEA model. In the econometric approach the benchmark cost level is derived from a statistical cost function which provides the best fit to the data. Implementing this approach requires an assumption about the shape of the underlying cost function. DEA, on the other hand, does not require an assumption about the shape of the cost function and, in some ways, provides a more convenient framework where there are many outputs and inputs. Finally, all attempts at calculating relative efficiency may be frustrated by difficulties in obtaining sufficient data of good enough quality. Remember GIGO (garbage in, garbage out).