Developing a Translog Cost Function for Pharmaceutical Distribution

Gaurav Jetly (b), Christian Rossetti (a, *), Michael Kay (b), Donald Warsing (a), Robert Handfield (a)
* contact author
(a) Department of Business Management, Poole College of Management, North Carolina State University
(b) Edward P. Fitts Department of Industrial Engineering, College of Engineering, North Carolina State University

Abstract
Transcendental logarithmic (translog) cost functions hold great importance in microeconomic theory because of their large number of applications. In this study we propose a novel methodology for generating a cost function from secondary data using several industry-specific constraints. The factors that account for distribution costs are identified and grouped into five main input factors. Because standard accounting data are not available, several secondary sources are used. Data for these representative factors are collected from Census data, the ReferenceUSA database, the COMPUSTAT database, and the Healthcare Distribution Management Association (HDMA) Factbook. Matlog functions are used for geocoding and for transforming the data into representative input factors. The coefficients of the translog cost functions are estimated using a nonlinear optimization technique.

Key words: Translog cost function, secondary data, nonlinear optimization, pharmaceutical distribution

Introduction

Cost functions hold great importance in the microeconomics literature, especially in transportation economics, where they have been used for a number of applications. Different types of cost functions have been generated depending on the research question under consideration, and a large number of studies have focused on Cobb-Douglas and translog cost functions. As mentioned by Braeutigam (Meyer et al. 1999), cost functions can be used to address a number of issues: the impact of factor prices on total cost, economies of scale, economies of scope, economies of density, and the effect of technology on the cost structure.

In the past, cost functions were estimated to study economies of scale and scope. Kim (1987) developed a translog cost function for freight and passenger services to examine economies of scale and scope in U.S. railroads. The study estimated that even though there were mild economies of scale, diseconomies of scope in the joint production of passenger and freight services were evident. Christensen and Greene (1976) used a translog cost function to estimate that the reduction in cost of firms in electric power generation was due to technological advances rather than growth of the firms. They concluded that a small number of extremely large firms was not essential for efficient production of electricity. Allen and Liu (1995), in their estimation of a cost function for LTL motor carriers, included a variable for service quality as well. Their findings were based on service quality data obtained from shippers. They found that, because of their economies of scale, large carriers were able to provide higher quality service at a competitive or lower price than small carriers. Their findings contradicted the existing literature, which stated that economies of scale were non-existent.
However, that previous research had failed to include service levels; large carriers used their cost advantage to provide levels of customer service that were cost prohibitive for small carriers.

Cost functions were also used to study economies of density. Caves, Christensen, and Tretheway (1984) derived a translog cost function for U.S. trunk and local service airlines. They estimated that economies of density declined from.6 to.9 in the era of deregulation. This was achieved by increasing the number of flights and by denser seating arrangements. Gillen, Oum, and Tretheway (1990) performed a similar study on the cost structure of Canadian airlines. They estimated that there were constant returns to firm size/network and that returns would increase as network density increases. Their study suggested that small network airlines should not be at a disadvantage compared with big firms if they can maintain good traffic density in their network. Fabbri, Fraquelli, and Giandrone (2000) studied the costs, technology and ownership of gas distribution in Italy. From their estimate of a translog cost function, they suggested that there were constant returns to scale for gas distribution. They estimated

that there was a significant increase in total cost when the density of customers decreased, suggesting high economies of density.

Cost function methodology has also been used to study the impact of technological improvements on the cost structure of systems. For this kind of analysis, data covering a considerable time span are used to determine the changes in cost structure over time. McCarthy and Urmanbetova (2009) developed a translog cost function of the U.S. paper and paperboard industry which included a parameter to track technological improvements. They estimated that technological improvements resulted in 0.0% annual reduction in the short run operating cost.

The studies mentioned above are some examples of the applications of cost functions. It is clear that cost functions have diverse applications in different industries and therefore hold great importance in the economics literature. Most of these studies used government or private databases to estimate the cost function. However, there are industries and companies where such data are not collected, or are privately held and consequently not available to researchers. At present the literature does not describe an alternative way to estimate the cost function in such a scenario. In this study we faced a similar situation: no single data source was available to estimate the cost function for drug distribution centers (DCs). We therefore developed a novel technique which can be used to estimate such a cost function in the absence of a good data source.

The Model and Data Sources

The firms considered in this study were primarily involved in drug distribution. Three major players in the drug distribution business were considered. These three players shared more than 90 percent of the market, and each of them has a number of distribution centers in the United States.
There are different costs associated with running such a distribution center: equipment, power, machinery, labor, safety, insurance, rent, office equipment and supplies, security, etc. These costs can be grouped into two major subcategories: (1) employee related costs and (2) costs associated with the size of the distribution center. Other costs involved are the cost of goods and the transportation cost, which depend on the population served as well as the population density of the distribution network. To estimate the cost as a function of these categories, we considered data for three pharmaceutical distribution giants in the United States: McKesson, AmerisourceBergen, and Cardinal. The data were obtained from the ReferenceUSA database and the Compustat database. The ReferenceUSA database provided the geographical location, zip codes, number of employees, and square footage of each distribution center. The Compustat database provided the overall cost of these pharma-distribution

companies. The data for over DCs of McKesson, 8 DCs of Cardinal and DCs of AmerisourceBergen was considered.

To estimate the above cost function, we assumed that the cost of goods sold by a DC is proportional to the population above the age of 65 years in each location. This proxy was chosen because of the much higher per capita use of pharmaceutical products among the elderly. We also assumed that each of these companies has exclusive distribution rights for the pharmaceutical products in its own catalogue, so that the distribution regions of the three companies overlap.

We first identified all the cities in the United States with population above 0,000 in 2006. We collected the zip codes, population, and land area of each of these cities. Since we assumed exclusive distribution rights, we considered the DCs of one company at a time. For all the cities in a state, we determined the closest DC in that particular state and assigned the population of each city to its closest DC. Following this procedure, we estimated the population served by the DCs in different states for each company. We also estimated the total land area of these cities and determined the average population density of the cities served by each DC. We determined the population above the age of 65 years in each city and the corresponding land area using the Census data. The allocation was carried out using the functions in the Matlog package developed by Kay (http://www.ise.ncsu.edu/kay/matlog/). The overall cost of running the DCs for each of these companies was obtained from the Compustat database.
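The allocation step described above can be illustrated with a short sketch. The paper used Matlog (a MATLAB toolbox); the Python below is not the authors' code. The function names, the dictionary keys, the great-circle distance measure, and the rule of skipping cities in states without a DC are all assumptions made for illustration, and the coordinates and populations are made-up values.

```python
import math
from collections import defaultdict


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    radius = 3958.8  # mean Earth radius in miles
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def allocate_cities(cities, dcs):
    """Assign each city to the closest DC in the same state and aggregate.

    cities: dicts with keys name, state, lat, lon, pop, area (area > 0)
    dcs:    dicts with keys name, state, lat, lon
    Returns {dc_name: {"pop": ..., "area": ..., "density": ...}}.
    """
    totals = defaultdict(lambda: {"pop": 0.0, "area": 0.0})
    for city in cities:
        in_state = [dc for dc in dcs if dc["state"] == city["state"]]
        if not in_state:
            continue  # assumption: cities in a state without a DC are skipped
        nearest = min(
            in_state,
            key=lambda dc: haversine_miles(city["lat"], city["lon"], dc["lat"], dc["lon"]),
        )
        totals[nearest["name"]]["pop"] += city["pop"]
        totals[nearest["name"]]["area"] += city["area"]
    return {
        name: {"pop": t["pop"], "area": t["area"], "density": t["pop"] / t["area"]}
        for name, t in totals.items()
    }


# Hypothetical data: two DCs and three cities in one state.
dcs = [
    {"name": "DC_A", "state": "NC", "lat": 35.0, "lon": -80.0},
    {"name": "DC_B", "state": "NC", "lat": 36.0, "lon": -76.0},
]
cities = [
    {"name": "X", "state": "NC", "lat": 35.1, "lon": -80.1, "pop": 1000, "area": 10.0},
    {"name": "Y", "state": "NC", "lat": 35.9, "lon": -76.2, "pop": 500, "area": 5.0},
    {"name": "Z", "state": "NC", "lat": 35.2, "lon": -79.9, "pop": 2000, "area": 30.0},
]
served = allocate_cities(cities, dcs)
```

Here `pop` can hold either the total population or the population above 65, depending on which input is being built; the density is computed as aggregated population divided by aggregated land area.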

Figure 1. Cities with population greater than 0,000 assigned to distribution centers of McKesson

The inputs for each DC cost function are composed of five parameters:
1) Number of employees (E)
2) DC facility area (A)
3) Total population (greater than 65 years of age) served by that DC (P)
4) Population density of the region served by that DC (D)
5) Weighted average of the distances between the cities served and the distribution center, weighted by the population over 65 in these cities (referred to as the Moment of Distribution [M] in the rest of the paper)

Using the Matlog functions, the values of the above parameters were estimated for each company's (McKesson, AmerisourceBergen (ABC), Cardinal) set of individual distribution centers using the allocation method. Please see the example shown in Figure 1. Since cost data for the individual DCs were not available, we estimated the translog cost function with the above input parameters plus an error term. The long run cost function for running such a drug distribution center can be represented as:

C = C(E, A, P, D, M)

where C refers to the overall cost of running a distribution center, E refers to the number of employees, A refers to the distribution center area, P refers to the population served, D refers to the population density of the region served, and M refers to the moment of distribution. To estimate the cost function we adopted a modified version of the translog functional form proposed by Christensen, Jorgenson, and Lau (1973).
The modified translog specification of the cost function is as follows:

$$
\begin{aligned}
\ln C_i ={}& \alpha_a \ln P_i + \alpha_b \ln D_i + \alpha_c \ln E_i + \alpha_d \ln A_i + \alpha_e \ln M_i \\
&+ \alpha_f (\ln P_i)^2 + \alpha_g (\ln D_i)^2 + \alpha_h (\ln E_i)^2 + \alpha_j (\ln A_i)^2 + \alpha_k (\ln M_i)^2 \\
&+ \alpha_{ab} \ln P_i \ln D_i + \alpha_{ac} \ln P_i \ln E_i + \alpha_{ad} \ln P_i \ln A_i + \alpha_{ae} \ln P_i \ln M_i \\
&+ \alpha_{bc} \ln D_i \ln E_i + \alpha_{bd} \ln D_i \ln A_i + \alpha_{be} \ln D_i \ln M_i \\
&+ \alpha_{cd} \ln E_i \ln A_i + \alpha_{ce} \ln E_i \ln M_i + \alpha_{de} \ln A_i \ln M_i + (\text{Error})_i
\end{aligned}
\tag{1}
$$

Input share equations:

$$
\text{Share of COGs} = \alpha_a + \alpha_f \ln P_i + \alpha_{ab} \ln D_i + \alpha_{ac} \ln E_i + \alpha_{ad} \ln A_i + \alpha_{ae} \ln M_i \tag{2}
$$

$$
\text{Share of employee cost} = \alpha_c + \alpha_h \ln E_i + \alpha_{ac} \ln P_i + \alpha_{bc} \ln D_i + \alpha_{cd} \ln A_i + \alpha_{ce} \ln M_i \tag{3}
$$

$$
\text{Share of DC space} = \alpha_d + \alpha_j \ln A_i + \alpha_{ad} \ln P_i + \alpha_{bd} \ln D_i + \alpha_{cd} \ln E_i + \alpha_{de} \ln M_i \tag{4}
$$

$$
\begin{aligned}
\text{Share of logistics cost} ={}& \alpha_b + \alpha_e + \alpha_g \ln D_i + \alpha_k \ln M_i + \alpha_{ab} \ln P_i + \alpha_{ae} \ln P_i \\
&+ \alpha_{bc} \ln E_i + \alpha_{bd} \ln A_i + \alpha_{be} (\ln D_i + \ln M_i) + \alpha_{ce} \ln E_i + \alpha_{de} \ln A_i
\end{aligned}
\tag{5}
$$

The coefficients of the terms associated with the above parameters were estimated by solving the following optimization problem subject to a number of constraints.

Objective function:

$$
\text{Minimize } \sum_{i=1}^{N} (\text{Error})_i
$$

where

$$
\begin{aligned}
\ln C_i ={}& \alpha_a \ln P_i + \alpha_b \ln D_i + \alpha_c \ln E_i + \alpha_d \ln A_i + \alpha_e \ln M_i \\
&+ \alpha_f (\ln P_i)^2 + \alpha_g (\ln D_i)^2 + \alpha_h (\ln E_i)^2 + \alpha_j (\ln A_i)^2 + \alpha_k (\ln M_i)^2 \\
&+ \alpha_{ab} \ln P_i \ln D_i + \alpha_{ac} \ln P_i \ln E_i + \alpha_{ad} \ln P_i \ln A_i + \alpha_{ae} \ln P_i \ln M_i \\
&+ \alpha_{bc} \ln D_i \ln E_i + \alpha_{bd} \ln D_i \ln A_i + \alpha_{be} \ln D_i \ln M_i \\
&+ \alpha_{cd} \ln E_i \ln A_i + \alpha_{ce} \ln E_i \ln M_i + \alpha_{de} \ln A_i \ln M_i + (\text{Error})_i
\end{aligned}
\tag{6}
$$

where i varies from 1 to N, and N is the number of distribution centers of each company.

The above nonlinear optimization problem was solved with a number of constraints specific to pharmaceutical distribution centers. These constraints were obtained from the HDMA Factbook and were applied to the individual share equations, which are obtained by taking the partial derivative of the cost function with respect to each parameter.

$$
\text{Constraint 1: } \sum_{i} C_i = \text{Total cost from Compustat database} \tag{7}
$$

The pharmaceutical companies considered in this study are primarily involved in the drug distribution business. Therefore, the sum of the costs of running all the distribution centers of a firm should be equal to the overall cost reported in the Compustat database.

$$
\text{Constraint 2: } 0.95 < \sum (\text{share of input } i) < 1.05 \tag{8}
$$

This set of constraints was obtained by adding the individual factor share equations: the individual factor shares must add up to the overall cost, within a tolerance of 5 percent. This constraint was applied to the equation corresponding to each distribution center.
This constraint resulted in the addition of constraints in the McKesson model, 8 constraints in the Cardinal model, and constraints in the AmerisourceBergen model, each with a lower bound as well as an upper bound.

Constraint 3: Mean cost per employee = $60,000 (9)

This constraint comes from the mean salary of the employees in the distribution center. It was obtained as follows:

In this equation the share is calculated using the trial coefficients and the input share equations given above. Total cost is also estimated at run time using the trial coefficients. The number of employees is obtained from the ReferenceUSA database (00). This constraint resulted in the addition of another constraints in the McKesson model, 8 constraints in the Cardinal model, and constraints in the AmerisourceBergen model.

Constraint 4: 3rd quartile of share of warehouse related cost < 0.0047 (10)

The HDMA Factbook reports the mean, the median, and the middle range of warehouse related cost as a percentage of net sales. In our model, we used the upper bound of the middle range, that is the 3rd quartile, as a constraint. The corresponding constraint was evaluated as follows: 3rd quartile of all the facility area share values < 0.47

Constraint 5: 0.95 < Share of COGs < 1 (11)

Drug distribution is a service industry, so the cost of goods makes up the major share of the overall cost of a distribution center. According to the HDMA Factbook, the cost of goods sold accounted for more than 98 percent of the overall cost. In our model we used a slightly relaxed constraint and allowed the share of cost of goods to vary between 95 and 100 percent of the overall cost. This constraint resulted in the addition of another constraints in the McKesson model, 8 constraints in the Cardinal model, and constraints in the AmerisourceBergen model.

Constraint 6: 0 < Share of each input < 1 (12)

The last set of constraints requires that the input share of each factor for each distribution center be positive and less than the total cost of running the distribution center. This constraint resulted in the addition of another constraints in the McKesson model, 8 constraints in the Cardinal model, and constraints in the AmerisourceBergen model.
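To make the role of the trial coefficients concrete, the sketch below evaluates the translog form and the input shares for one DC and checks the share constraints (Constraints 2, 5 and 6). It is only an illustration of what a nonlinear solver would evaluate at each iteration, not the authors' implementation. The function names and the dictionary layout are assumptions, the upper bounds are read as 1.05 and 1, and the optional constant term is included only because the results table reports one.

```python
import math
from itertools import combinations

# Factor order follows the subscripts a..e of the specification:
# P = population served, D = density, E = employees, A = area, M = moment.
FACTORS = ("P", "D", "E", "A", "M")


def log_cost(x, first, square, cross, const=0.0):
    """Translog log-cost for one DC (error term omitted).

    x:      factor -> positive level
    first:  factor -> first-order coefficient
    square: factor -> coefficient on the squared log term
    cross:  (f, g) -> interaction coefficient, f before g in FACTORS order
    """
    ln = {f: math.log(x[f]) for f in FACTORS}
    value = const
    value += sum(first[f] * ln[f] for f in FACTORS)
    value += sum(square[f] * ln[f] ** 2 for f in FACTORS)
    value += sum(cross[(f, g)] * ln[f] * ln[g] for f, g in combinations(FACTORS, 2))
    return value


def factor_share(f, ln, first, square, cross):
    """Share equation for one factor, written as in the share equations above."""
    share = first[f] + square[f] * ln[f]
    for g in FACTORS:
        if g != f:
            key = (f, g) if (f, g) in cross else (g, f)
            share += cross[key] * ln[g]
    return share


def input_shares(x, first, square, cross):
    """Four input shares; logistics combines the density and moment shares."""
    ln = {f: math.log(x[f]) for f in FACTORS}
    s = {f: factor_share(f, ln, first, square, cross) for f in FACTORS}
    return {
        "cogs": s["P"],
        "employees": s["E"],
        "dc_space": s["A"],
        "logistics": s["D"] + s["M"],
    }


def share_constraints_ok(shares):
    """Check Constraints 2, 5 and 6 for one DC (bounds are assumptions)."""
    total = sum(shares.values())
    return (
        0.95 < total < 1.05
        and 0.95 < shares["cogs"] < 1.0
        and all(0.0 < v < 1.0 for v in shares.values())
    )
```

A solver would call these functions for every DC with each trial coefficient vector, add the firm-level cost constraint, and reject or penalize coefficient vectors for which `share_constraints_ok` fails.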
Since a large number of data points enter equation 6, we assumed that the error value should be equal for similar data points. We therefore formed clusters of data points based on the five input parameter values of each distribution center. DCs that fall in the same cluster were assumed to have an equal error term. Hierarchical clustering was performed using Ward's minimum variance method to form the clusters.

Results and Analysis

The methodology described above was used to determine the cost function for three major pharmaceutical distribution companies. These three companies share more than 90 percent of the drug distribution market in the United States: Cardinal, McKesson, and AmerisourceBergen. The coefficients of the cost function derived for these three players are presented in Table 1 below.

Table 1. Model for three pharmaceutical distribution companies

Coefficient   AmerisourceBergen   McKesson     Cardinal
Constant      5.7548              5.776        4.970
α_d           0.0749              0.09690      0.0455
α_c           -0.0007             -0.076       -0.0079
α_e           0.0                 0.0999       0.005995
α_a           .006457             .0004978     .0044
α_b           0.0068              0.0079       0.000979
α_j           -0.0069             -0.00599     -0.0044757
α_h           -0.00047            0.000746     0.0006407
α_k           -0.05               -0.0044      -0.005
α_f           -0.058              0.0006       -0.0065
α_g           0.005               -0.000965    0.004564
α_dc          0.0006              8.86E-05     0.0004988
α_de          0.009               -0.000947    0.0008044
α_da          -0.005              -0.000486    -0.0085
α_db          0.0068              -0.0007      0.0054
α_ce          0.000               -.E-05       -0.00079
α_ca          -0.0009             -0.00070     -0.00048
α_cb          0.0005              -0.000674    0.0005054
α_ea          0.00497             -0.00060     0.00086
α_eb          -0.005              0.000769     -9.094E-05
α_ab          0.00575             -0.005       -0.0004

The coefficients for the three companies are clearly very different from each other. One reason could be the different distribution models used by the three companies. Of the three, AmerisourceBergen has fewer distribution centers serving the same population size, and Cardinal's network of distribution centers outnumbers those of both AmerisourceBergen and McKesson. This should result in lower inventory associated costs for AmerisourceBergen, but higher transportation costs. Cardinal's network should perform better on transportation costs, but worse on inventory associated costs.

Applications

The translog function developed above can be used to determine the effect of a change in the number or size of facilities on the overall cost of the company. Assuming that the market size remains constant, an increase or decrease in the number of facilities can result in an increase or decrease in cost. Furthermore, as new markets emerge with an increased population above 65 years of age, our function can be used to determine the number of facilities required to serve the markets at the lowest possible cost. Given the number, location and size of facilities, one can determine the overall cost required to sustain the network. The translog cost function can therefore be used by distribution companies to determine which acquisitions will be valuable in the future in terms of operational cost. Some of the possible applications of this kind of cost function are returns to scale, returns to density, and network optimization.

Returns to scale

Economies of scale in this type of cost function refer to an increase in network size for an existing facility, that is, the percentage increase in cost caused by adding another city to the network of a DC. This quantity cannot be derived in closed form and needs to be computed from the city population and the distance of the city from the DC. For example, consider the case of a distribution company serving North Carolina. The cost estimate based on the derived cost function is $ 6.46 million. This DC serves cities in North Carolina, Virginia, South Carolina, and West Virginia with population greater than 0,000. If the firm plans to include another city in the network, such as Elizabeth, NC, which is not served by this DC at present, the new cost of operating the DC will be $6.9 million. Therefore the percentage increase in cost is.75 percent compared to a.59 percent increase in population. This implies that there are diminishing returns to scale if this city is added to the network of this DC.
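The comparison made in this example, percentage change in cost against percentage change in population served, can be written as a small helper. This is a minimal sketch: the function names are hypothetical, the numbers in the usage lines are made up for illustration and are not the paper's figures, and the classification simply follows the reading used above, where cost growing faster than population indicates diminishing returns to scale.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


def scale_ratio(cost_before, cost_after, pop_before, pop_after):
    """Ratio of the percentage change in cost to the percentage change in population."""
    return pct_change(cost_before, cost_after) / pct_change(pop_before, pop_after)


def classify_returns(ratio, tol=1e-9):
    """Label the ratio: above 1 means cost grows faster than population."""
    if ratio > 1.0 + tol:
        return "diminishing"
    if ratio < 1.0 - tol:
        return "increasing"
    return "constant"


# Hypothetical values: cost rises 3 percent while population rises 2 percent.
ratio = scale_ratio(cost_before=100.0, cost_after=103.0, pop_before=1000.0, pop_after=1020.0)
label = classify_returns(ratio)
```

In practice the two cost figures would come from evaluating the fitted cost function before and after the city is added to the DC's allocation.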
Similarly, if another city with a higher population and a location closer to the DC is included, it may result in increasing returns to scale.

Returns to densities

To examine economies of density, we can estimate the percentage increase in cost with respect to the percentage increase in the population served by an existing distribution center, without changing the size of the network. In other words, we estimate the increase in cost if the DC serves a greater population within the existing network. This can be achieved by diversifying the product portfolio to include products for different age groups, which would expand the market in the existing network. To estimate the percentage increase in cost with respect to the percentage increase in population served, we use the partial derivative with respect to population as follows:

$$
\begin{aligned}
\text{Share of COGs} ={}& \alpha_a + \alpha_e \ln M_i + \alpha_f \ln P_i + \alpha_{ab} \ln D_i + \alpha_{ac} \ln E_i + \alpha_{ad} \ln A_i + \alpha_k (\ln M_i)^2 \\
&+ \alpha_{ae} [\ln P_i + \ln M_i] + [\alpha_{be} \ln D_i + \alpha_{ce} \ln E_i + \alpha_{de} \ln A_i]
\end{aligned}
$$

The above factor cannot be derived in closed form and needs to be computed from the changes in the market characteristics of the distribution area. In particular, as the population size in each city changes, both the population over 65 and the moment of distribution change. Therefore, the cost factor must be estimated separately for each case to determine the effect of population on logistics costs. As an example, if the firm includes a new product in its portfolio which targets the middle aged population, and there is a proportionate increase of 0 percent of the market in all the cities served by the DC located in North Carolina, this will result in a 0.4 percent increase in cost. This means that there are constant returns to densities. One possible reason for this could be the truckload shipments between the nodes. A new product line can result in increasing returns to densities if the DC targets the cities which are located closer to the DC.

Network optimization

This kind of facility level cost function can be used to optimize the network of a distribution company. If a company is planning to expand in the future and is considering a number of possibilities (i.e. entering a new market, an acquisition, or increasing the number of facilities in an existing market for better responsiveness), this kind of facility level cost function can offer great insight during network design. An analysis that uses such a cost function to determine the optimal number of facilities will account for both economies of scale and economies of density.

Summary and Conclusion

The findings of this study suggest that in the absence of standard accounting data, alternative methods can be applied in certain cases to derive a translog type cost function.
As can be seen, the way such a cost function is used differs from the usual way of using a cost function. For the cost function derived for the pharmaceutical distribution centers, it can be inferred that there are diminishing returns to scale and constant returns to densities. Furthermore, such a cost function can be used to derive the cost of the entire network of DCs and can therefore be used for DC network design. Even though cost falls as the number of distribution centers is reduced, the logistics cost may increase drastically, especially if the most distant cities have higher populations.

References

Allen WB and Liu D. 1995. Service quality and motor carrier costs: An empirical analysis. Review of Economics and Statistics 77(3), 499-510.
Caves DW, Christensen LR, Tretheway MW. 1984. Economies of density versus economies of scale: Why trunk and local service airline costs differ. Rand Journal of Economics 15(4), 471-489.
Christensen LR, Greene WH. 1976. Economies of scale in U.S. electric power generation. Journal of Political Economy 84(4), 655-76.
Christensen LR, Jorgenson DW, and Lau LJ. 1973. Transcendental logarithmic production frontiers. Review of Economics and Statistics 55, 28-45.
Fabbri P, Fraquelli G, Giandrone R. 2000. Costs, technology and ownership of gas distribution in Italy. Managerial and Decision Economics 21(2), 71-81.
Gillen DW, Oum TH, and Tretheway MW. 1990. Airline cost structure and policy implications: A multi-product approach for Canadian airlines. Journal of Transport Economics and Policy 24(1), 9-34.
Kim HY. 1987. Economies of scale and scope in multiproduct firms: Evidence from U.S. railroads. Applied Economics 19(6), 733.
McCarthy P and Urmanbetova A. 2009. Production and cost in the U.S. paper and paperboard industry. Applied Economics 4(7) -.
Meyer JR, Gómez-Ibáñez JA, Tye WB, and Winston C. 1999. Essays in transportation economics and policy. Brookings Institution Press, 57-97.

Appendix A: Individual Regression Results

ABC
Response F

Summary of Fit
RSquare, RSquare Adj, Root Mean Square Error, Mean of Response, Observations (or Sum Wgts): 0.999999 0.00079 .858

Analysis of Variance (tested against reduced model: Y=mean)
Source: Model, Error, C. Total
DF: 8 0
Sum of Squares: 8.707549 0.000004 8.707556
Mean Square: 0.7569 5.e-7
F Ratio: 64060

Parameter Estimates
Estimate     Std Error   t Ratio
-0.49468     0.04648     -7.8
0.08508      0.00579     .
0.6849       0.0564      0.77
0.955        0.0705      55.4
-0.009       0.004       -4.
0.04905      0.00595     7.6
-0.067       0.0047      -6.00
-0.009       0.0045      -.66
-0.005967    0.00644     -.6
0.0085       0.0069      .6
-0.006       0.00506     -.80
0.09507      0.00568     5.47
0.00789      0.0054      5.6
Prob>|t|: 0.00* 0.00* 0.0067* 0.0* 0.090 0.0006* 0.0008*

Prediction Profiler (plot): F .4495 ±0.00085

McKesson
Response F

Summary of Fit
RSquare, RSquare Adj, Root Mean Square Error, Mean of Response, Observations (or Sum Wgts): .57e-5 .7845

Analysis of Variance (tested against reduced model: Y=mean)
Source: Model, Error, C. Total
DF: 6 4 0
Sum of Squares: .6848 .87407e-9 .6848
Mean Square: .446 .4e-0
F Ratio: .06e+0

Parameter Estimates
Estimate     Std Error   t Ratio
0.05549      0.00        .96
-0.06748     0.0089      -9.6
0.08768      0.000       74.94
.000095      0.0009      56.
0.007578     0.00068     6.56
-0.0005      0.00057     -5.5
0.00078      .86e-6      540.4
-0.00484     .458e-5     -60.5
0.000        5.9e-5      .0
-0.00047     0.000048    -9.8
0.00557      0.00050     .50
-0.0007      0.000       -.47
-0.00089     6.e-6       -6.8
-0.000       0.00005     -8.74
-0.000469    7.e-5       -6.5
0.00066      5.955e-5    .0
-0.0079      8.95e-5     -.7
Prob>|t|: 0.000* 0.07* 0.054* 0.070*

Prediction Profiler (plot): F .794 ±5.88e-5

Cardinal
Response F

Summary of Fit
RSquare, RSquare Adj, Root Mean Square Error, Mean of Response, Observations (or Sum Wgts): . . 0.580 .565 8

Analysis of Variance (tested against reduced model: Y=0)
Source: Model, Error, C. Total
DF: 5 8
Sum of Squares: 4.557 0.570 4.69060
Mean Square: 6.0 0.0
F Ratio: 694.6

Parameter Estimates
Estimate     Std Error   t Ratio
-.7057       .8997       -0.9
-.890704     0.8854      -7.44
.5095568     .86049      .7
.846         0.674       5.0
7.4577       .004        .9
.749900      0.57594     .06
0.0580786    0.0006      .89
0.90675      0.077       .68
-0.46564     0.49        -.9
-.4804       0.056       -4.
-.6465       0.9896      -.0
0.95076      0.05855     6.6
0.764479     0.844       .94
-0.4845      0.908       -.
0.5844       0.09        .
Prob>|t|: 0.7 0.58 0.054* 0.0056* 0.008* 0.04* 0.088* 0.000* 0.00* 0.0007* 0.0049* 0.007*

Prediction Profiler (plot): F .058064 ±0.464