Economic Models and Algorithms for Grid Systems


Economic Models and Algorithms for Grid Systems

Cumulative Habilitation Thesis (Kumulative Habilitationsschrift)
Fakultät für Wirtschaftswissenschaften, Universität Karlsruhe (TH)

by Dr. Dirk Georg Neumann, M.A. (Univ. Wisc.)

Karlsruhe, 12 October 2007

Table of Contents

PART A: OVERVIEW ARTICLE: TOWARDS AN OPEN GRID MARKET 1
1 Introduction 2
2 Definitions
  2.1 Cluster
  2.2 Distributed Computing
  2.3 Grid Computing
  2.4 Utility computing 5
3 Markets as the Business Model for Grid
  3.1 Pricing strategies
  3.2 The need for Self-organization 9
4 Related Work
  4.1 Market-based Grid systems
  4.2 Markets are dead, long live markets 13
5 An Architecture for Open Grid Markets
  5.1 Architecture Rationale
  5.2 Open Grid Market Architecture
    Layer 1: Core Market Services
    Layer 2: Open Grid Market
    Layer 3: Intelligent Tools
    Layer 4: Grid Application 18
6 Organization of the book
  6.1 Market Mechanisms for Grid Systems
    6.1.1 Bridging the Adoption Gap: Developing a Roadmap for Trading in Grids
    6.1.2 A Truthful Heuristic for Efficient Scheduling in Network-centric Grid OS
    6.1.3 Economically Enhanced MOSIX for Market-based Scheduling in Grid OS
    6.1.4 GreedEx: A Scalable Clearing Mechanism for Utility Computing
    6.1.5 Market-Based Pricing in Grids: On Strategic Manipulation and Computational Cost
    6.1.6 Trading grid services: a multi-attribute combinatorial approach 21

    6.1.7 A Discriminatory Pay-as-Bid Mechanism for Efficient Scheduling in the Sun N1 Grid Engine
    6.1.8 Decentralized Online Resource Allocation for Dynamic Web Service Applications
  6.2 Economic Component Design
    6.2.1 Self-Organizing ICT Resource Management: Policy-based Automated Bidding
    6.2.2 A Decentralized Online Grid Market: Best Myopic vs. Rational Response
    6.2.3 Economically Enhanced Resource Management for Internet Service Utilities
    6.2.4 Situated Decision Support for Managing Service Level Agreement Negotiations
    6.2.5 Using k-pricing for Penalty Calculation in Grid Market
    6.2.6 Rightsizing of Incentives for collaborative e-science Grid Applications with TES
  6.3 Design Issues and Extensions
    6.3.1 Distributed Ascending Proxy Auction: A Cryptographic Approach
    6.3.2 Technology Assessment and Comparison: The Case of Auction and E-negotiation Systems
    6.3.3 Comparing Ingress and Egress Detection to Secure Inter-domain Routing: An Experimental Analysis
    6.3.4 A Market Mechanism for Energy Allocation in Micro-CHP Grids 26
7 Conclusion and Outlook on Future Research
  7.1 Concluding remarks
  7.2 Outlook 28
Bibliography 30

Part B: Articles
Article: Bridging the Adoption Gap: Developing a Roadmap for Trading in Grids 34
Article: A Truthful Heuristic for Efficient Scheduling in Network-centric Grid OS 54
Article: Economically Enhanced MOSIX for Market-based Scheduling in Grid OS 66
Article: GreedEx: A Scalable Clearing Mechanism for Utility Computing 74
Article: Market-Based Pricing in Grids: On Strategic Manipulation and Computational Cost 95
Article: Trading grid services: a multi-attribute combinatorial approach 125
Article: A Discriminatory Pay-as-Bid Mechanism for Efficient Scheduling in the Sun N1 Grid Engine 154
Article: Decentralized Online Resource Allocation for Dynamic Web Service Applications 164

Article: Self-Organizing ICT Resource Management: Policy-based Automated Bidding 172
Article: A Decentralized Online Grid Market: Best Myopic vs. Rational Response 180
Article: Economically Enhanced Resource Management for Internet Service Utilities 191
Article: Situated Decision Support for Managing Service Level Agreement Negotiations 205
Article: Using k-pricing for Penalty Calculation in Grid Market 214
Article: Rightsizing of Incentives for collaborative e-science Grid Applications with TES 224
Article: Distributed Ascending Proxy Auction: A Cryptographic Approach 235
Article: Technology Assessment and Comparison: The Case of Auction and E-negotiation Systems 244
Article: Comparing Ingress and Egress Detection to Secure Inter-domain Routing: An Experimental Analysis 284
Article: A Market Mechanism for Energy Allocation in Micro-CHP Grids 307
Lebenslauf (Curriculum Vitae) 317
Schriftenverzeichnis (List of Publications) 327

Part A

Overview Article

Towards an Open Grid Market

TOWARDS AN OPEN GRID MARKET

Dirk Neumann
Institute of Information Systems and Management (IISM), Universität Karlsruhe (TH)

1 Introduction

In recent years, the costs of ICT infrastructures have exploded as a result of the one-application-one-platform style of deployment. This has left most ICT infrastructures with extremely low system utilization and wasted resources (Carr 2005). Examples can be drawn from virtually all sources: one recent study of six corporate data centers reported that the bulk of their 1,000 servers utilized just 10% to 35% of their available processing power (Andrzejak, Arlitt et al. 2002). IBM estimated the average capacity utilization of desktop computers at just 5% (Berstis 2002). The Gartner Group indicates that between 50% and 60% of a typical company's data storage capacity is wasted (Gomolski 2003). Overcapacity can be observed not only with respect to hardware, but also with respect to software applications. In essence, highly scalable applications can serve additional users at almost no incremental cost; hence, redundant installations of the same application create unnecessary costs (Carr 2003; Carr 2005).

ICT is currently undergoing an inevitable shift from being an asset that companies possess (e.g. computers, software) to being a service that companies purchase from designated utility providers. This shift will take years to fully unfold, but the technical building blocks have already begun to take shape. On the coattails of this shift, the business model of utility computing, or equivalently e-business on demand, is increasingly emerging. Utility computing denotes the service provisioning model in which a service provider makes computing resources and infrastructure management available to the customer as needed, and charges for them based on usage.
This new business model, relying on service-oriented architectures, contributes to driving down the costs that result from low system utilization and waste, and reduces complexity through plug & play service provisioning in the ICT infrastructure. Typically, two general pricing models for utility computing are identified. The first is the subscription model, in which users are charged on a periodic basis to subscribe to a service. The second is the metered model, in which users are charged on the basis of actual usage. While the first model gives rise to a waste of resources, since charging is independent of usage, the metered model is ultimately deemed the more promising one for removing this waste (Rappa 2004). From the user's point of view, the metering of usage can be compelling, as one pays for what one uses. For the service provider, the metered model is appealing, as it offers an opportunity to sell idle or unused computer capacity. The metered model, however, is not without problems per se: it has to be determined which price is charged to a customer once the meter is turned on (Rappa 2004). The price is essential, as it determines the incentive for resource owners to provide resources as well as for consumers to demand them. If the price is too high, not all of the offered resources are demanded, contributing to idle resources. If the converse is true, not all tasks can be realized, as the resource provision is too low, which is equally bad.
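The distinction between the two pricing models can be made concrete with a small sketch (all fees and usage figures below are hypothetical, chosen only to illustrate the trade-off):

```python
def subscription_cost(fee_per_period: float, periods: int) -> float:
    """Subscription model: users are charged per period, independent of usage."""
    return fee_per_period * periods

def metered_cost(price_per_cpu_hour: float, cpu_hours: float) -> float:
    """Metered model: users are charged only for actual usage."""
    return price_per_cpu_hour * cpu_hours

# A lightly loaded user over one year: 50 CPU hours per month.
subscription = subscription_cost(fee_per_period=100.0, periods=12)   # 1200.0
metered = metered_cost(price_per_cpu_hour=1.0, cpu_hours=50 * 12)    # 600.0

# Under low utilization the metered model avoids paying for idle capacity;
# under the subscription model the unused capacity is simply wasted.
print(subscription, metered)
```

The gap between the two figures is exactly the waste of resources that the text attributes to the subscription model.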

In general, market-based approaches are considered to work well for price determination. By assigning a value (also called utility) to their service requests, users can reveal their relative urgency or the costs of the service (Buyya, Abramson et al. 2004; Irwin, Grit et al. 2004). Despite their theoretically excellent properties, only few market-based approaches have become operational systems, let alone commercial ones. The reason for this lack of operational market-based systems is that many open questions remain, particularly concerning the applicability of markets and their relevance to system design. This book strives to bridge this gap by providing economic models and algorithms for Grid systems in general, and for utility computing in particular. It is argued that economic models and algorithms supporting technical Grid infrastructures may result in widespread usage of utility computing. The results reported in this book have been generated within the EU-funded project Self-organizing ICT Resource Management (SORMA). While the project is mainly focused on technical implementation aspects, this book is devoted to the underlying economic principles. This introductory chapter primarily aims at providing a common background for understanding the selected papers of this book. Moreover, it promotes the idea of an open Grid market, which defines a blueprint for how to construct Grid markets. This blueprint follows a component-based architecture with well-defined interfaces between the different components; most of the components embody economic models and algorithms. This paper introduces the architecture of an Open Grid Market and describes how the following papers fit into this architecture. The remainder of this chapter is structured as follows. Firstly, important definitions are introduced that are commonly used throughout the book (Section 2). Secondly, it is motivated why markets are deemed promising for Grid computing (Section 3).
Thirdly, related approaches are described, and it is analyzed why those approaches never made it into practice (Section 4). Fourthly, the architecture of an Open Grid Market is proposed, which improves over related approaches (Section 5). Lastly, the structure of the book is illustrated by reference to the overall Open Grid Market architecture (Section 6).

2 Definitions

The terms cluster, distributed computing, Grid computing and utility computing are often used as synonyms. Although these terms are related, they impose different requirements upon both the technical infrastructure and the underlying economic models. In the following, we define these four concepts, point out their differences and give examples.

2.1 Cluster

A cluster denotes a group of tightly coupled resources that work together on a common problem. The resources (e.g. computers consisting of memory and computation power) are homogeneous and typically connected via local area networks (Buyya 1999). As all parts of a cluster work on a common problem, they can be viewed as a single entity. Clusters can be distinguished into three types:

Failover clusters: Failover clusters aim at improving the availability of the services that the cluster performs. This implies that they maintain redundant nodes, which can take over once components of the cluster fail to provide the service (Marcus and Stern 2000). Failover clusters are frequently used for important databases, business applications, and e-commerce websites.

Examples of failover cluster solutions are Linux-HA (High-Availability Linux), Sun Cluster, and Microsoft Cluster Server.

Load-balancing clusters: Load-balancing clusters distribute the workload across a set of computers or servers to achieve a balanced load over all nodes of the cluster (Kopparapu 2002). Although these clusters are specialized in load balancing, in many cases they embed failover properties as well; such clusters are dubbed server farms. Professional load balancers include the Sun N1 Grid Engine, Maui and MOSIX (Barak, Shiloh et al. 2005); they are typically used as back-ends for high-traffic websites such as Wikipedia, online shops or marketplaces.

High-performance computing clusters: High-performance computing clusters aim at increasing performance by exploiting parallelism, achieved by splitting a computationally demanding task into several subtasks. Examples of state-of-the-art high-performance computing solutions are the so-called Beowulf clusters (Bell and Gray 2002). Almost all supercomputers are clusters. For instance, the largest supercomputer in the world, the BlueGene/L at Lawrence Livermore National Laboratory, California, is devoted to high-performance computing. Its processors attain a peak performance of more than 367 TFLOPS². The investment costs for BlueGene/L were enormous, exceeding $290 million.

2.2 Distributed Computing

Another computing model is referred to as distributed computing. In essence, it denotes a method of computer processing where parts of a program are parallelized on different machines. In contrast to clusters, the involved resources are typically heterogeneous. In addition, the resources are only loosely coupled, for example over the Internet. Distributed computing is characterized by the existence of a central management component, which divides the program into smaller pieces and distributes those to the connected machines.
The central management component thereby considers the heterogeneous resources when distributing the job. The most prominent example of distributed computing is SETI@home, developed by the University of California at Berkeley. It is essentially a model of voluntary computing: end users can donate the idle computation time of their home PCs to the search for extraterrestrial life. The results of the many PCs are automatically reported back and reassembled into a larger solution (Werthimer, Cobb et al. 2001). SETI currently combines more than 1,594,396 active users from more than 208 different countries. Its impact is enormous, totaling an average performance of 261 TFLOPS and up to 500 TFLOPS at its peak, exceeding the performance of the largest supercomputer in the world. SETI has become the pioneer of voluntary computing; the BOINC framework, an open-source software platform, is used for other distributed computing projects such as Folding@home and the Cancer Research Project. The advantages of distributed computing are impressive, as the processing performance of supercomputers can be reached free of charge, simply by utilizing otherwise idle resources. The potential of voluntary computing is huge, taking into consideration that home PCs are utilized only 5% of the time. Bearing in mind that investment costs account for only 20% of the total cost of ownership, leaving 80% for administration, the computation achieved by the SETI network would incur a total cost of ownership of more than $1,450 million, judging by the BlueGene/L supercomputer.

² TFLOPS stands for tera-FLOPS, where FLOPS (floating point operations per second) denotes a performance metric for processors.

2.3 Grid Computing

The term Grid computing was introduced by Foster and Kesselman in 1998. Accordingly, a grid denotes a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities (Foster and Kesselman 1998). Later on, Foster refined this definition as direct access to computers, software, data, and other resources, as required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering (Foster and Kesselman 1998). Obviously, the concept of Grid computing is closely related to distributed computing. Both computing models involve heterogeneous resources that are loosely coupled, and organizational boundaries are typically crossed. In contrast to distributed computing, however, Grid computing relies on distributed management. A key advantage of Grid systems is their ability to pool resources, both computational resources and hard- and software, that can then be shared among Grid users (Alkadi and Alkadi 2006; Knight 2006). This sharing has major ramifications for the efficiency of resources, as it
- allows increased throughput, because of statistical multiplexing and the fact that users have different utilization patterns,
- lowers delay, because applications can store data close to potential users, and
- offers greater reliability, because of redundancy in hosts and network connections.
All of this results in lower costs than comparable private systems (Lai, Huberman et al. 2004). An example of the potential of Grid computing is Novartis. For their research and development activities in creating new drugs, they connected the PCs within all branches of the company. Overnight, up to 2,700 computers were used as number-crunching machines.
According to their press releases, the Grid helped reduce projects scheduled for six years on a single supercomputer to a total run time of 12 hours. Including productivity gains, Novartis claimed Grid computing to be responsible for savings of up to $200 million over three years.

2.4 Utility computing

Utility computing³ differs from the previous three concepts, as it denotes not an additional computing model, but a business model on top of Grid computing. More precisely, utility computing refers to offering a bundle of resources (e.g. computation and storage) as a metered service. As with a utility (such as electricity, water or gas), the user is unaware of where the resources come from. Hence, utility computing can be understood as the on-demand delivery of infrastructure, applications, and business processes in a security-rich, shared, scalable, and standards-based computer environment over the Internet for a fee; customers tap into IT resources and pay for them (Rappa 2004). Utility computing provides companies with the advantage of low investment costs for acquiring hardware, as resources can be rented. As the terms utility computing and Grid computing are closely related, we will use them as synonyms throughout the remainder of the book. The most prominent example of utility computing is Sun's platform network.com. Network.com makes use of Grid computing technologies and offers computational resources for a fixed price of $1 per CPU hour. Although utility computing has the potential to mark the end of corporate computing, as Nicholas Carr terms it (Carr

³ Synonyms are cloud computing or on-demand computing.

2005), neither Sun's network.com nor any other utility computing initiative has succeeded in exploiting this potential. So far, Sun has had only a single customer, the Virtual Computing Company (VCC), which bought over 1 million CPU hours from Sun. It is well known that VCC paid a price far below $1 per CPU hour, and rumors circulate that Sun bought VCC for marketing purposes. These rumors are to some extent supported by the fact that VCC has also invested heavily in building up its own infrastructure, which is said to exceed the 1 million CPU hours by far.

The reasons why utility computing has not yet taken off are manifold. Legal problems are among the most severe: for instance, network.com initially conducted all computations within the USA, and requests from other countries could not be accepted due to the absence of legally binding contracts. Sun has since remedied this shortcoming, allowing users from over 25 countries to use network.com on demand. Other shortcomings concern the fact that $1 per CPU hour is too expensive. The $1 figure was merely set by Sun as a marketing message that can be easily communicated; according to Sun officials, the margin for CPU hours has not yet been skimmed off. Apparently, $1 per CPU hour establishes the break-even point for Sun if their infrastructure is utilized at 30%. Amazon offers their idle CPU hours for 10 cents per CPU hour (Butler 2006); their fee structure also includes a fraction for bandwidth usage, which adds to the 10 cents.

3 Markets as the Business Model for Grid

The need for computational resources in industry, especially in the automotive or pharmaceutical sector, is tremendous, but current providers are struggling to accommodate this need; it seems that the fixed prices are still too high. Throughout this book it is argued that viable business models are missing for Grid computing. This hypothesis is in line with the Grid computing research of The 451 Group (Fellows, Wallage et al. 2007).
Markets are known to work effectively in allocating resources (who gets which resources, and when?), especially in cases where the resources are under decentralized control, users and providers are selfish, and demand and supply fluctuate. Markets gain this ability by establishing dynamic prices that reflect the demand and supply situation. Hence, markets seem to be the adequate model to determine accurate prices (which are unknown right now) that are acceptable to users and providers. In addition, price signals help to identify priorities among the requests (Neumann, Holtmann et al. 2006).

3.1 Pricing strategies

In the following, it is explored what kinds of pricing strategies are deemed promising for use in Grid computing. As in any distributed system, a key problem in sharing resources is to find a suitable allocation (i.e. who gets what?). Resource allocation is not a problem if sufficient resources are available. Once demand exceeds supply, it becomes more problematic, as a scheduling unit in the Grid must decide which demand will be served with the given resource endowment. Nowadays, resource allocation increasingly becomes an issue, because the resource demands of data mining applications, scientific computing, rendering programs, and Internet services have kept pace with hardware improvements (Cherkasova, Gupta et al. 2007). This trend is expected to continue. Traditionally, Grid systems address the resource allocation problem with naïve heuristics (e.g. first-in first-out) or with more complex idiosyncratic cost functions (e.g. throughput maximization) (Buyya, Stockinger et al. 2001; Buyya, Abramson et al. 2004). Today's schedulers have recognized the need to express values by providing a combination of approaches including user priority, weighted proportional sharing, and service level agreements that set upper and lower bounds on the resources available to each user or group (Irwin, Grit et al. 2004; LSF 2005; Sun 2005).

An example will show why value-oriented algorithms are needed. Suppose the SME Integrated Systems needs to submit its accounting statement on time in order to avoid a huge fine. Nonetheless, Integrated Systems' resource request cannot be computed right away, as the resources are blocked by a big rendering job of the corporation MovieRender. MovieRender needs the results of the rendering in two weeks and is indifferent as to whether the job is computed now or next week. The resource allocation should (re)allocate resources to Integrated Systems, which needs them urgently, reducing the allocation to MovieRender, as long as the delay does not change MovieRender's value (i.e. the job is completed within the given timeframe). Proportional Share is an example of such a scheduling mechanism. In essence, Proportional Share attaches a weight to each user; the resources allocated to the users depend on their weighted shares. For example, if there are two users Integrated Systems and MovieRender, where Integrated Systems has a weight of 1 and MovieRender a weight of 2, Integrated Systems is allocated 1/3 of the resources, leaving 2/3 to MovieRender. This is effective if MovieRender always performs more valuable computations than Integrated Systems. In our example, however, Integrated Systems suddenly has a higher valuation and MovieRender a lower one. This is rather common, as the arrival process of important work is highly bursty (similar to a web, mail, or file server) (Lai 2005). As a consequence, the situation would be efficient if MovieRender announced that its job is not time-critical and reduced its weight, while Integrated Systems increased its weight correspondingly. But if MovieRender's rendering creates even a small value, there is no incentive for MovieRender to lower its weight, since doing so would reduce its own value without compensation. In other words, maximizing the utility (i.e.
the sum of valuations) is possible only if the scheduler knows the attached valuations or, as in the case of Proportional Share, the exact relative weights. However, as in the example, MovieRender has no incentive to reduce its weight when it loses value by not getting its computation done. This is a common phenomenon in economics, known as the Tragedy of the Commons: the utility-maximizing behavior of the clients results in an overall inefficient solution (Hardin 1968). Value-oriented approaches are, hence, not sufficient per se to achieve an efficient solution. Only if all participants are willing to report their priorities and values honestly will algorithms such as Proportional Share work well. As the example shows, in most cases there is no such incentive for participants, who act strategically in order to maximize their utility. One way to prevent strategic misrepresentation of the weights would be to install a system administrator who monitors the system and dynamically changes the weights of the users to maintain high efficiency. Even inside organizational units this is expensive and extremely error-prone; once organizational boundaries are crossed, such system-wide monitoring is politically almost impossible to achieve. This is where markets enter the discussion. Markets have the ability to set the right incentives for users to reveal their true valuations, as well as for resource owners to provide those resources that are scarcest in the Grid. By dynamically adjusting prices, markets can react to the ever-changing conditions of resource demand and supply. This flexibility is a major advantage over long-term heuristics like Proportional Share, which typically fixes the proportional weights for a longer period, say a week or a month.
Obviously, Proportional Share neither sets an incentive to shift usage from a high-demand resource to a low-demand resource, nor does it reward shifting usage from a high-demand period into a low-demand period. For instance, the system as a whole may have high demand for a certain number of CPUs and low demand for memory. Since those resources are typically allocated separately, there will be no incentive in the system to substitute CPUs with memory (Sullivan and Seltzer 2000; Lai 2005).
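The weighted allocation rule of Proportional Share, as used in the Integrated Systems / MovieRender example, can be sketched in a few lines (the user names and weights are taken from the example above):

```python
def proportional_share(weights: dict, capacity: float) -> dict:
    """Allocate a divisible resource to users in proportion to their weights."""
    total_weight = sum(weights.values())
    return {user: capacity * w / total_weight for user, w in weights.items()}

# Weight 1 for Integrated Systems, weight 2 for MovieRender:
alloc = proportional_share({"IntegratedSystems": 1, "MovieRender": 2}, capacity=1.0)
# IntegratedSystems receives 1/3 of the resources and MovieRender 2/3,
# regardless of whose job is currently the more valuable one -- which is
# precisely the incentive problem discussed in the text.
```

The allocation depends only on the reported weights, not on the true valuations, so nothing in the mechanism itself rewards honest reporting.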

With the introduction of prices, users are given incentives to substitute the scarce resource (e.g. CPUs) with a less scarce resource (memory). For instance, a fixed pricing scheme that charges 10 for one CPU and 1 per gigabyte of memory sets incentives to reduce the number of used CPUs in favor of the cheaper memory. A fixed pricing scheme, however, is not enough to achieve an efficient allocation, as the following consideration reveals. Suppose the fixed price is set as in Figure 1 (cf. Lai 2005). Demand changes over time, which is depicted by the parabola; without loss of generality, the costs of supplying resources are assumed to be zero. If demand is below the fixed price (left end of the graph), no resource will be requested, because the value of the resource, represented by the demand curve, is below the price. In our example, neither Integrated Systems nor MovieRender would buy the resources, as they are too expensive. As a consequence, there is a loss of utility, denoted by the area below the demand curve. In the middle part of the graph, there could be several buyers willing to pay the fixed price, as their demand exceeds the price; for instance, both Integrated Systems and MovieRender have valuations higher than the fixed price. If the seller allocates the resource efficiently to the user who values it most (Integrated Systems in our example), there will be unrealized profit for the seller, indicated by the striped gray area, which refers to the difference between the demand curve and the fixed price.

Figure 1: Fixed Pricing (Lai 2005)

Unfortunately, there is no mechanism for the seller to find the user with the highest valuation. In many cases, the buyer receiving the allocation is someone whose valuation is below the highest. This is shown in the right part of Figure 1, where a user with an inferior valuation (MovieRender in our case) receives the allocation.
The difference between the demand curve of the highest-valued user and that of the one who receives the allocation marks the efficiency loss, in other words, waste. A market mechanism has the ability to set prices in a way that most, if not all, of the utility under the demand curve can be realized, resulting in an efficient allocation from which a system designer cannot extract more utility. The gist of a market is to couple price discovery with the valuations of the users, which are expressed in the form of bids. For instance, in our example, Integrated Systems would submit a bid with a high valuation to the Grid market, reflecting its high demand for resources. MovieRender would only moderately bid for resources, since it does not matter whether its job is conducted now or at a later time. For Integrated Systems, it is inadvisable to submit a bid lower than its valuation, since this would lower the probability of winning. Conversely, bidding higher than the valuation would create the risk of paying more than the job is worth. For MovieRender, the same holds. As a result of the market mechanism, Integrated Systems receives the resources and pays a moderate price.

3.2 The need for Self-organization

In summary, markets in Grids represent a business model for resource owners, as a way to earn money for their resource provision. On the other hand, markets are an adequate way to determine the most efficient allocation, making the Grid attractive for users. Suppose Integrated Systems needs more than one resource for the completion of the accounting statement, say CPU cycles and memory. Then Integrated Systems needs to formulate bids for both resources. If the resource demand varies over time, Integrated Systems would have to adjust the bids frequently. While manual formulation of bids appears reasonable for long-term contracts, it is ineffective if low-value bids must be submitted frequently. Especially if resources are purchased on demand, manual bid formulation is too slow and incurs too high transaction costs. To reduce this overhead, it is advisable to automate the bidding process. If the automated bid generator is coupled with the ICT infrastructure and possesses business knowledge on how to submit bids, the entire bidding process can be virtualized. Once the bidding process is virtualized, the ICT system of an organization will self-organize its resource management: overcapacity is offered on the market, while undercapacity initiates a buying process. The idea of self-organizing resource management is depicted in Figure 2.
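The fixed-price inefficiency and its bid-based remedy discussed above can be condensed into a numeric sketch. The valuations are hypothetical, zero supply costs are assumed as in the text, and the clearing rule (highest bid wins) is a deliberate simplification:

```python
FIXED_PRICE = 10.0
# Hypothetical valuations for a single indivisible resource bundle:
valuations = {"IntegratedSystems": 25.0, "MovieRender": 8.0}

# Fixed price: MovieRender is priced out entirely, and the seller has no way
# of knowing that IntegratedSystems values the resource far above the price.
willing_buyers = [u for u, v in valuations.items() if v >= FIXED_PRICE]

# Market: users express their valuations as bids; the highest bidder wins,
# so the full utility under the demand curve is realized.
bids = dict(valuations)  # truthful bidding, as argued in the text
winner = max(bids, key=bids.get)
realized_utility = valuations[winner]

print(willing_buyers)            # ['IntegratedSystems']
print(winner, realized_utility)  # IntegratedSystems 25.0
```

Under the fixed price, whether the efficient outcome occurs is a matter of luck; under the bid-based rule, the highest valuation wins by construction.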
Figure 2: Self-organizing Resource Management

Depending on the current demand for resources created by the Grid application, the bidding agents of the users autonomously define bids on the basis of user-specific policies. The bids are submitted to the Open Grid Market. Likewise, on the resource owner side, the bidding agents automatically publish available resources based on their policies. The Open Grid Market matches requesting and offering bids and executes them against each other. The allocations are formulated in agreements (or contracts). The fulfillment of the agreements is executed automatically: the Grid middleware is responsible for the resource provisioning, and the payment system (such as PayPal) for the monetary transfer of funds. Thus, to be effective, any Grid market system needs to be composed of three components:

An Open Grid Market, which determines the allocation in an efficient way and additionally offers complementary market services. Openness is widely defined and refers to the

14 participants (i.e. all participants are potentially admitted to the market) and to the communication protocols, so that the market can be accessed by any kind of middleware system. Economic Grid Middleware, which extends common virtualization middleware (e.g. Globus, GRIA, UNICORE, glite, VMWare, Xen) in a way that the allocations determined by the Open Grid Market can be deployed. Intelligent Tools, which automate the bidding process. Any reasonable effort in establishing Grid markets ultimately need to involve all three components, as these are prerequisite for full automation of the market process (Neumann 2006). 4 Related Work Since the 60s researchers have motivated the use of markets as a means to cope with those incentival problems in distributed computing. The first attempt has been made with auctioning off time slots of the Harvard supercomputer (Sutherland 1968). While this primer was purely paper-based and restricted to one single computer, subsequent proposals and prototypes offered automated trading in distributed environments. Despite the fact that the idea of using markets in distributed computing is not new, no implementation has made it into practice. 4.1 Market-based Grid systems In the following the most significant efforts will be compiled. Essentially, there are several ways to classify market-based distributed computing environments. This section will use a classification along the lines of Buyya et al and will sketch the related work (Buyya, Stockinger et al. 2001; Yeo and Buyya 2004). The underlying taxonomy is structured around the market mechanisms, which are incorporated in the distributed computing environments. 4 It will be differentiated into the following market mechanisms: Posted Price In posted price settings the price is revealed openly to all participants. As we have shown in our preceding motivation, the posted price is inadequate from an economic point of view if demand and/or supply are strongly fluctuating. 
Commodity Market: In a commodity market, the resource owners determine their pricing policy, and the users determine accordingly the amount of resources to consume. Pricing can depend on parameters such as usage time (e.g. peak-load pricing) or usage quantity (e.g. price discrimination) (Wolski, Plank et al. 2001). In many cases, flat rates are used, which boil down to a fixed price until a certain amount of resources or a certain time is reached.

Bargaining: In bargaining markets, resource owners and users negotiate bilaterally for mutually agreeable prices. 5 By gradually lowering their claims, resource owners and users eventually reach an agreement. Logrolling over several attributes can also be included in the bargaining protocol (Rosenschein and Zlotkin 1994; Kersten, Strecker et al. 2004).

4 From an economic point of view, this taxonomy is not detailed enough. Nevertheless, despite its lack of detail, it is sufficient for a rough classification of existing resource management systems.
5 A game-theoretic treatment of applicable bargaining models can be found in (Wolinsky 1988; De Fraja and Sakovics 2001; Satterthwaite and Shneyerov 2003).

Contract Net Protocol: In the contract net protocol, the user advertises its demand and invites resource owners to submit bids. Resource owners check these advertisements against their own requirements. If an advertisement is favorable, the resource owners respond with bids. The user collects all bids, compares them and selects the most favorable bid(s).

Proportional Share: Proportional share assigns resources proportionally to the bids submitted by the users.

Auctions: Auctions are mediated market mechanisms. An auctioneer collects bids from either one market side (buyers or sellers) or from both. According to the auction rules (which are known to all bidders), the auctioneer allocates the resources. Typical one-sided auctions are the English, first-price sealed-bid, Dutch and Vickrey auctions. Two-sided auctions are the double auctions.

A survey of selected market-based resource management systems, using the above-mentioned taxonomy, is summarized in the following.

SPAWN (Distributed Computing; Auction): The SPAWN system provides a market mechanism for trading CPU time in a network of workstations (Waldspurger, Hogg et al. 1992). SPAWN treats computer resources as standardized commodities and implements a standard Vickrey auction. It is known from auction theory that the Vickrey auction attains (1) truthful preference revelation and (2) an efficient allocation of resources (Krishna 2002). However, SPAWN does not make use of the generalized Vickrey auction, which can cope with ties. 6 Furthermore, the Vickrey auction can traditionally neither cope with multiple attributes nor with different time slots.

Stanford Peers (Peer-to-Peer; Bargaining): The Stanford Peers model is a peer-to-peer system which implements auctions within a cooperative bartering model in a cooperative sharing environment (Wolski, Brevik et al. 2004). It simulates storage trading for content replication and archiving.
It demonstrates distributed resource trading policies based on auctions by simulation.

POPCORN (Internet; Auction): POPCORN provides an infrastructure for global distributed computation (Nisan, London et al. 1998; Regev and Nisan 1998). POPCORN mainly consists of three entities: (i) a parallel program which requires CPU time (buyer), (ii) a CPU seller, and (iii) a market which serves as meeting place and matchmaker for buyers and sellers. Buyers of CPU time can bid for one single commodity, which is traded by executing a Vickrey auction repeatedly. POPCORN obviously suffers from the same shortcomings as SPAWN.

6 Users often demand a bundle of resources. Since they need more than one resource to complete a task, they value the bundle higher than the sum of its constituents. In an extreme case, the constituents may generate no value at all; only the bundle creates value. If the resources are auctioned sequentially, the user faces the so-called exposure risk: once he has received one leg of the bundle, he is in danger of running a loss if he does not get the other leg.
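The (single-item) Vickrey auction used by SPAWN and POPCORN can be illustrated with a minimal sketch; the bidder names and valuations below are made up and reuse the motivating example from Section 3:

```python
def vickrey_auction(bids):
    """Single-item sealed-bid Vickrey auction: the highest bidder wins
    but pays only the second-highest bid, making truthful bidding a
    dominant strategy."""
    if len(bids) < 2:
        raise ValueError("a Vickrey auction needs at least two bids")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # second-highest bid
    return winner, price

# Two bidders compete for one time slot of a resource.
winner, price = vickrey_auction({"IntegratedSystems": 10.0, "MovieRender": 6.0})
print(winner, price)  # IntegratedSystems 6.0
```

Since the winner's payment does not depend on his own bid, shading the bid can only lose the item without lowering the price, which is why truthful revelation is optimal.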

Bellagio (Grids; Auction): Bellagio is intended to serve as a resource discovery and resource allocation system for distributed computing infrastructures. Users express preferences for resources using a bidding language which supports XOR bids. The bids are formulated in virtual currency. The auction employed in Bellagio is periodic. Bids from users are only accepted as long as enough virtual currency is left (AuYoung, Chun et al. 2004).

CATNETS (Grids; Bargaining): In CATNETS, trading is divided into two layers, the application/service layer and the resource layer. In both layers, the participants have varying objectives which change dynamically and unpredictably over time. In the application/service layer, a complex service is a proxy which negotiates access to bundles of basic service capabilities for execution on behalf of the application. Basic services provide an interface to access computational resources. Agents representing the complex services, basic services and resources participate in a peer-to-peer trading network, on which requests are disseminated; when an appropriate provider is found, the agents engage in bilateral bargaining (Eymann, Reinicke et al. 2003; Eymann, Reinicke et al. 2003; Eymann, Ardaiz et al. 2005).

G-Commerce (Grids; Commodity market, auction): G-Commerce provides a framework for trading computer resources (CPU and hard disk) in commodity markets and Vickrey auctions (Wolski, Plank et al. 2001; Wolski, Plank et al. 2001; Wolski, Brevik et al. 2003). While the Vickrey auction has the above-mentioned shortcomings in Grids, the commodity market typically works with standardized products. Additionally, the commodity market cannot account for the complementarities among the resources, as only one leg of the bundle is auctioned off, exposing the bidder to the exposure risk.
Nimrod/G (Grids; Commodity market): Nimrod/G enables users to define the types of resources needed and to negotiate with the system for the use of a particular set of resources at a particular price (Buyya, Abramson et al. 2000). This requires the system to conduct resource discovery, which can become quite complex, as the number of resources can be large. The system also needs to support price negotiations, which may be complex as well. Both resource discovery and negotiation can become very cumbersome if the users demand bundles instead of single resources.

OCEAN (Grids; Bargaining/Contract net): OCEAN (Open Computation Exchange and Arbitration Network) is a market-based infrastructure for high-performance computation, such as cluster and Grid computing environments (Acharya, Chokkaredd et al. 2001; Padala, Harrison et al. 2003). The major components of the OCEAN market infrastructure are user components, computational resources, and the underlying market mechanism (e.g. the OCEAN Auction Component). In the OCEAN framework, each user (i.e. resource provider or consumer) is represented by a local OCEAN node. The OCEAN node implements the core components of the system, for instance a Trader Component, an Auction Component, or a Security Component. The implemented OCEAN auctions occur in a distributed peer-to-peer manner. The auction mechanism implemented in the OCEAN

framework can be interpreted as a distributed sealed-bid continuous double auction (Acharya, Chokkaredd et al. 2001). A trade is proposed to the highest bidder and the lowest seller. Afterwards, the trading partners can renegotiate their service level agreements. The renegotiation possibility, on the one hand, allows coping with multiple attributes and with the assignment of resources to time slots. Nonetheless, the negotiation makes the results of the current auction obsolete. Neither can the auction unfold its full potential, nor can the negotiation guarantee an efficient allocation, as competition is trimmed.

Tycoon (Grids; Proportional Share): P2P clusters like the Grid and PlanetLab enable in principle the same statistical multiplexing efficiency gains for computing as the Internet provides for networking. Tycoon is a market-based distributed resource allocation system based on an Auction Share scheduling algorithm (Lai, Rasmusson et al. 2004). Tycoon distinguishes itself from other systems in that it separates the allocation mechanism (which provides incentives) from the agent strategy (which interprets preferences). This simplifies the system and allows specialization of agent strategies for different applications while providing incentives for applications to use resources efficiently and for resource providers to provide valuable resources. Tycoon's distributed markets allow the system to be fault-tolerant and to allocate resources with low latency. Auction Share is the local scheduling component of Tycoon.

Table 1: Summary of market-based systems (adapted from (Yeo and Buyya 2004))

These market-based resource management systems, however, are purely prototypical implementations. Apparently, only two prototypes implement all three components: CATNETS and Tycoon. Tycoon, developed by Hewlett-Packard, offers the most comprehensive realization of the system.
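The proportional-share rule underlying a system like Tycoon can be sketched in a few lines; the user names, bids and capacity below are illustrative, not taken from Tycoon itself:

```python
def proportional_share(bids, capacity):
    """Each user receives a share of the resource proportional to his
    bid relative to the sum of all bids."""
    total = sum(bids.values())
    return {user: capacity * bid / total for user, bid in bids.items()}

# Two users bid for 100 units of CPU capacity.
shares = proportional_share({"alice": 6.0, "bob": 2.0}, capacity=100)
print(shares)  # {'alice': 75.0, 'bob': 25.0}
```

The attraction of the rule is its simplicity: no request is rejected, and raising one's bid continuously increases one's share at the expense of the other users.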
However, Tycoon still fails to meet the challenge, as the supported economic models and algorithms are too simplistic and the deployment requires rebooting the resources before provisioning. CATNETS offers very interesting features but lacks comprehensive support (e.g., monitoring, multi-platform deployment). All other approaches neglect the provision of decision support. In ordinary commodity markets, bidders have to come up with their bids themselves by rational deliberation. In Grid markets this can become extremely cumbersome, as demand and supply fluctuate dynamically and are in most cases unknown. In addition, the complex auctions make it difficult even for humans to devise suitable strategies.

4.2 Markets are dead, long live markets

Beyond the failure to provide intelligent tools, economic middleware extensions and a Grid market, there are several other reasons why previous systems have not (yet) made it into practice. These reasons partially stem from the fact that several technical and economic prerequisites were not given. In recent times, advancements in Grid computing and economics have been achieved, so that the prerequisites as well as the environment have changed by now. These advancements favor the application of market-based platforms for Grid systems. Among the most important changes are the following three (Shneidman, Ng et al. 2005):

Increasing Demand for Resources: In the past, demand for computational resources was not an issue. As a consequence, there was no scarcity of resources that would have

required value-based scheduling mechanisms. Thus, past market-based systems hardly (if ever) saw real field tests, and contention was often artificially generated. This has changed due to increasing demand for resources that exceeds supply. A deployed market-based Grid system could solve real resource conflicts. The confrontation of market-based Grid systems with real demand is essential, as the real data will give researchers the insights to evaluate (and to adjust) their market-based resource schedulers.

Improved technical infrastructure: Past systems had to deal with limitations in technical Grid infrastructures. For example, the enforcement of the allocation decision was not possible. Market mechanisms without proper enforcement are useless, as the agreements are not binding. Recent advancements in the area of operating systems and Grid middleware have overcome this and other technical problems.

Advanced economic models and algorithms: Past market-based Grid systems exclusively used simple market mechanisms. Simple market mechanisms, however, cannot attain economic efficiency, as they are unable to capture complementarities (i.e. one resource creates value for a user only in connection with another resource). During the past decade, advances have been made in the theory and practice of more complex market mechanisms. Current mechanisms can support combinatorial bidding, which allows the expression of multiple resource needs. As such, those bidding languages can represent any logical combination of resources, such as AND, OR, and XOR. On the other hand, these combinatorial mechanisms come at an additional cost of computation. Solving those resource allocation problems becomes fairly computationally expensive. Hence, the developments in market mechanisms need to be complemented by standard solvers, which can determine the allocation within an acceptable time frame.
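A toy instance of the winner-determination problem behind such combinatorial mechanisms, solved here by exhaustive search, illustrates why dedicated solvers are needed at scale; the bundle contents, prices and capacities are made-up illustration data:

```python
from itertools import combinations

def winner_determination(bids, capacity):
    """Exhaustively search all subsets of bundle (AND) bids for the
    value-maximizing feasible allocation; the exponential search space
    is what makes real combinatorial auctions computationally hard."""
    best_value, best_subset = 0.0, ()
    for size in range(1, len(bids) + 1):
        for subset in combinations(range(len(bids)), size):
            demand = {}
            for i in subset:
                for res, qty in bids[i]["bundle"].items():
                    demand[res] = demand.get(res, 0) + qty
            feasible = all(qty <= capacity.get(res, 0) for res, qty in demand.items())
            if feasible:
                value = sum(bids[i]["price"] for i in subset)
                if value > best_value:
                    best_value, best_subset = value, subset
    return best_value, best_subset

bids = [
    {"bundle": {"cpu": 2, "mem": 1}, "price": 8.0},  # values CPU and memory jointly
    {"bundle": {"cpu": 2},           "price": 5.0},
    {"bundle": {"mem": 1},           "price": 4.0},
]
print(winner_determination(bids, {"cpu": 2, "mem": 2}))  # (12.0, (0, 2))
```

Even this tiny instance examines every subset of bids; with n bids there are 2^n subsets, which is why the text points to mathematical programming solvers instead of enumeration.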
In recent times, off-the-shelf solvers such as CPLEX have been developed that can reach both an efficient allocation and acceptable computation times. The key to the successful deployment of market-based Grid systems is a holistic and interdisciplinary approach. On the one hand, it is important to have a strong grip on the technical infrastructure. On the other hand, it is also important to make use of economic design knowledge. The concept of an Open Grid Market is expected to fill the current gap, paving the way towards widespread Grid adoption (Neumann 2008).

5 An Architecture for Open Grid Markets

"As the size and complexity of software systems increases, the design problem goes beyond the algorithms and data structures of the computation: designing and specifying the overall system structure emerges as a new kind of problem" (Garlan and Shaw 1993). To address this complexity, an architecture-based system development approach seems promising. The architecture-based approach focuses on the organization and global control structure of the system, and on protocols for communication, synchronization, and data access (Garlan and Shaw 1993). The key element of this approach is the architecture, whose importance goes far beyond the simple documentation of technical elements. The architecture rather serves as the blueprint for both the system itself and the project developing it. As an Open Grid Market appears to be a very complex system, its architecture will be presented. First, however, the rationale of the Open Grid Market will be given, revealing why openness is important in the Grid context.

5.1 Architecture Rationale

As presented in the related work, there are several ongoing attempts to establish a market-based Grid system. The approach pursued here is different, as it does not strive for "yet another" Grid market system. Instead, a market system is envisioned that is open with respect to

1. the communication protocols being used, and
2. the underlying local resource managers of the traded resources.

This devotion to openness has major ramifications for the scope of the market: any potential Grid user can access the Open Grid Market via open communication protocols. Furthermore, resource providers with different virtualization platforms and resource managers can easily plug into the Open Grid Market. The idea is to set up a flexible market infrastructure which can access resources over all kinds of virtualization platforms by means of wrappers. The openness is intended to offer the possibility of loosely integrating emerging Grid markets. For example, the Open Grid Market needs to combine platforms like network.com or Amazon's elastic cloud, but should also be capable of accessing Beowulf or MOSIX clusters. This approach should attract already existing resource providers such as Sun to plug into the Open Grid Market. Obviously, competition on such Open Grid Markets encourages markets to integrate and weeds out overly complex or inefficient platforms.

5.2 Open Grid Market Architecture

The architecture of the Open Grid Market describes the functionalities of the entire market system in terms of its functional entities, their responsibilities and their dependencies (see Figure 3) (Neumann, Stoesser et al. 2008).

Figure 3: Open Grid Market Architecture

Boxes represent functional entities that can be encapsulated in a corresponding component. Arrows represent dependencies, where an arrow from an entity A to an entity B means that entity A either receives information from or consumes a service of entity B. The architecture is structured along four layers. Ultimately, it is the Grid application and the available resources that create demand and supply for computational resources and thus give

rise to the establishment of an Open Grid Market. Accordingly, the Grid applications as well as the resources mark the fourth and most low-level layer. Layer 3 consists of the intelligent tools that translate the resource demand or supply of the applications and machines into bids. Layer 2 concerns the market functionalities, whereas layer 1 comprises the core market services, i.e. the basic communication infrastructure that is common to all potential markets. The intuition for dividing layers 1 and 2 is very simple: while the core market services establish the common trading platform every market uses, the Open Grid Market components can differ from market to market.

5.2.1 Layer 1: Core Market Services

As aforementioned, state-of-the-art Grid middleware does not provide all the infrastructure services necessary for supporting an Open Grid Market. Layer 1 extends the standard Grid middleware by additional infrastructure services so that the Open Grid Market can be hooked up with the Grid middleware:

Trusted market exchange service: All communication among market participants (users and their applications as well as providers and their resources) is mediated by this service, which assures that information is routed to the designated recipients in a secure and reliable way. The trusted market exchange service also enforces policies defined on specific messages for logging, encryption, and signing.

Logging: All transactions executed on the market are registered in a secure log for auditing purposes.

Market directory: The market directory is a market-enabled extension of the commonplace service registries in (Grid) middleware. The technical information in Grid market directories is enriched by economically relevant information like pricing or quality-of-service parameters. The enriched information serves as input to the trading management component on layer 2.
Market information: The market information service allows market participants to publish information and to gather information from other participants (e.g., prices, resource usage levels). Participants can query the service or subscribe to topics. Information queries consider either instantaneous measures or their history.

5.2.2 Layer 2: Open Grid Market

The Open Grid Market in layer 2 defines the arena where the published resources are assigned to the Grid applications, following certain market mechanisms.

Trading management: The trading management component is the access point for the users to the Open Grid Market, where they can find the offered resources and submit their corresponding bids. More specifically, the trading management matches the technical descriptions of the requests obtained from the bids to suitable technical descriptions of the offered resources collected from the associated Grid market directories. Subsequently, the trading management manages the bidding process according to given market mechanisms (e.g. the MACE auction (Schnizler, Neumann et al. 2006)). If the bidding process finishes successfully, the corresponding bid and offer are submitted to the contract management.

Contract management: The contract management component transforms corresponding pairs of bids and offers into mutually agreed contracts. One important part of these contracts are service level agreements (SLAs), which define the agreed terms of usage of the resources and the pricing. The contract management also initiates the enforcement of the contract, especially the allocation of the sold resources (aided by the EERM) and the payment process.

SLA enforcement and billing: The SLA enforcement and billing component is responsible for the surveillance and enforcement of the contracts it receives from the contract management. The component keeps track of the actual usage of the resources, makes comparisons to the SLA and (if appropriate) initiates the billing and clearing according to the results of the comparison.

Payment: The payment service offers a unified interface to payment, isolating the rest of the components from the particularities of the payment mechanism. The payment service also generates appropriate logging/auditing information.

Security management: The security management component is intended as the entry point for a single sign-on mechanism and is responsible for a tamper-proof identity management for the consumers, the suppliers and the constituent components of the Open Grid Market. Thus, all layers provide security connectors that build the technological bridges from the respective layers to the security management.

Economically Enhanced Resource Management (EERM): The EERM component provides a standardized interface to typical Grid middleware (e.g. Globus Toolkit or Sun Grid Engine). The EERM can shield clients from resource-platform-specific issues and also enhance or complement the management functions provided by job scheduling and submission systems. The EERM's main duties include (i) resource management, including the management functionalities to achieve the expected service levels and to notify of deviations; the resource management furthermore coordinates independent resources to allow co-allocation if not provided by the fabrics; (ii) resource monitoring; and (iii) accessing resource fabrics via standardized interfaces to create instances of resources and later make use of them by the application (Macías, Smith et al.
2007).

5.2.3 Layer 3: Intelligent Tools

The market participants, including the Grid applications and resources, are supported by intelligent tools for easy access to the Open Grid Market.

Consumer preference modeling: This component allows the users to describe the economic preferences that will determine their bidding strategies on the Open Grid Market; e.g., they could define whether they prefer cheap over reliable resources. One approach would be to provide the users with a GUI in the form of a simplified ontology modeling tool to instantiate a given consumer preference modeling language.

Demand modeling: The user needs a tool to specify the technical requirements of her Grid application. The technical approach for this component is similar (i.e. ontology-based) to the preference modeling.

Business modeling: Analogously to the consumer preference modeling, the providers have to specify their business models to determine the generation of their offers on the Open Grid Market. For example, one part of such a description could be a pricing model that specifies whether the consumer has to pay for booked time slots or for the actual usage. As the example indicates, the models specified by means of this component depend on the implemented market type.

Supply modeling: The supply modeling component is the correspondent of the demand modeling component. Aided by this component, the providers can technically specify their offers.

Bid generation: The bid generation is the intelligent (i.e. agent-based) component that generates and places the bids of the consumer on the Open Grid Market. For this purpose, it considers the user preferences, the technical requirements and the current state of the market and derives the bids. The bids are submitted to the trading management component of the Open Grid Market. The bid generation component could be implemented with the help of a rule engine for logical inference over the mentioned inputs.

Offer generation: The offers are assembled from the technical resource descriptions and the business model of the respective provider by the offer generation component. It also publishes the offers in the Grid market directory.

5.2.4 Layer 4: Grid Application

Layer 4 pertains to the Grid applications and the Grid resources for trade. On the provider side, a provider IT specialist makes use of the intelligent tools in layer 3 to model the provider's business strategies and the offered Grid resources. On the consumer side, a distinction has to be made between the Grid application's end user(s) and the consumer's IT support staff, who will use the intelligent tools to model an application's resource requirements and the consumer's preferences.

6 Organization of the book

This book is devoted to the development of economic models and algorithms for the Open Grid Market. As such, the book mainly deals with layers 2 and 3, as those areas comprise the components where economic principles can help to improve the entire system. This book contains eighteen different papers, which are structured along three main sections: Market Mechanisms for Grid Systems, Economic Component Design, and Design Issues and Extensions.
The papers present joint research conducted with colleagues at the Institute of Information Management and Systems within the scope of the EU-funded projects CATNETS and SORMA, the BMBF-sponsored project SESAM, the project Billing the Grid funded by the Landesstiftung Baden-Württemberg as Landesschwerpunktprogramm, and the SSHRC-funded project Electronic Negotiations, Media and Transactions for Socio-Economic Interactions.

6.1 Market Mechanisms for Grid Systems

The first section deals with different market mechanisms for scheduling Grid resources. Apparently, it is devoted to the economic modeling of the trading management component in layer 2 of the architecture. All papers focus on the design of market mechanisms for different segments of the Grid market. This section consists of eight papers in total, where the first paper is an overview paper and the subsequent seven papers deal with different market mechanisms.

6.1.1 Bridging the Adoption Gap Developing a Roadmap for Trading in Grids

The paper Bridging the Adoption Gap Developing a Roadmap for Trading in Grids argues that the technology of Grid computing has not yet been adopted in commercial settings

due to the lack of viable business models. While Grid technology has already been taken up in academia, the sharing approach among not-for-profit organizations is not suitable for enterprises. In this paper, the idea of a Grid market is taken up to overcome this Grid adoption gap. Although this idea is not new, all previous proposals have been made either by computer scientists unaware of economic market mechanisms or by economists unaware of the technical requirements and possibilities. This paper derives an economically sound set of market mechanisms based on a solid understanding of the technical possibilities. More precisely, the paper analyzes the characteristics of the object traded in Grids. This trading object is closely related to the deployment of software applications via the EERM. The deployment directly on physical resources or via raw application services has major ramifications for the trading object and consequently for the requirements on market mechanisms. Physical resources are essentially commodities, whereas application services can be both standardized commodities and unique entities. Based on this analysis, a two-tiered market structure along the distinction between physical resources and application services is derived, where each tier demands different market mechanisms. The first tier comprises the markets for physical resources (e.g. CPU, memory) and raw application services. The second tier comprises the markets for complex application services. Subsequently, existing Grid market mechanisms are classified according to this market structure. At the core of this paper, we argue that there is no single market that satisfies all purposes. Reflecting the distinct requirements of different application models and deployment modes, a catalogue of co-existing market mechanisms is needed.
This paper provides the outline for the following papers, which present the different mechanisms of the catalogue and derive their economic properties.

6.1.2 A Truthful Heuristic for Efficient Scheduling in Network-centric Grid OS

The paper A Truthful Heuristic for Efficient Scheduling in Network-centric Grid OS argues that a network-centric Grid OS coupled with market-based scheduling can increase efficiency in Grid and cluster environments by adequately allocating all available resources. This distinguishes network-centric Grid OS from state-of-the-art Grid middleware, which relies on batch processing of idle resources only. While highly sophisticated market mechanisms are available for Grid middleware where raw services are traded (e.g. MACE), there are no mechanisms available for network-centric Grid OS, which rely on interactive application processing. The paper strives to fill this gap by defining a market mechanism as a multi-attribute exchange in which resource owners and consumers can publish their demand and supply for resources deployed as raw services. Exact market-based scheduling mechanisms share one deficiency: they are quite complex. While this may not be a problem in a cluster setting with a small number of users, it may prove crucial in an interactive, large-scale Grid OS environment. Hence a greedy heuristic is designed which performs fast scheduling while retaining truthfulness on the request side and approximating truthfulness on the provisioning side of the market. In summary, the contributions of this paper are threefold: firstly, the paper designs a multi-attribute exchange that can be used for Grid OS. Secondly, a greedy heuristic is employed to solve the scheduling problem. Thirdly, an adequate pricing scheme is developed which assures that reporting truthfully is a dominant strategy for resource requesters and that payments to resource providers are approximately truthful.
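The flavor of such a greedy mechanism can be conveyed with a simplified sketch; this is not the paper's exact algorithm, and the single-resource model, the request data and the critical-value payment rule are illustrative assumptions:

```python
def greedy_schedule(requests, capacity):
    """Allocate requests greedily by per-unit bid; each winner pays a
    critical-value price: the per-unit bid of the best rejected request
    times his own quantity (zero if nothing is rejected). Payments that
    do not depend on the winner's own bid discourage bid shading."""
    order = sorted(requests, key=lambda r: r["price"] / r["qty"], reverse=True)
    winners, used, critical = [], 0, 0.0
    for req in order:
        if used + req["qty"] <= capacity:
            winners.append(req)
            used += req["qty"]
        elif critical == 0.0:
            critical = req["price"] / req["qty"]  # first rejected density
    return [(req["id"], critical * req["qty"]) for req in winners]

requests = [
    {"id": "A", "qty": 4, "price": 20.0},  # 5.0 per unit
    {"id": "B", "qty": 3, "price": 9.0},   # 3.0 per unit
    {"id": "C", "qty": 5, "price": 10.0},  # 2.0 per unit, rejected
]
print(greedy_schedule(requests, capacity=8))  # [('A', 8.0), ('B', 6.0)]
```

The sort dominates the running time, so the heuristic scales to large request volumes where an exact winner-determination program would be prohibitively slow.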

6.1.3 Economically Enhanced MOSIX for Market-based Scheduling in Grid OS

The paper Economically Enhanced MOSIX for Market-based Scheduling in Grid OS is a follow-up to the previous paper. Essentially, it describes the implementation of the greedy heuristic in the state-of-the-art network-centric Grid OS MOSIX (i.e. Multi-computer Operating System for Unix). The paper is a result of the co-operation with Amnon Barak and Lior Amar (both The Hebrew University of Jerusalem), who co-author this paper. By introducing market mechanisms to MOSIX, end users can influence the allocation of resources by reporting valuations for these resources. Current market-based schedulers, however, are static, assume the availability of complete information about jobs (in particular with respect to processing times), and do not make use of the flexibility offered by computing systems. In this paper, the implementation of a novel market mechanism for MOSIX, a state-of-the-art management system for computing clusters and organizational grids, is described. The market mechanism is designed so as to work in large-scale settings with selfish agents. Facing incomplete information about job characteristics, it dynamically allocates jobs to computing nodes by leveraging preemption and process migration, two distinct features offered by the MOSIX system. The contribution of this paper can be summarized as follows. Firstly, a market mechanism is proposed which is specifically tailored to the needs and technical features of MOSIX so as to generate highly efficient resource allocations in settings with large numbers of selfish agents.
Secondly, the implementation of the proposed market mechanism in MOSIX is presented, which serves as a proof-of-concept.

6.1.4 GreedEx – A Scalable Clearing Mechanism for Utility Computing

The paper GreedEx – A Scalable Clearing Mechanism for Utility Computing extends the mechanism proposed in the paper A Truthful Heuristic for Efficient Scheduling in Network-centric Grid OS by a more elaborate pricing scheme. The newly developed market mechanism is dubbed GreedEx, an exchange for clearing utility computing markets based on a greedy heuristic. This market mechanism performs the clearing in a scalable manner while at the same time satisfying basic economic design criteria. Besides its computational speed, the main characteristic of GreedEx is its ability to account for the inter-organizational nature of utility computing by generating truthful prices for resource requests and approximately truthful payments to utility computing providers. In general, however, the computational speed of a heuristic comes at the expense of efficiency. Consequently, a numerical simulation is presented which shows that GreedEx is very fast and, surprisingly, highly efficient on average. These results strengthen our approach and point to several interesting avenues for future research. The contribution of this paper is the design as well as the analytical and numerical evaluation of GreedEx, which achieves a distinct trade-off: it conducts fast near-optimal resource allocations while generating truthful prices on the demand side and approximately truthful prices on the supply side. GreedEx may be used for two purposes: Firstly, utility computing providers (such as Sun) may choose to run a proprietary GreedEx-based market platform to allocate their scarce computing resources to resource requests in an efficient manner and to dynamically price these requests.
Moreover, a GreedEx-based platform may be run as an intermediary market which aggregates demand as well as supply across multiple resource requesters and utility computing providers, thus increasing liquidity and efficiency.

6.1.5 Market-Based Pricing in Grids: On Strategic Manipulation and Computational Cost

The paper Market-Based Pricing in Grids: On Strategic Manipulation and Computational Cost uses the GreedEx mechanism and shows in several propositions the impossibility of a perfect heuristic that achieves truthfulness on both market sides. In addition, the paper proves propositions on split- and merge-proofness as a measure of fairness. It also shows numerically that manipulation by individual agents very rarely pays off and is very often punished by the mechanism. In summary, the paper makes the following contributions: an in-depth analysis of these pricing schemes with respect to the distribution of welfare among resource requesters and providers, the incentives of single users to manipulate the mechanism, and the computational impact of the pricing schemes on the mechanism. In addition, basic design options for integrating the mechanism into current Grid schedulers are outlined.

6.1.6 Trading grid services – a multi-attribute combinatorial approach

The paper Trading grid services – a multi-attribute combinatorial approach derives a multi-attribute combinatorial exchange for allocating and scheduling services in the Grid. In contrast to GreedEx and other approaches, the proposed mechanism accounts for the variety of services by incorporating time and quality as well as coupling constraints. The mechanism provides buyers and sellers with a rich bidding language, allowing for the formulation of bundles expressing either substitutabilities or complementarities. The winner determination problem maximizes the social welfare of this combinatorial allocation problem. The winner determination scheme alone, however, is insufficient to guarantee an efficient allocation of the services. The pricing scheme must be constructed in a way that motivates buyers and sellers to reveal their true valuations and reservation prices.
This is problematic in the case of combinatorial exchanges, since the only efficient pricing scheme, the VCG mechanism, is not budget-balanced and must be subsidized from outside the mechanism. The main contribution of this paper is the development of a new pricing family for a combinatorial exchange, namely the k-pricing rule. In essence, the k-pricing rule determines the price such that the resulting surpluses to the buyers and sellers divide the entire surplus accrued by the trade according to the ratio k. The k-pricing rule is budget-balanced but cannot retain the efficiency property of the VCG payments. As the simulation illustrates, the k-pricing rule does not rigorously punish inaccurate valuation and reservation price reporting. Buyers and sellers can sometimes increase their individual utility by cheating. This possibility, however, is limited to mild misreporting and a small number of strategic buyers and sellers. If the number of misreporting participants increases, the risk of not being executed in the auction rises dramatically. As a result, the k-pricing scheme is a practical alternative to the VCG mechanism and highly relevant for an application in the Grid. The runtime analysis shows that the auction scheme is computationally very demanding. However, the use of approximated solutions achieves adequate runtime results and fairly mild welfare losses for up to 500 participants. Comparing these results with an existing Grid testbed (PlanetLab) demonstrates the practical applicability of the proposed auction.
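For a single bilateral trade, the k-pricing rule reduces to a one-line formula (a sketch of the base case only; the paper applies the rule to full combinatorial allocations): with buyer valuation v and seller reservation price s, v >= s, the price is set so that the buyer keeps the fraction k of the trade surplus v - s and the seller the remaining 1 - k.

```python
def k_price(v, s, k=0.5):
    """k-pricing for one bilateral trade.

    v: buyer's reported valuation, s: seller's reservation price (v >= s).
    The surplus v - s is split so the buyer's surplus is k * (v - s) and
    the seller's surplus is (1 - k) * (v - s); the rule is budget-balanced
    because the buyer's payment goes entirely to the seller.
    """
    if v < s:
        raise ValueError("no mutually beneficial trade")
    return v - k * (v - s)


price = k_price(v=10.0, s=4.0, k=0.5)
# price == 7.0: buyer and seller each keep a surplus of 3.0
```

Setting k = 1 gives the whole surplus to the buyer (price equals s), k = 0 gives it to the seller (price equals v); unlike VCG, no outside subsidy is ever required.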

6.1.7 A Discriminatory Pay-as-Bid Mechanism for Efficient Scheduling in the Sun N1 Grid Engine

The paper A Discriminatory Pay-as-Bid Mechanism for Efficient Scheduling in the Sun N1 Grid Engine regards resources as the traded object. This work was jointly conducted with Simon See from Sun Microsystems, who also co-authored this paper. In essence, this paper proposes an extended model of the Sanghavi-Hajek pay-as-bid mechanism as a promising addition to the N1GE scheduler. Current technical schedulers require an administrator to specify user weights based on these users' relative importance, regardless of the dynamic demand and supply situation, leading to inefficiencies. To this end, the contribution of this paper is twofold: Mechanism design: A discriminatory pay-as-bid market mechanism by Sanghavi and Hajek is presented. Conditions are analytically derived under which this mechanism outperforms market-based proportional share, currently the most prominent grid market mechanism, with respect to both the provider's surplus and allocative efficiency. Integration into Sun N1GE: It is further shown that this mechanism is not a purely theoretical construct but can be integrated into state-of-the-art grid schedulers to economically enrich the current allocation logic. The basic design considerations are conducted for the case of the N1 Grid Engine (N1GE), the scheduler of Sun Microsystems's grid platform.

6.1.8 Decentralized Online Resource Allocation for Dynamic Web Service Applications

The paper Decentralized Online Resource Allocation for Dynamic Web Service Applications deals with the definition of an online market mechanism that allocates requests to resources as soon as the requests are submitted to the system. Economic scheduling mechanisms introduce economic principles to Grids and thus promise to increase the system's overall utility by explicitly taking into account strategic, inter-organizational settings.
Current economic scheduling mechanisms, however, do not fully satisfy the requirements of interactive Grid applications. In this paper, a video surveillance environment is introduced, and it is shown how its functionality can be encapsulated in Grid services which are traded dynamically using an economic scheduling mechanism. Subsequently, a mechanism from the general machine scheduling domain is introduced. The applicability of the market mechanism to the Grid context is showcased by a numerical experiment. However, the mechanism suffers from some limitations due to the distinct properties of Grids. These limitations are pinpointed, and extensions to the basic mechanism are introduced which may remedy these drawbacks.

6.2 Economic Component Design

The second section deals with economic models and algorithms for the component Bid Generation (6.2.1 and 6.2.2) on layer 3 and for the components EERM (6.2.3), Contract Management (6.2.4), SLA enforcement (6.2.5), and Payment (6.2.6).

6.2.1 Self-Organizing ICT Resource Management – Policy-based Automated Bidding

The paper Self-Organizing ICT Resource Management – Policy-based Automated Bidding is devoted to the bidding agent. In essence, it is a premise of this paper that markets can undertake this task better than any other coordination mechanism. This is especially the case if the bidding process is fully automated, such that the market itself handles any unforeseen problem in the resource management process.

By bringing autonomic and Grid computing as well as economic principles closer together, market-based approaches have the potential to achieve an economy-wide adoption of Grid technology and will thus leverage the exploitation of the potential Grid technology offers. To achieve this potential, many obstacles related to Grid economics must be overcome: markets can only work well if bids accurately reflect demand and supply. If this is not the case, the market price cannot reflect the correct situation. Thus, the market price loses its capacity to direct scarce resources to the bidders who value them most. This paper is unique in recommending an autonomic computing approach for bidding in Grid markets. By defining policies, the business models of the service consumers and providers can be represented. The proposed bid generation process is currently quite simple, relying on observations and straightforward rules.

6.2.2 A Decentralized Online Grid Market – Best Myopic vs. Rational Response

The paper A Decentralized Online Grid Market – Best Myopic vs. Rational Response establishes a bridge between the trading management component and the bidding agent. More precisely, the paper connects the automated bidding approach with the decentralized online mechanism proposed in Decentralized Online Resource Allocation for Dynamic Web Service Applications. To this end, the contributions of this paper are twofold. Firstly, the theoretical analysis of the Decentralized Local Greedy Mechanism is based on the assumption of simple agents behaving according to a so-called myopic best response strategy. By means of a simulation with learning agents, it is shown that an analysis using myopic best response strategies overestimates the performance of the mechanism and is thus not an appropriate solution concept for modeling real-world market-based scheduling mechanisms.
As a byproduct of this analysis, the performance benefits of market-based schedulers over purely technical schedulers, which are solely based on system-centric measures, are demonstrated. Secondly, limitations of the mechanism for practical use are pointed out, and remedies that may help to mitigate these drawbacks are suggested.

6.2.3 Economically Enhanced Resource Management for Internet Service Utilities

In the paper Economically Enhanced Resource Management for Internet Service Utilities, local resource management systems are equipped with economic principles. The results reported in the paper have been achieved in cooperation with Jordi Torres and Jordi Guitart (both Barcelona Supercomputing Center), who co-author this paper. In this work, various economic enhancements for resource management are motivated and explained. In addition, a mechanism for assuring quality of service and dealing with partial resource failure without introducing the complexity of risk modeling is presented. It is shown how flexible dynamic pricing and client classification can benefit service providers. Various factors and technical parameters for these enhancements are discussed in detail. Moreover, a preliminary architecture for an Economically Enhanced Resource Manager integrating these enhancements is introduced. Due to the general architecture and the use of policies and a policy manager, this approach can be adapted to a wide range of situations. The approach is evaluated considering economic design criteria and using an example scenario. The evaluation shows that the proposed economic enhancements enable the provider to increase his benefit. In the standard scenario, 92% of the theoretically attainable maximum revenue can be achieved with the enhancements, in contrast to 77% without them. In the scenario with partial resource failure, the revenue increases from 57% to 85% of the theoretical maximum.
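The intuition behind client classification can be made concrete with a toy admission policy (all class names, prices and loads below are illustrative, not taken from the paper): when a partial failure shrinks capacity, a revenue-aware manager keeps the requests with the highest revenue per unit of load and sheds the rest, instead of rejecting requests indiscriminately.

```python
def admit(requests, capacity):
    """Toy revenue-aware admission control.

    requests: list of (client_class, price, load) tuples.
    Requests are admitted greedily in order of revenue per unit of load
    until the (possibly failure-reduced) capacity is exhausted.
    Returns (admitted client classes, total revenue).
    """
    admitted, revenue, used = [], 0.0, 0.0
    for cls, price, load in sorted(requests, key=lambda r: -r[1] / r[2]):
        if used + load <= capacity:
            admitted.append(cls)
            revenue += price
            used += load
    return admitted, revenue


demand = [("gold", 10.0, 1.0), ("silver", 4.0, 1.0), ("bronze", 1.0, 1.0)]
full = admit(demand, capacity=3.0)      # all three classes are served
degraded = admit(demand, capacity=2.0)  # bronze is shed first
```

Under the degraded capacity the policy still captures 14 of the 15 units of attainable revenue, mirroring the qualitative effect the paper reports for its (much richer) EERM scenarios.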

6.2.4 Situated Decision Support for Managing Service Level Agreement Negotiations

The paper Situated Decision Support for Managing Service Level Agreement Negotiations tackles the interface between contract management and the bidding agent. The paper reports on the results achieved during the cooperation with Rustam Vahidov (Concordia University, Montreal), who co-authors this paper. This work proposes the application of the situated decision support approach to managing automated SLA negotiations. The framework is based on the model for situated decision support that effectively combines human judgment with autonomous decision making and action by agent components. The key idea behind the approach lies in managing the fleet of negotiating agents through a manager agent and a human decision maker. It is shown through simulation experiments how this approach performs under a set of simplifying assumptions.

6.2.5 Using k-pricing for Penalty Calculation in Grid Market

The paper Using k-pricing for Penalty Calculation in Grid Market is concerned with the SLA enforcement component, proposing penalty schemes for SLA violations. This research has been conducted in cooperation with Omer Rana and Vikas Deora (both Cardiff University), who also co-author the paper. Essentially, the paper considers the important role of the design of service level agreements (SLAs) in distributing risk in Grids. SLAs are crucial as they are contracts that determine the price for a service at an agreed quality level as well as the penalties in case of SLA violation. This paper proposes a price function over the quality of service (QoS) on the basis of the negotiated price and quality objectives. This function defines fair prices for every possible quality of a service, which are in line with the business of the customer and incentivize the provider to supply welfare-maximizing quality.
Therewith, penalties can be calculated for every possible quality level as the difference between the agreed price and the output of the price function for the quality effectively met. A price function according to the k-pricing scheme is presented for a single-service scenario and for a scenario with multiple interdependent services.

6.2.6 Rightsizing of Incentives for collaborative e-science Grid Applications with TES

The paper Rightsizing of Incentives for collaborative e-science Grid Applications with TES addresses the payment component. The work has been conducted within the scope of the research project Billing the Grid, which strives to design and implement a billing system for academia. For business Grids the payment component is comparatively easy, as money can be used as a common denominator. In academia, the use of money is banned. Currently, resources are used purely based on availability. This raises incentive problems, as there is no fair allocation that takes the priority of a job into consideration. Researchers use Grid resources regardless of whether others need them more urgently. In addition, researchers often tend not to contribute their own resources to the Grid, due to missing incentives. Thus, researchers are demanding a Grid infrastructure which sets the right incentives to share, supporting a fair allocation of resources. The most common instrument to establish incentives is the use of money. However, the constraint in academia rules out the use of money. In this paper, a mechanism called Token Exchange System (TES) is proposed. Considering the requirements from the particle physics community, this mechanism combines the advantages of reputation and payment mechanisms in one coherent system to realize a fair allocation. It enables building up a Grid infrastructure with an incentive mechanism which is scalable,

incentive-compatible and does not require a face-to-face agreement like the current system. Transactions between institutes can be accomplished without burdening their financial endowment. Simulations demonstrate that TES performs better than a common payment system combined with a reputation mechanism.

6.3 Design Issues and Extensions

While the first two sections focus on economic models and algorithms for the components of the Open Grid Market architecture, section 6.3 broadens the scope by addressing (i) design issues that help to introduce the results into practice and (ii) extensions of the results concerning the applicability of the economic models and algorithms in other domains.

6.3.1 Distributed Ascending Proxy Auction – A Cryptographic Approach

The paper Distributed Ascending Proxy Auction – A Cryptographic Approach strives to decentralize auctions through the use of cryptography. As Grid computing is characterized by decentralized management, the use of auctions is questionable, as they are centralized in nature. The results of the paper have been achieved with Michael Conrad and Christoph Sorge from the Telematics area within the scope of the interdisciplinary research project SESAM. The paper presents a secure mechanism for distributed ascending proxy auctions. It fosters privacy and overall correct conduct to a degree that enables the trusted application of an asynchronous, iterative, second-price auction in systems as open as P2P networks. The approach produces several desirable properties for the auction process: it eliminates the dependency on one single auctioneer, and the winning bidder can hide his true valuation, respectively his highest bid. Using an encrypted bid chain for bidding ensures that only a fraction of the information is revealed to each auctioneer. With any new bid chain, each bidder can freely decide which auctioneer groups to trust.
Robustness is achieved by forming groups of auctioneers, where even one group member suffices to decrypt a bid step. In contrast to previous distributed second-price auction mechanisms, this approach is suitable for iterative open-cry auctions. It is never necessary for all participants to be online at the same time, which is what makes other approaches highly vulnerable, as it allows the blocking of an auction by attacking any participating node. In the proposed protocol, all a bidder needs to do is convey his bid chains, whereas the auctioneer groups must be accessible during the whole auction process. However, a single obedient auction group member being online at a time suffices to conduct the auction with a sufficient standard of security. This paper showcases how inherently centralized auctions can be decentralized through rigorous use of cryptography. The chosen auction is particularly difficult to decentralize; it is assumed that the mechanisms presented in section 6.1 are much easier to decentralize.

6.3.2 Technology Assessment and Comparison: The Case of Auction and E-negotiation Systems

The paper Technology Assessment and Comparison: The Case of Auction and E-negotiation Systems investigates whether auctions or negotiations are better suited for use in situations where the trading object is defined by more than one attribute. The paper introduces an Information Systems framework which explains the antecedents of system performance, emphasizing the role of mechanisms. A laboratory experiment shows how the framework can be used. This work helps to argue why the trading management uses auctions rather than negotiations. The work has been conducted with Gregory Kersten, Rustam Vahidov and Eva

Chen (all from Concordia University, Montreal) within the scope of the Canadian SSHRC-funded project Electronic Negotiations, Media and Transactions for Socio-Economic Interactions. The purpose of this paper is to propose a model for studying the impacts of electronic exchange mechanisms on key variables of interest, both objective and subjective. The proposed TIMES model provides a framework that allows the study of types of exchange mechanisms in their various implementations within different task, environment, and individual contexts. These mechanisms could range from the simplest catalogue-based models to advanced auction and negotiation schemas. Thus, the model can accommodate continuity in the key design principles of the mechanisms, as opposed to considering them as distinct classes. Therefore, one of the key contributions of the model is that it enables the comparison of various exchange structures in terms of the same set of key dependent factors.

6.3.3 Comparing Ingress and Egress Detection to Secure Inter-domain Routing: An Experimental Analysis

The paper Comparing Ingress and Egress Detection to Secure Inter-domain Routing: An Experimental Analysis addresses the issue of protocol security and adoption. The paper concentrates on the secure Border Gateway Protocol. It is assumed that the results can also be transferred to security mechanisms in the Open Grid Market. The work has been conducted together with Christoph Goebel (Humboldt University Berlin) and Ramayya Krishnan (Carnegie Mellon University, Pittsburgh), who co-author this paper. The starting point of the paper is that the global economy and society are increasingly dependent on computer networks linked together by the Internet. The importance of networks reaches far beyond the telecommunications sector, since they have become a critical factor for many other crucial infrastructures and markets.
With threats mounting and security incidents becoming more frequent, concerns about network security grow. It is an acknowledged fact that some of the most fundamental network protocols that make the Internet work are exposed to serious threats. One of them is the Border Gateway Protocol (BGP), which determines how Internet traffic is routed through the topology of administratively independent networks the Internet is comprised of. Despite the existence of a steadily growing number of BGP security proposals, to date none of them is even close to being adopted. The purpose of this work is to contemplate BGP security from a theoretical point of view in order to take a first step toward understanding the factors that complicate secure BGP adoption. Using a definition of BGP robustness, we experimentally show that the degree of robustness is distributed unequally across the administrative domains of the Internet, the so-called Autonomous Systems (ASs). The experiments confirm the intuition that the contribution ASs are able to make towards securing the correct working of the inter-domain routing infrastructure by deploying countermeasures against routing attacks differs depending on their position in the AS topology. It is shown that the degree of this asymmetry can be controlled by the choice of the security strategy. The strengths and weaknesses of two fundamentally different approaches to increasing BGP's robustness, termed ingress and egress detection of false route advertisements, are compared against each other. Based on the comparison, the economic implications are discussed.

6.3.4 A Market Mechanism for Energy Allocation in Micro-CHP Grids

The last paper of this book, A Market Mechanism for Energy Allocation in Micro-CHP Grids, strives to transfer the lessons learnt to other Grid markets. This paper addresses the

energy market and motivates the use of markets for micro grids. The results of the paper have been achieved within the scope of the interdisciplinary research project SESAM. Achieving a sustainable level of energy production and consumption is one of the major challenges in modern society. This paper contributes to the objective of increasing energy efficiency by introducing a market mechanism that facilitates the efficient matching of energy (i.e. electricity and heat) demand and supply in micro energy grids. More precisely, a combinatorial double auction mechanism is proposed that performs the allocation and pricing of energy resources, taking the specific requirements of energy producers and consumers into account. The potential role of decentralized micro energy grids and their coupling to the large-scale power grid is outlined. Furthermore, an emergency fail-over procedure is introduced that keeps the micro energy grid stable even in cases where the auction mechanism fails. As the underlying energy allocation problem itself is NP-hard, a fast heuristic for finding efficient supply and demand allocations is defined. Lastly, the applicability of this approach is shown through numerical experiments.

7 Conclusion and Outlook on Future Research

In this section the work at hand is recapitulated, and an outlook on future research in this domain is given.

7.1 Concluding remarks

The vision of a complete virtualization of Information and Communication Technology (ICT) infrastructures through the provision of ICT resources like computing power or storage over Grid infrastructures will make the development of Open Grid Markets necessary. Over the Open Grid Market, idle or unused resources (computational resources as well as hardware) can be supplied (e.g. as services), and client demand can be satisfied not only within an organization but also across multiple administrative domains.
In conventional IT environments, the decision of who shares with whom, and at what time, is orchestrated via a central server that uses scheduling algorithms in order to maximize resource utilization. Those central scheduling algorithms, however, run into problems when organizational boundaries are crossed and information about demand and supply is manipulated. In particular, if demand exceeds supply, the scheduling algorithms fail to allocate the resources efficiently. The further virtualization of ICT infrastructure and services requires technical solutions that allow for a reasonable, i.e. economically efficient, allocation of resources. Economic mechanisms can be the basis for technological solutions that handle these problems by setting the right incentives to accurately reveal information about demand and supply. Market or pricing mechanisms foster information exchange and can therefore attain efficient allocations. Establishing pricing mechanisms raises many questions: Can a price be calculated by considering only supply and demand, or will there be the need for a regulatory body? How can the overall supply and demand volume be detected? Will participants communicate their supply and demand wishes openly in bids, even if competitors have access to this information, e.g. to estimate production load? How can the bid generation process be automated so as to adequately reflect the underlying business model? In this work the design of an Open Grid Market is motivated. In addition, an architecture is proposed as a blueprint for Grid markets. To establish an Open Grid Market in practice, several obstacles have to be overcome: The bidding process goes beyond the feasibility of mere manual configuration, so there is a need for intelligent tools that reduce the complexity of Grid-based systems. In essence, those intelligent tools must support the automation of the bidding process, which depends on the resource supply situation and on business policies. Additionally, the Open Grid Market needs to be equipped with intelligent monitoring tools that audit the resources continuously, in order to correct unexpected events such as demand fluctuations, failure to share resources, etc. Most previous attempts to establish a vivid market process have failed because the underlying economic models were inadequate or incomplete; most prototypes lacked intelligent tools for automation. The set-up of an Open Grid Market has even more far-reaching ramifications for ICT management. Essentially, the Open Grid Market can be used to allocate not only idle but all available resources. In this case, ownership of resources no longer plays a central role in the resource allocation process, as the cheapest resources that assure the required QoS are allocated to the processes regardless of which organization they come from. It is the price, reflecting the demand and supply situation, which determines which resources are accessed. The intelligent tools will help to make the Open Grid Market flexible and transparent enough that the doubts concerning market-based resource allocation can be resolved. This paper tries to address all problems of an Open Grid Market that deal with economic models and algorithms. It intends to bridge the gap between conceptual strength and real-world implementations of Grid platforms.

7.2 Outlook

Assuming that Grid markets will find their way into practice, more challenges need to be addressed. Today's competitive markets require companies to flexibly adapt their products and services to meet rapidly changing business strategies and business models at low cost.
Accommodating innovative services and products is mainly associated with changes in the business processes, and thus with the IT systems embedding these processes and with the dimensioning of the IT infrastructure to deliver the services in proper quality. Service orientation in software engineering grants companies the necessary flexibility by orchestrating services into different workflows, thus paving the way for new business processes. The shift towards service orientation also implies a transition from local optimization of business processes to collaborative and (administratively) distributed business processes. Apparently, Service-Oriented Architectures are the best response to the flexibility needs imposed by competition. In this context, Grid technologies complement service orientation to manage the underlying IT infrastructure. As will be demonstrated within the articles, Grid technologies are one way to get a grip on the growing complexity of IT infrastructures, which goes along with innovative enterprise applications requiring different hardware, operating systems and application platforms. In addition, Grid technologies open up new possibilities in dimensioning IT infrastructures. While service orientation makes resource demand very difficult to predict, Grid technologies embedded in market-based business models facilitate on-demand assignment of resources via access to a pool of resources. Instead of attributing a fixed number of resources to services supporting specific business processes, Grid technologies dynamically assign resources depending on the actual demand. This is particularly desirable for SMEs, as it trims down the costs for IT infrastructure quite considerably. Accordingly, service orientation and Grid technologies in unison are likely to revolutionize modern enterprises in the future. It is the premise of this book that Open Grid Markets bolster the widespread adoption of Grids.
There are, however, many open problems associated with Grids and Grid markets. Currently, Grid markets almost exclusively deal with application-agnostic deployment schemes. Trading CPU hours and memory is insufficient to support the deployment of complex systems such as Enterprise Resource Planning (ERP) systems. If application-independent services

are traded, the resource allocation problem will become even more complex due to the dependencies between services. For instance, a workflow of service requests may require several services in parallel or in sequential order. All approaches described within this book are devoted to number-crunching activities that take the CPU as the scarce good. Considering that hardware, and thus CPU time, is getting ever cheaper raises the question of whether this is the real problem of Grid computing. The management of storage, or more precisely distributed data management (e.g. replication), raises many questions that could be addressed by referring to economic principles. In all of the presented models, the impact of bandwidth in combination with the network topology was largely neglected. Assuming that bandwidth will become the critical factor, there is a strong need to incorporate this limiting factor into the resource allocation decision. At the bottom line, the results of this book suggest several intriguing research avenues:

- Analyze the properties of the proposed mechanisms for the respective application model classes.
- Implement the proposed economic models and algorithms and conduct field studies in order to obtain real data.
- Develop market mechanisms where theory is mostly silent (e.g. task-oriented applications).
- Develop sustainable business models for companies that provide market platforms for trading Grid services or resources.
- Identify the size and the potential revenue of the individual segments of the Grid market.
- Identify the limits of the use of market mechanisms in Grids.

Bibliography

Acharya, N., C. Chokkaredd, et al. (2001). The open computation exchange & auctioning network (OCEAN). Department of Computer & Information Science & Engineering, University of Florida.
Alkadi, I. and G. Alkadi (2006). "Grid Computing: The past, now, and future." Human Systems Management 25(3).
Andrzejak, A., M. Arlitt, et al. (2002). "Bounding the Resource Savings of Utility Computing Models." Working Paper HPL, Hewlett-Packard Laboratories, Palo Alto, CA.
AuYoung, A., B. N. Chun, et al. (2004). Resource Allocation in Federated Distributed Computing Infrastructures. Proceedings of the 1st Workshop on Operating System and Architectural Support for the On-demand IT InfraStructure.
Barak, A., A. Shiloh, et al. (2005). An Organizational Grid of Federated MOSIX Clusters. CCGrid, Cardiff, UK.
Bell, G. and J. Gray (2002). "What's Next in High-Performance Computing?" Communications of the ACM 45(2).
Berstis, V. (2002). "Fundamentals of Grid Computing." IBM Redbooks Paper, Austin, TX.
Butler, D. (2006). "Amazon puts network power online." Nature 444: 528.
Buyya, R. (1999). High Performance Cluster Computing: Architectures and Systems. Prentice Hall.
Buyya, R., D. Abramson, et al. (2000). "Nimrod/G: An architecture for a resource management and scheduling system in a global computational grid." Proceedings of the 4th International Conference on High Performance Computing in Asia-Pacific Region.
Buyya, R., D. Abramson, et al. (2004). The Grid Economy. IEEE Grid Computing.
Buyya, R., H. Stockinger, et al. (2001). Economic models for management of resources in peer-to-peer and grid computing. International Conference on Commercial Applications for High Performance Computing, Denver, CO.
Carr, N. (2003). "IT doesn't matter." Harvard Business Review 81(5).
Carr, N. (2005). "The End of Corporate Computing." MIT Sloan Management Review 46(3).
Cherkasova, L., D. Gupta, et al. (2007). "When Virtual is Harder than Real: Resource Allocation Challenges in Virtual Machine Based IT Environments." HPL.
De Fraja, G. and J. Sakovics (2001). "Walras Retrouvé: Decentralized Trading Mechanisms and the Competitive Price." Journal of Political Economy 109(4).
Eymann, T., O. Ardaiz, et al. (2005). Catallaxy-based Grid Markets. First International Workshop on Smart Grid Technologies (SGT05), Utrecht, Netherlands.
Eymann, T., M. Reinicke, et al. (2003). Exploring Decentralized Resource Allocation in Application Layer Networks. Agent Based Simulation.
Eymann, T., M. Reinicke, et al. (2003). Self-Organizing Resource Allocation for Autonomic Networks. Workshop on Autonomic Computing Systems (ACS/DEXA 2003), Prague, Czech Republic, IEEE Computer Society.
Fellows, W., S. Wallage, et al. (2007). Grid Computing - The state of the market. The451Group.
Foster, I. and C. Kesselman (1998). The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann.
Garlan, D. and M. Shaw (1993). An Introduction to Software Architecture. Advances in Software Engineering and Knowledge Engineering. V. Ambriola and G. Tortora. New Jersey, World Scientific Publishing Company.
Gomolski, B. (2003). "Gartner 2003 IT Spending and Staffing Survey Results." Gartner Research, Stamford, Connecticut.
Hardin, G. (1968). "The Tragedy of the Commons." Science 162.

Irwin, D. E., L. E. Grit, et al. (2004). "Balancing Risk and Reward in a Market-based Task Service." Working Paper.
Kersten, G. E., S. Strecker, et al. (2004). Protocols for Electronic Negotiation Systems: Theoretical Foundations and Design Issues. Proceedings of the 15th International Workshop on Database and Expert Systems Applications (DEXA), Zaragoza, Spain.
Knight, W. (2006). "Unlocking the Grid." Engineering & Technology 1(3).
Kopparapu, C. (2002). Load Balancing Servers, Firewalls, and Caches. John Wiley & Sons.
Krishna, V. (2002). Auction Theory. San Diego, CA, Academic Press.
Lai, K. (2005). "Markets are Dead, Long Live Markets." Working Paper.
Lai, K., B. A. Huberman, et al. (2004). "Tycoon: a Distributed Market-based Resource Allocation System." Working Paper.
Lai, K., L. Rasmusson, et al. (2004). "Tycoon: an Implementation of a Distributed, Market-based Resource Allocation System." Working Paper.
LSF (2005). "LSF."
Macías, M., G. Smith, et al. (2007). Enforcing Service Level Agreements using an Economically Enhanced Resource Manager. 1st Workshop on Economic Models and Algorithms for Grid Systems (EMAGS 2007), in conjunction with the 8th IEEE/ACM International Conference on Grid Computing (Grid 2007), Austin, Texas, USA.
Marcus, E. and H. Stern (2000). Blueprints for High Availability: Designing Resilient Distributed Systems. John Wiley & Sons.
Neumann, D. (2006). Self-Organizing ICT Resource Management - Policy-based Automated Bidding. eChallenges e-2006 Conference, Barcelona.
Neumann, D. (2008). Engineering Grid Markets. Negotiation and Market Engineering. H. Gimpel, N. R. Jennings, G. Kersten, A. Ockenfels and C. Weinhardt, Springer.
Neumann, D., C. Holtmann, et al. (2006). "Grid Economics." Wirtschaftsinformatik 48(3).
Neumann, D., J. Stoesser, et al. (2008). "A Framework for Commercial Grids - Economic and Technical Challenges." Journal of Grid Computing: submitted.
Nisan, N., S. London, et al. (1998). Globally distributed computation over the Internet - the POPCORN project. 18th International Conference on Distributed Computing Systems, Amsterdam, The Netherlands, IEEE Computer Society.
Padala, P., C. Harrison, et al. (2003). OCEAN: The open computation exchange and arbitration network, a market approach to meta computing. International Symposium on Parallel and Distributed Computing.
Rappa, M. A. (2004). "The Utility Business Model and the Future of Computing Services." IBM Systems Journal 43(1).
Regev, O. and N. Nisan (1998). The POPCORN market - an online market for computational resources. First International Conference on Information and Computation Economies, Charleston, South Carolina, ACM Press.
Rosenschein, J. and G. Zlotkin (1994). Rules of Encounter: Designing Conventions for Automated Negotiation among Computers. Boston, MIT Press.
Satterthwaite, M. and A. Shneyerov (2003). "Convergence of a Dynamic Matching and Bargaining Market with Two-sided Incomplete Information to Perfect Competition." Working Paper.
Schnizler, B., D. Neumann, et al. (2006). "Trading Grid Services - A Multi-attribute Combinatorial Approach." European Journal of Operational Research: forthcoming.
Shneidman, J., C. Ng, et al. (2005). Why Markets Could (But Don't Currently) Solve Resource Allocation Problems in Systems. Proceedings of the 10th Conference on Hot Topics in Operating Systems.

Sullivan, D. G. and M. I. Seltzer (2000). "Isolation with Flexibility: a Resource Management Framework for Central Servers." Proceedings of the USENIX Annual Technical Conference.
Sun (2005). "GridEngine."
Sutherland, I. E. (1968). "A futures market in computer time." Communications of the ACM 11(6).
Waldspurger, C. A., T. Hogg, et al. (1992). "Spawn: A distributed computational economy." IEEE Transactions on Software Engineering 18(2).
Werthimer, D., J. Cobb, et al. (2001). "SETI@home - massively distributed computing for SETI." Computing in Science and Engineering 3(1).
Wolinsky, A. (1988). "Dynamic Markets with Competitive Bidding." Review of Economic Studies 55(1).
Wolski, R., J. Brevik, et al. (2003). Grid resource allocation and control using computational economies. Grid Computing - Making the Global Infrastructure a Reality. John Wiley & Sons: chapter 32.
Wolski, R., J. Plank, et al. (2001). "Analyzing market-based resource allocation strategies for the computational grid." International Journal of High Performance Computing Applications 15(3).
Wolski, R., J. Plank, et al. (2001). G-commerce: Market formulations controlling resource allocation on the computational grid. Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS).
Yeo, C. S. and R. Buyya (2004). "A taxonomy of market-based resource management systems for utility-driven cluster computing." Technical Report GRIDS-TR, Grid Computing and Distributed Systems Laboratory, University of Melbourne.

Part B: Articles

Bridging the Adoption Gap: Developing a Roadmap for Trading in Grids

Dirk Neumann, Jochen Stoesser, Christof Weinhardt
Institute of Information Systems and Management (IISM)
Universität Karlsruhe (TH)
{Neumann Stoesser Weinhardt}@iism.uni-karlsruhe.de

Introduction

Grid computing is increasingly gaining traction in a number of application areas. It describes a computing model that distributes processing across an administratively and locally dispersed infrastructure. By connecting many heterogeneous computing resources, virtual computer architectures are created, increasing the utilization of otherwise idle resources (Foster et al. 2001). Using Grid technology, it is possible to set up virtual supercomputers by connecting ordinary servers that cost no more than US$3,000 each. Adding and removing servers is simple, granting extreme flexibility in building up infrastructures. The best example of what Grid technology can achieve is illustrated by its prominent predecessor SETI@home: over one million computers spread across 226 countries were connected to reach a processing power of TFLOPS (Cirne et al. 2006). By comparison, the world's fastest supercomputer, IBM's BlueGene/L, has an estimated total processing power of between 280 and 367 TFLOPS 1, whereas Google's search engine can muster between 126 and 316 TFLOPS of estimated total processing power. The business case for Grids is further underlined by potential cost savings. It has been projected that Grids may decrease total IT costs by 30% (Minoli 2004). Thus, it is not surprising that Insight Research projects an increase in worldwide Grid spending from US$714.9 million in 2005 to approximately US$19.2 billion in 2010 (Anonymous 2006).

1 Accessed on 18 September 2007.

Currently, Grids are mainly employed within enterprises to connect internal divisions and business units. What is needed to extend Grid technologies beyond company borders is a set of mechanisms that enable users to discover, negotiate, and pay for the use of Grid services on demand. According to The451Group, a leading Grid research institute, the application of resource trading and allocation models is a crucial success factor for establishing commercial Grids (Fellows et al. 2007). Recent Grid offerings by Sun Microsystems and Amazon.com represent a first step in this direction. Sun introduced network.com, which offers computing services for a fixed price of $1 per CPU hour 2, while Amazon is currently launching comparable initiatives with its Amazon Elastic Compute Cloud (Amazon EC2) and the Amazon Simple Storage Service (Amazon S3). Despite these first approaches, electronic marketplaces for Grid resources have not yet taken off. Very few customers are using Sun's network.com. Due to different legal frameworks, network.com was only offered to customers within the US. The adoption patterns in the US resemble those of the rest of the world: few customers have adopted Grid markets. 3 But what is the reason for this limited success of Grid markets? Almost every large computer hardware manufacturer, such as HP, Sun, or Intel, has already worked on or at least considered the options for Grid markets, but still no Grid market has been launched successfully, creating a Grid adoption gap. This paper attempts to explain why Grid market initiatives have failed. The explanation mainly focuses on the object traded on Grid markets. The problem with current markets for Grids is that they are designed purely as markets for physical resources. For example, Amazon's Elastic Compute Cloud only aims to sell CPU hours.
This type of market is by design not relevant for enterprise customers who have deadlines for executing jobs and no idea how many resources are required to meet these deadlines. As a result of the analysis in this paper, it is concluded that a single Grid market for physical resources such as CPU and memory is insufficient to ensure successful take-up of Grid computing across organizational boundaries. Instead, a set or catalogue of different marketplaces is needed to satisfy the diverse needs of different market segments. This paper begins the process of cataloguing the needed market mechanisms. Thus, this paper provides guidance for potential Grid

2 Accessed on 18 September 2007.
3 Quite recently, Sun has been facilitating access for customers in 25 countries. It remains to be seen whether the removal of legal concerns will result in broader adoption.

market operators (e.g. telecom companies or hardware vendors such as Sun Microsystems) in the choice of the market mechanisms needed to increase the impact of Grid markets in commercial settings. The remainder of this paper is structured as follows. Section 2 discusses ways to define the trading object in Grid markets and thus forms the background for the discussion of market mechanisms. Section 3 explores whether one marketplace alone is sufficient for meeting the needs of Grid providers and users. A two-tiered market structure is proposed as a viable solution for structuring commercial Grids. Section 4 discusses the use of different market mechanisms for this two-tiered market structure and shows which mechanisms are most adequate for which kind of application. Section 5 concludes the paper with a summary and points to future work.

Background Discussion

Applications in a Grid environment can be deployed in two ways: either by directly accessing resources that are distributed over the network, or by invoking a Grid service 4 which encapsulates the respective resources behind standardized interfaces. These alternative ways of deployment give rise to different requirements for potential markets. From a technical point of view, resources are simple to describe, as there exists only a finite set of well-defined resources. A resource may be characterized by its operating system (e.g. Windows, Linux), number and type of CPUs (e.g. 4 * x86), memory (e.g. 128 MB RAM), or any finite number of other attributes. The standardization of resources offers a simple way to describe them semantically. The Glue Schema 5, for instance, provides a standardized vocabulary for characterizing computing elements and their status. This in turn facilitates resource discovery, as matchmaking within the finite domain space is straightforward. Services, on the other hand, can be extremely difficult to describe.
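Before turning to services, the claim that matchmaking over a finite, standardized attribute space is straightforward can be illustrated with a minimal sketch. The attribute names below are illustrative only and are not taken from the Glue Schema:

```python
# Minimal matchmaking sketch over a finite, standardized attribute space.
# Attribute names (os, cpus, ram_mb, ...) are illustrative, not Glue Schema.

def matches(offer: dict, request: dict) -> bool:
    """An offer satisfies a request if every requested attribute is met:
    exact match for categorical values, at-least for numeric values."""
    for attr, required in request.items():
        offered = offer.get(attr)
        if offered is None:
            return False
        if isinstance(required, (int, float)):
            if offered < required:
                return False
        elif offered != required:
            return False
    return True

offers = [
    {"os": "Linux", "cpus": 4, "cpu_arch": "x86", "ram_mb": 2048},
    {"os": "Windows", "cpus": 2, "cpu_arch": "x86", "ram_mb": 1024},
]
request = {"os": "Linux", "cpus": 2, "ram_mb": 1024}

# Discovery reduces to filtering the finite offer pool.
candidates = [o for o in offers if matches(o, request)]
```

Because the attribute domain is finite and standardized, discovery reduces to a simple filter over the offer pool; no domain ontology is needed.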
The service space is essentially infinite due to the myriad variations in service design. For the description of complex application services (e.g. a virtual web

4 Grid Services are stateful extensions of Web Services (accessed on 18 September 2007).
5 See (accessed on 18 September 2007).

server service "Apache on Linux"), domain-dependent ontologies may become necessary. In the case of resource-near services, henceforth called raw application services (for instance, computational services which use only CPU and memory), standardized languages such as the Job Submission Description Language (JSDL) 6 exist. Nonetheless, the unbounded search space greatly complicates service description and, likewise, service discovery. Both resources and services are provided on the basis of Quality-of-Service (QoS) assertions, i.e. essentially guarantees of access to specific resources or services at specific times and under specific conditions. For resources, the QoS description is simple, as only standardized properties and the duration of resource access matter. For services, however, QoS is more difficult, as not only time aspects but also the precision and accuracy of the services play a role. The definitions of precision, accuracy and further parameters depend on the individual service and cannot be standardized. This also has ramifications for monitoring. While monitoring resource access is relatively simple, the monitoring of complex application services becomes particularly demanding when services are intertwined. Figure 1 shows the different aggregation levels of services and resources with respect to the layers of the Grid Protocol Architecture (Joseph et al. 2004). On top of this stack, the actual Grid application is located, such as a demand forecasting tool for Supply Chain Management or computer-aided engineering tools for simulations. While these applications can be run directly on physical resources, they may access several complex application services, e.g. services for integrating, aggregating and statistically analyzing vast amounts of data, to simplify development and deployment by shielding parts of the Grid's complexity from the application. Complex application services are so diverse that they cannot reasonably be standardized.
These complex application services might in turn access raw application services which provide standardized interfaces for accessing various data sources and/or computational services. These raw application services are resource-near services (such as storage, memory or computation services). Raw application services also comprise applications and software libraries which can be standardized. The physical resources may be CPUs, memory, sensors, other hardware and software, or even aggregated resources such as clusters (e.g. a Condor cluster) and designated computing nodes.

6 See (accessed on 18 September 2007).

Figure 1: Grid Protocol Architecture and the different aggregation levels of services and resources

When relying on direct access to physical resources, executables and external libraries need to be transferred across the Grid. Typically, state-of-the-art Grid middleware supports only limited resource management functionality. In most cases, the middleware does not enforce policies concerning how many resources a job may consume. Only the local administrator can specify the degree to which resources can be shared. Trading physical resources is thus difficult to achieve by means of Grid middleware. Trading physical resources is, however, possible on the operating system level, which supports effective resource management. So-called Grid Operating Systems, henceforth Grid OS (e.g. MOSIX 7), support resource management at the OS kernel level and are potentially available for setting up markets (Stößer et al. 2007). With a Grid OS, applications need not be altered to run on the Grid, as is the case when Grid middleware is used. From an economic perspective, Grid resource markets are promising candidates for automation via an organized electronic market: there are standardized items for sale that potentially attract many buyers and sellers. Complex application services, however, have the disadvantage that demand is highly specialized and distributed across niche markets, such that only few potential buyers are interested in the same or related application services.

One Market Fits All?

In this section, we explore whether it is sufficient to build up and operate a single market for Grids. Based on the previous background description, the answer to this question is straightforward: Designing one Grid market for

7 See (accessed on 18 September 2007).

all kinds of resources, from physical resources such as processing power, memory and storage running on native platforms, to sophisticated virtual resources or application services that bundle and enrich such physical resources, seems inappropriate due to both technical and economic factors.

a) Technical factors

From the technical point of view, differences in the monitoring and deployment of services and resources make it very difficult to devise a generic system capable of supporting all kinds of resource and service trading. Furthermore, different deployment mechanisms impose different requirements on the market mechanism, as we will outline below.

b) Economic factors

From the economic point of view, market mechanisms need to achieve the following standard objectives in mechanism design (Stößer et al. 2007):

Allocative efficiency: Allocative efficiency is the overall goal of market mechanisms for Grid resource allocation. A mechanism is allocatively efficient if it maximizes the utility across all participating users (welfare, or overall "happiness").

Budget-balance: A mechanism is budget-balanced if it does not need to be subsidized by outside payments.

Computational tractability: The market mechanism needs to be computable in time polynomial in the number of resource requests and offers.

Truthfulness: Truthfulness means that it is a (weakly) dominant strategy for users to reveal their true valuations to the mechanism.

Individual rationality: A mechanism is individually rational if users cannot suffer a loss in utility from participating in the mechanism, i.e. if it is individually rational to participate.
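Several of these properties can be checked mechanically for any concrete market outcome. The following hedged sketch illustrates this for welfare (as a proxy for allocative efficiency), weak budget-balance, and individual rationality; the valuations, allocation and payments are invented for illustration and do not stem from any mechanism in this book:

```python
# Sketch: checking outcome properties of a market mechanism.
# All numbers are illustrative only.

def welfare(allocation, valuations):
    """Sum of valuations of the users who receive a resource
    (the quantity an allocatively efficient mechanism maximizes)."""
    return sum(valuations[u] for u in allocation)

def weakly_budget_balanced(payments):
    """The mechanism needs no outside subsidy: net payments collected
    are non-negative."""
    return sum(payments.values()) >= 0

def individually_rational(allocation, valuations, payments):
    """No user is worse off than by staying out: winners pay at most
    their valuation, losers pay nothing."""
    return all(
        payments.get(u, 0) <= (valuations[u] if u in allocation else 0)
        for u in valuations
    )

valuations = {"A": 10, "B": 6, "C": 3}   # reported valuations
allocation = {"A", "B"}                  # two units, highest bidders win
payments = {"A": 3, "B": 3, "C": 0}      # uniform price at highest losing bid
```

Truthfulness, by contrast, is a property of the mechanism's rules rather than of a single outcome and cannot be verified from one observation alone.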
As mentioned in the background discussion, trading physical resources and trading application services impose completely different requirements on the market: while physical resources are more or less commodities for which auction mechanisms seem to work well, (complex) application services are inherently non-standardized, thus making auction-like mechanisms inapplicable.

From this brief discussion, it can be seen that a one-size-fits-all market for Grids is infeasible from both the technical and the economic perspective. Due to the heterogeneous properties of Grids, they can be divided into two different types of markets: resource-near markets for physical resources and raw application services on the one hand, and markets for complex application services on the other hand, spanning out what may be called a two-tiered Grid market structure 8 as depicted in Figure 2. These two classes of markets are analyzed below in terms of their requirements for market structure 9.

[Figure 2: Two-tiered market structure. Grid users request complex application services in the Tier 2 market; integrators decompose these requests and procure raw application services and physical resources from resource providers in the Tier 1 markets.]

In the resource-near market for physical resources and raw application services, low-level resources such as processing power, memory, and storage are traded. Demand in this market is generated by complex application services that need to be executed on these physical resources and raw application services. This setting poses

8 We use the term "market structure" to denote the configuration of marketplaces.
9 It should be noted that there could be n intermediate markets. We consider only the extreme cases, as they exhibit different characteristics.

special requirements for market mechanisms (Schnizler et al. forthcoming). However, as mentioned earlier, the requirements also depend on the way resources are deployed.

Deployment as Physical Resource

The technical requirements in markets for physical resources are the following:

Multi-attributes: Physical resources have quality attributes such as CPU speed, operating platform or bandwidth. The mechanism thus needs to cope with multiple attributes simultaneously.

Bids on bundles: Generally, users require a combination of physical resources to execute an application (e.g. CPU and memory). If the mechanism does not account for bids on bundles, the user faces the risk of obtaining only one part of a bundle (the so-called "exposure risk"). The market mechanism thus needs to support requests for bundles of resources.

Online mechanism: The allocation needs to be made instantaneously, as the market assumes the role of an operating system scheduler. The mechanism thus needs to be lightweight, requiring little computation time. Online mechanisms are crucial where information such as release times or request processing times is only gradually released to the scheduler. The scheduler must be able to revise past decisions that prove unfortunate when new information enters the system. For example, facing a decrease in the performance of an application, the mechanism may be required to allocate additional physical resources in a timely manner.

Split-proofness: In some scenarios, one might want the mechanism to treat small and large requests in a fair manner. The mechanism may need to be split-proof in the sense that users cannot improve their requests' priority by splitting them into multiple parts and submitting them under different aliases.

Merge-proofness: The mechanism must be stable in the face of strategic users who build coalitions to improve their priority.
Thus the mechanism needs to ensure that users do not gain an advantage by merging their requests.

Deployment as Raw Application Service

The technical requirements in markets for raw application services are the following:

Multi-attributes: A virtual machine, for instance, may be characterized by the number of CPUs, share of memory, cache and bandwidth of the underlying physical resource. A simple computational service may need to provide a certain speed and accuracy.

Bids on bundles: A user program may require multiple interdependent raw application services in parallel.

Time constraints: When raw application services are traded, the market mechanism needs to take time attributes into account. The requesters need to specify their demand so that the market mechanism can efficiently schedule the requests according to the availability of resources and price. This situation differs from the trading of resources, where the market mechanism executes requests and the corresponding applications upon availability.

Co-allocation: Capacity-demanding Grid applications usually require the simultaneous allocation of several homogeneous service instances from different providers. For example, a large-scale simulation may require several computation services to be completed at the same time. This is often referred to as co-allocation. A mechanism for raw application services has to enable and control co-allocations.

Coupling: For some applications, it may be necessary to couple multiple raw application services into a bundle in order to guarantee that the services are allocated from the same seller and will be executed on the same machine.

Resource isolation: Security and performance considerations lead to the requirement of resource isolation, i.e. that a specific raw application service can only be instantiated once at any given time.

Tier 2 Markets

Along the lines of the two-tiered market structure, complex application services can be decomposed into several raw application services that can in turn be translated into the physical resources required to execute these services.
For instance, some complex application service might require a basic XML transformer service, which in turn needs processing power, memory, storage etc. Buyers in such a market request a complex application service; the provider of this complex application service, the service integrator, is in turn responsible for obtaining the required raw application services and physical resources, thus hiding parts of the Grid's complexity from the buyer. Such a hierarchical masking of complexity seems to be an appropriate approach since service

requesters typically have no insight into the resources the complex application service will consume (Eymann et al. 2006). The requirements of complex application services for market mechanisms are entirely different from those of resource-near markets. Complex application services are rarely used by two different companies; hence creating competition via an auction mechanism does not make sense. Instead, the market for complex application services faces the difficulty of finding a counterpart that offers the exact capabilities needed to execute the application. As the following requirements suggest, the market mechanism is more search-oriented, calling for bilateral or multilateral negotiation protocols:

Multi-attributes (see above).

Workflow support: To support complex services, distributed resources such as computational devices, data, and applications need to be orchestrated while managing the application workflow within Grid environments. The market mechanism needs to account for this during design time and run time of the workflow.

Scalability: Scalability considers how the properties of a protocol change as the size of the system (i.e. the number of participants in the Grid) increases.

Co-allocation (see above).

As pointed out in the introduction, resource-near Grid markets are not a viable solution for enterprises, which typically have to run time-critical applications. Applications in academia, however, are usually less time-dependent. As such, resource-near markets would be a viable business model for such settings; the users have to wait until the queued jobs are executed. But clearly, the issue of payment is controversial in academia. Even for the EGEE Grid 10, billing and payment will soon become an issue as demand exceeds supply. It seems that resource-near markets will soon become an adequate model for academic Grids such as EGEE or D-Grid, the German Grid initiative.
Grid markets that will be widely accessed by enterprises need to take the form of markets for complex application services. Consider a manufacturer interested in executing a computer-aided engineering application

10 Enabling Grids for E-sciencE, accessed on 18 September 2007.

and deploying it on a computationally intense platform. This complex application service showcases a very specific service which is likely to be demanded by a single requester only. To accommodate this complex application service, the service must be decomposed into its constituent raw application services and the required physical resources. Integrators (companies such as EDO2 11 or GigaSpaces 12 that specialize in aggregating and disaggregating services into physical resources) are needed to facilitate this decomposition process. The specialization stems from experience, allowing the identification of service needs by comparing each service request with similar service requests in the past, where similarity is established in terms of algorithms, data structures and sizes, etc. Telecommunication companies and hardware producers seeking to virtualize IT infrastructures naturally have the interest and competency to become Grid service integrators. Table 1 summarizes this discussion.

Table 1: Types of Grid markets

                        Tier 1 market (resource-near)                      Tier 2 market
Deployment              Physical Resource        Raw Application Service   Complex Application Service
Description languages   GLUE                     JSDL/RSL                  Domain-dependent ontologies
Time limits             No                       Yes                       Yes
Application areas       Small enterprises,       Integrators and           All enterprises
                        academia                 resource providers
Requirements            Multi-attributes,        Multi-attributes,         Multi-attributes,
                        bids on bundles,         bids on bundles,          workflow support,
                        online mechanism,        time attributes,          scalability,
                        split-proof,             co-allocation, coupling,  co-allocation
                        merge-proof              resource isolation

A Roadmap for Grids

In the previous sections, we argued that Grids do not require a single global market where all Grid requests and supplies are aggregated, but rather a more complex two-tiered market structure. Figure 2 above summarizes this (meta-)market structure. Applications demand the execution of several complex application services in Tier 2 markets.
In resource-near markets (Tier 1), complex services request physical resources either

plainly deployed or accessed via service interfaces. Integrators assume the responsibility for mediating between requesters who are unaware of their resource needs and the resources they require. Based on the preceding discussion, we set up a taxonomy below of known market mechanisms that support different types of Grid applications. This taxonomy is conceived as a roadmap for further Grid market developments to help bridge the Grid adoption gap.

Market Mechanisms for Trading Physical Resources

For Grids where physical resources (but not services) are traded, no time restrictions apply. Usually, the resources themselves are not traded; rather, shares of computing units (e.g. nodes) are. The idea is that bidding determines the share a user receives from the pool of available nodes. The more resources a user obtains, the faster the application is completed. The following mechanisms have been proposed for this setting:

Fair Share (Kay and Lauder 1988): In the Fair Share mechanism, all users get the same share of the respective resource. This share is dynamically adjusted as new users enter the system or existing users leave.

Proportional Share (Lai et al. 2004): In contrast to the Fair Share mechanism, with Proportional Share the shares can differ across users to reflect each user's priority. This priority may be determined dynamically based on the users' bids for resources: if there are n users in the system and user i has submitted a valuation of v_i, then i will receive a share amounting to v_i / (v_1 + ... + v_n), i.e. the user's valuation divided by the sum of all submitted valuations.

Pay-as-bid (Sanghavi and Hajek 2005; Bodenbenner et al. 2007): The pay-as-bid mechanism operates in the same setting as Fair Share and Proportional Share but is specifically designed to induce users to report their valuations truthfully. It has been shown that pay-as-bid improves on the prominent Proportional Share mechanism with respect to both efficiency and the provider's revenue (Bodenbenner et al. 2007).
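The Proportional Share rule above is simple enough to make concrete. The following sketch is illustrative only: the function name and the bid values are invented, and the zero-bid case falls back to a Fair Share (equal) split.

```python
def proportional_shares(bids, capacity):
    """Split a resource pool among users in proportion to their bids.

    bids: dict mapping user id -> submitted valuation v_i.
    Each user i receives capacity * v_i / sum_j v_j.
    """
    total = sum(bids.values())
    if total == 0:
        # No positive bids: fall back to an equal (Fair Share) split.
        return {user: capacity / len(bids) for user in bids}
    return {user: capacity * v / total for user, v in bids.items()}

# Three users bidding for a pool of 100 nodes.
shares = proportional_shares({"A": 10.0, "B": 30.0, "C": 60.0}, 100)
# A receives 10 nodes, B 30 nodes, C 60 nodes.
```

Note how the share adjusts automatically as users enter or leave: removing user C re-splits the pool between A and B in the ratio 10 : 30.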
All three mechanisms, however, share a common drawback: they can only be used in scenarios where one resource provider serves several consumers, that is, where there is no competition among providers. In cases where all resources are under fully centralized control, this condition is unproblematic. But the very idea of Grids is to cross administrative boundaries. Consequently, there is a need for market mechanisms that support multiple resource providers.

Market Mechanisms for Trading Application Services

Dividing the service market into two parts, raw and complex application services, is too simplistic, as the timing of demand for services has not yet been considered. This timing is determined by the application itself and depends on the task the application is performing. We use the term application model to characterize the processing mode of an application. This encompasses in particular the workload of the application as well as the interaction model between the application and the Grid middleware virtualizing the execution platform. Depending on the application model, different requirements upon the market mechanisms emerge.

Batch applications (e.g. data mining) are characterized by a planned execution and expected termination time. Execution is serial, and resource demands depend on parameters such as the size of the input data.

Interactive applications (e.g. online data analysis) require services on demand, depending on the interactions with users. In contrast with batch applications, it is not possible to plan the execution and expected termination time of interactive applications far in advance, so unpredictable peaks of requests can occur within a short time.

Task-oriented applications are dynamically composed from (sub-)tasks to build more complex tasks. Service demand depends on the (work-)flow of requests from multiple users (e.g. the transaction system of a bank).

Most market-based approaches relate to batch applications. Batch applications are comparatively easy to support for two main reasons. First, there is no need to consider a whole workflow with different resource demands at each echelon of the workflow. Second, the time to determine the allocation can be relatively long; immediacy is not essential. Thus, complex resource allocation computations can be performed without hampering the whole application through latency devoted to calculating the optimal allocation.
Furthermore, most practical market-based Grid prototypes consider only one single resource type (e.g. CPU only) and thus make use of standard auctions (e.g. the English auction). In applications other than pure number crunching, those auction types are inadequate, as more than one object (e.g. memory) is required at the same time.

Market Mechanisms for Raw Application Services

As mentioned above, the market mechanisms for raw services depend on the application model. Mechanisms that we identify as being adequate for batch applications are:

Multi-attribute Combinatorial Auction (Bapna et al. 2006): In the model of Bapna et al., multiple requesters and providers can trade both computing power and memory for a sequence of time slots. First, an exact mechanism is introduced. By imposing fairness constraints and a common valuation across all resource providers, the search space in the underlying combinatorial allocation problem is structured so as to establish one common, truthful price per time slot for all accepted requests. Additionally, this introduces a linear ordering across all jobs and time slots, which reduces the complexity of the allocation problem; the problem nevertheless remains NP-hard 13. To mitigate this computational complexity, a fast, greedy heuristic is proposed at the expense of both truthfulness and efficiency.

Multi-attribute Exchange (Stößer et al. 2007): The model of Stößer et al. extends the model of Bapna et al. (2006) in that it accounts for strategic behavior not only on the demand side of the market but also among resource providers. Stößer et al. design a truthful scheduling heuristic for Grid operating systems that quickly achieves an approximately efficient allocation. A greedy heuristic is employed to solve the scheduling problem. It is complemented with a pricing scheme which ensures that reporting truthfully is a dominant strategy for resource requesters and that payments to resource providers are approximately truthful.

MACE mechanism (Schnizler et al. forthcoming): Schnizler et al. elaborate a Multi-Attribute Combinatorial Exchange (MACE). Users are allowed to request and offer arbitrary bundles of computer resources and can specify quality attributes of these resources. The scheduling problem in this combinatorial setting is NP-hard, and the pricing scheme of MACE yields approximately truthful prices on both sides of the market.
Due to the NP-hardness of the problem, the mechanism is not applicable to large-scale settings and to interactive applications that require the immediate allocation of resources.

Combinatorial Scheduling Exchange (AuYoung et al. 2004): The Bellagio system also implements an exact combinatorial exchange for computing resources (AuYoung et al. 2004). Its pricing is based on the approximation of the truthful Vickrey-Clarke-Groves prices proposed by Parkes et al. (2001). As an exact mechanism, it shares the computational drawbacks of MACE.

Augmented Proportional Share (Stoica et al. 1996): One major drawback of Proportional Share as proposed by Lai et al. (2004) is that users do not get any QoS guarantee. Generally, users will receive a

13 Informally, for a complex or large problem instance, the allocation problem is not solvable with justifiable effort.

larger or smaller share than required and constantly need to monitor their resource needs and the actual allocation. To mitigate this deficiency, Stoica et al. propose an extension of the Proportional Share mechanism by which users can request and receive a fixed share of the resource. This approach thus combines the simplicity of Proportional Share with resource reservation where required.

For interactive applications it is impossible to predict the demand for raw application services. Thus, market mechanisms need to allocate these services continuously. This can be realized by frequent-call mechanisms, where bids are collected for a very short time span and then cleared right away. This requires that the mechanism be solvable within a few milliseconds. The greedy approaches of the Multi-attribute Combinatorial Auction heuristic (Bapna et al. 2006) and the Multi-attribute Exchange (Stößer et al. 2007) are capable of approximately solving the combinatorial allocation problem within this time frame. Alternatively, the mechanism could be a true online mechanism, which allows real-time job submission to available resources (e.g. nodes or clusters). A third way is to introduce a derivative market to hedge against the risk of supernormal resource demand. Adequate market-based schedulers for interactive applications comprise:

Decentralized Local Greedy Mechanism (Heydenreich et al. 2006; Stößer et al. 2007): This mechanism is designed to provide a scalable and stable auctioning protocol. It operates in an online setting where jobs arrive over time. Its aim is to schedule these jobs so as to minimize the overall weighted completion times. This scheduling mechanism is complemented by a pricing rule which induces the users to truthful behavior.

Augmented Proportional Share (Stoica et al. 1996): The shares which are allocated to the users and the resulting prices can be adjusted very efficiently as new requests are submitted or as requests finish.
Moreover, users can choose their risk exposure: they either fix the price and receive a corresponding share, which may or may not match the true demand of their interactive application, or they fix their share and pay the corresponding price, which may or may not match their valuation.

Derivative Markets (Kenyon and Cheliotis 2002; Rasmusson 2002): Derivative markets perform two basic functions, hedging and speculation, which become essential in the context of non-storable goods such as Grid resources. They generate early price signals which support capacity planning for both buyers and sellers. Rasmusson (2002) proposes the use of options to price network resources. Kenyon

and Cheliotis (2002) follow the same idea by proposing option contracts for network commodities such as bandwidth. They elaborate a pricing scheme for options which specifically takes into account aspects of the underlying network topology that may affect prices, e.g. the existence of alternative paths. Options can be used not only for hedging but also for speculation (arbitrage), which contributes liquidity to the market.

The requirements for market mechanisms that support task-oriented applications are very demanding, as all constituents of the workflow need to be allocated simultaneously; otherwise the application cannot be fully executed and is of no value to the user. Currently, only bargaining protocols are available that guide the users in their search for all components of the workflow, e.g.:

SNAP (Czajkowski et al. 2002): The Service Negotiation and Acquisition Protocol (SNAP) supports the reliable management of Service Level Agreements (SLAs). In particular, SNAP supports the process of negotiating simultaneously across multiple resource providers, a key requirement for complex and resource-demanding services.

Market Mechanisms for Complex Application Services

Trading complex application services is very demanding, as there are not many providers and requesters. Currently, there is not much research available that aims to develop market mechanisms for trading complex application services. Hence, the market mechanisms for batch, interactive and task-oriented applications do not differ substantially. In particular, the MACE mechanism (Schnizler et al. forthcoming), the Bargaining Protocol (Czajkowski et al. 2002), and the Decentralized Local Greedy Mechanism (Heydenreich et al. 2006; Stößer et al. 2007) might be appropriate for trading such services. Furthermore, for all application models, the licensing model of software-as-a-service 14 is applicable.
It refers to take-it-or-leave-it pricing: the vendor sets the price and the users decide whether or not to purchase. Table 2 summarizes the existing market mechanisms for each application model.

14 Software-as-a-service refers to a model of software delivery where a service provider (e.g. SAP) offers to requesters applications that are specifically implemented for one-to-many hosting.

Application model   Physical Resource       Raw Application Service            Complex Application Service
Batch                                       MACE,                              MACE,
                                            Multi-attribute Combinatorial      SNAP,
                                            Auction,                           Software-as-a-service
                                            Augmented Proportional Share
Interactive         Fair Share,             Decentralized Local Greedy         Decentralized Local Greedy
                    Proportional Share,     Mechanism,                         Mechanism,
                    Sanghavi-Hajek          Multi-attribute Auction,           SNAP,
                                            Augmented Proportional Share,      Software-as-a-service
                                            Derivative Markets
Task-oriented                               SNAP                               SNAP,
                                                                               Software-as-a-service

Table 2: Market Mechanisms Canon for Grids

Conclusion

This paper argues that, despite its promising features, the technology of Grid computing has not yet been adopted by enterprises due to an absence of adequate business models. Grid markets may help to overcome this adoption gap. This paper attempts to derive an economically sound set of market mechanisms based on a solid understanding of the technical possibilities.

In the background section, we analyzed the characteristics of the object traded in Grids. This trading object is closely associated with the deployment of software applications. Whether deployment happens directly on physical resources or via raw application services has major ramifications for the trading object and consequently for the requirements on market mechanisms. Physical resources are essentially commodities, whereas application services can be both standardized commodities (raw application services) and unique entities (complex application services). Based on this analysis, we derived a two-tiered market structure along the distinction between physical resources and application services, where each tier demands different market mechanisms. The first tier comprises the markets for physical resources (e.g. CPU, memory) and raw application services. The second tier comprises the markets for complex application services.

We then presented and classified existing Grid market mechanisms according to this market structure. At the core of this paper, we argue that there is no single market that satisfies all purposes. Reflecting the distinct requirements of different deployment modes (physical resource vs. raw application service vs. complex application service) and application models (batch vs. interactive vs. task-oriented), a catalogue of co-existing market mechanisms is needed. Thus, in order to overcome the adoption gap, Grid market operators such as Sun Microsystems and Amazon must deploy multiple market mechanisms via a Grid market platform to satisfy the needs of their enterprise customers, or else integrators must step in to bridge these niche markets.

In essence, this paper suggests several intriguing research avenues. Further research is needed to specify the properties of both the various application scenarios and the needed market mechanisms. Since existing market mechanisms represent rather theoretical constructs, they need to be deployed in real-world settings to verify the underlying models and analytic results. However, existing mechanisms only represent a first step towards filling the identified market structure and satisfying the various requirements. This basic catalogue needs to be extended by means of improved and new mechanisms in order to further enhance efficiency and to account for the strategic dimension inherent in Grid markets. And, of course, the commercial adoption and adaptation of Grid markets requires an analysis of the supply side as well as of the demand side, which was our focus in this paper. An in-depth analysis of the supply side would address questions such as the following: What are sustainable business models for companies that provide market platforms for trading Grid services or resources? And how large is each of the individual markets within the two-tiered market structure?
Chargeable Grid services and sustainable business models are not the final or only answers to the Grid adoption problem. Other factors, such as trust in the security and reliability of Grid technology, are also relevant. But better technology alone will not ensure widespread adoption of the Grid: sound market mechanisms are also necessary.

References

Anonymous (2006) Grid Computing: A Vertical Market Perspective. Boonton, NJ, USA: The Insight Research Corporation.

AuYoung, A., Chun, B. N., Snoeren, A. C. and Vahdat, A. (2004) 'Resource Allocation in Federated Distributed Computing Infrastructures', Proceedings of the 1st Workshop on Operating System and Architectural Support for the On-demand IT InfraStructure.

Bapna, R., Das, S., Garfinkel, R. and Stallaert, J. (2006) 'A market design for grid computing', INFORMS Journal on Computing, forthcoming.

Bodenbenner, P., Stößer, J. and Neumann, D. (2007) 'A Pay-as-Bid Mechanism for Pricing Utility Computing', 20th Bled eConference, Merging and Emerging Technologies, Processes, and Institutions, Bled, Slovenia.

Cirne, W., Brasileiro, F., Andrade, N., Costa, L. B., Andrade, A., Novaes, R. and Mowbray, M. (2006) 'Labs of the World, Unite!!!', Journal of Grid Computing 4(3).

Czajkowski, K., Foster, I., Kesselman, C., Sander, V. and Tuecke, S. (2002) 'SNAP: A Protocol for Negotiating Service Level Agreements and Coordinating Resource Management in Distributed Systems', 8th Workshop on Job Scheduling Strategies for Parallel Processing, Edinburgh, Scotland.

Eymann, T., Neumann, D., Reinicke, M., Schnizler, B., Streitberger, W. and Veit, D. (2006) 'On the Design of a Two-Tiered Grid Market Structure', Business Applications of P2P and Grid Computing, MKWI.

Fellows, W., Wallage, S. and Davis, J. (2007) Grid Computing: The State of the Market. The 451 Group.

Foster, I., Kesselman, C. and Tuecke, S. (2001) 'The Anatomy of the Grid: Enabling Scalable Virtual Organizations', Euro-Par 2001 Parallel Processing, Springer.

Heydenreich, B., Müller, R. and Uetz, M. (2006) 'Decentralization and mechanism design for online machine scheduling', in L. Arge and R. Freivalds (eds.) SWAT, Springer.

Kay, J. and Lauder, P. (1988) 'A Fair Share Scheduler', Communications of the ACM 31(1).

Kenyon, C. and Cheliotis, G.
(2002) 'Forward Price Dynamics and Option Design for Network Commodities', Bachelier Finance Society 2nd World Conference, Crete.

Lai, K., Rasmusson, L., Adar, E., Sorkin, S., Zhang, L. and Huberman, B. A. (2004) 'Tycoon: an Implementation of a Distributed, Market-based Resource Allocation System', Working Paper.

Minoli, D. (2004) A Networking Approach to Grid Computing. New York: John Wiley & Sons, Inc.

Rasmusson, L. (2002) Network capacity sharing with QoS as a financial derivative pricing problem: algorithms and network design. Ph.D. thesis, Royal Institute of Technology, Stockholm.

Sanghavi, S. and Hajek, B. (2005) 'A new mechanism for the free-rider problem', ACM SIGCOMM Workshop on Peer-to-Peer Economics, Philadelphia, PA.

Schnizler, B., Neumann, D., Veit, D. and Weinhardt, C. (forthcoming) 'Trading Grid Services: A Multi-attribute Combinatorial Approach', European Journal of Operational Research.

Stoica, I., Abdel-Wahab, H., Jeffay, K., Baruah, S., Gehrke, J. and Plaxton, G. C. (1996) 'A Proportional Share Resource Allocation Algorithm for Real-Time, Time-Shared Systems', IEEE Real-Time Systems Symposium.

Stößer, J., Neumann, D. and Anandasivam, A. (2007) 'A Truthful Heuristic for Efficient Scheduling in Network-Centric Grid OS', European Conference on Information Systems (ECIS 2007), St. Gallen.

Stößer, J., Roessle, C. and Neumann, D. (2007) 'Decentralized Online Resource Allocation for Dynamic Web Service Applications', IEEE Joint Conference on E-Commerce Technology (CEC'07) and Enterprise Computing, E-Commerce and E-Services (EEE'07), Tokyo, Japan.

A TRUTHFUL HEURISTIC FOR EFFICIENT SCHEDULING IN NETWORK-CENTRIC GRID OS

Jochen Stoesser, Institute of Information Systems and Management (IISM), Universität Karlsruhe (TH), Englerstr. 14, Karlsruhe, Germany

Dirk Neumann, Institute of Information Systems and Management (IISM), Universität Karlsruhe (TH), Englerstr. 14, Karlsruhe, Germany

Arun Anandasivam, Institute of Information Systems and Management (IISM), Universität Karlsruhe (TH), Englerstr. 14, Karlsruhe, Germany

Abstract

The Grid is a promising concept for solving the dilemma of increasingly complex and demanding applications being confronted with the need for a more efficient and flexible use of existing resources. Network-centric Grid Operating Systems (OS) aim at providing users and applications with transparent and seamless access to heterogeneous Grid resources across different administrative domains. Scheduling in these heterogeneous environments becomes the key issue, since scarce resources must be distributed among strategic, self-interested users. Market-based algorithms are deemed to provide a good fit to the Grid environment by accounting for its distinct properties and the needs of the users. Current market mechanisms, however, leave room for inefficiency and are computationally intractable. The speed of heuristics thus becomes a desirable feature in interactive, large-scale Grid OS settings. Heuristics achieve a fast, approximately efficient allocation of resources but generally fail to preserve truthfulness, i.e. users can benefit from cheating the mechanism. The contribution of this paper is the proposal of a greedy, market-based scheduling heuristic for network-centric Grid OS which resolves this trade-off: it is designed to obtain an approximately efficient allocation schedule at low computational cost while accounting for strategic, self-interested users in a heterogeneous environment.

Keywords: network-centric Grid OS, market-based scheduling, heuristic, truthfulness

1 INTRODUCTION

Grids allow the aggregation and sharing of a wide variety of geographically dispersed computer resources owned by different administrative units (Foster and Kesselman 2004). Grids are typically used for scientific applications that require massive amounts of computational power and storage, such as protein folding, weather forecasting, or gene encoding. These applications potentially process and store terabytes of data, and the sharing of resources is necessary to reduce the enormous processing time (Berlich et al. 2005; Cirne et al. 2006). The most famous predecessor of modern Grids is SETI@home, which at its peak connected over one million computers spread across 226 countries to achieve a processing power in the TFLOPS range (Cirne et al. 2006). For comparison, the world's fastest supercomputer, IBM's BlueGene/L system, has an estimated total processing power of 280 TFLOPS, while Google's search engine system can muster between 126 and 316 TFLOPS. Grids offer enormous potential by aggregating idle resources that are geographically distributed. For businesses, it is projected that Grids may reduce total IT costs by 30 % (Minoli 2004).

The use of market mechanisms has often been suggested in situations that call for establishing a price for resources. The price paid to resource owners who share idle resources reflects the scarcity of the provided resource. Since consuming resources is not free, demand will be restricted and shifted to those time slots where the resources are really needed. The employment of market mechanisms in distributed computing is not new. The first attempt was reported at Harvard University, where auctions were used to allocate computing time to users of the PDP-1 computer (Sutherland 1968). Other examples are Tycoon, Bellagio, and Nimrod/G (Buyya et al. 2000; Lai et al. 2004).
From an economic point of view, the use of market mechanisms in Grid computing is promising, but most of this potential is being wasted in state-of-the-art Grid middleware such as Globus, UNICORE, and gLite. Current Grid middleware is based on the premise of sharing only idle resources; computational jobs are submitted to these idle resources, where they are processed in batch mode. It would be more efficient to allocate all available resources over markets. Ideally, resource allocation should be on-demand, such that interactive processing of jobs is permitted as well (Gorlatch and Müller 2005).

Recently, a new stream of research on network-centric Grid operating systems (OS) has emerged in Grid computing that especially addresses interactive applications. Network-centric Grid OS provide the user with seamless and transparent access to Grid resources such as memory and computing power. While current OS provide only limited support for Grid computing, network-centric Grid OS adjust their view of the system at run time, contingent upon the particular resources available. By means of network-centric Grid OS, the Grid can be used in an interactive manner. This interactivity certainly raises problems concerning bandwidth and latency; network-centric Grid OS are an active research topic (Padala and Wilson 2003; Gorlatch and Müller 2005). Nonetheless, there are currently some implementations available that come close to this notion of a network-centric Grid OS. MOSIX, for example, is realized as an OS virtualization layer that provides a single system image with the Linux run-time environment to users and applications. As such, it allows applications to run on remote nodes as if they ran locally. Users can run their applications while MOSIX automatically and transparently seeks resources and migrates processes among nodes in order to improve overall performance (Barak et al. 2005).
The concept of network-centric Grid OS coupled with markets is deemed promising to revolutionize resource allocation in Grids. By their conception, network-centric Grid OS allocate not only idle resources but all available resources. The employment of markets for allocating and scheduling jobs to resources appears promising for achieving a resource allocation that is efficient in the economic sense, i.e. one that allocates resources to those jobs that value them most. The problem of scheduling resources is, however, computationally demanding. To be harnessed for network-centric Grid OS, market mechanisms need to be solvable quickly. Currently, there is no market mechanism available that is specifically tailored towards the needs of a network-centric Grid

OS. This paper seeks to remedy this gap by designing a truthful scheduling heuristic for MOSIX that quickly achieves an approximately efficient allocation. The contributions of this paper are threefold: firstly, the paper designs a multi-attribute exchange that can be used for Grid OS. Secondly, a greedy heuristic is employed to solve the scheduling problem. Thirdly, an adequate pricing scheme is developed which assures that reporting truthfully is a dominant strategy for resource requesters and that payments to resource providers are approximately truthful.

The remainder of this paper is structured as follows. In Section 2, the market-based scheduling problem of network-centric Grid OS is introduced. Section 3 suggests (i) a heuristic to solve this problem and (ii) a pricing scheme that provides the market mechanism with desirable properties. Section 4 discusses related work in the light of the presented market mechanism. Section 5 concludes the paper with a summary and points to future work.

2 MARKET-BASED SCHEDULING IN GRID OS

There are two classes of system users in network-centric Grid OS: resource requesters, who want to use the computing resources offered in the system for solving a computational problem, and resource providers, who want to sell idle resources. We introduce the term job to refer to a computational problem, and we call an atomic bundle of resources on which the job or parts of it can be computed a node, e.g. one server within a cluster. If a job is allocated to a node, we call this job (node) a winning job (node). We intend to design a direct, sealed-bid mechanism which allocates periodically: the mechanism collects resource requests and offers from the users for a period of time and then allocates jobs to nodes on the basis of these submitted requests and offers. Users do not get to know the requests and offers submitted by other users before the allocation is determined by the mechanism.
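The periodic, sealed-bid interaction pattern just described can be sketched as a small skeleton. This is an illustration only: the class name is invented, and the clearing rule is a trivial placeholder, not the allocation algorithm designed in this paper.

```python
class PeriodicSealedBidMarket:
    """Collects sealed resource requests and offers, then clears periodically.

    `clear` is a pluggable allocation rule mapping the bids of one period
    to an allocation. Bids are not revealed to other users before clearing,
    mirroring the sealed-bid property described in the text.
    """

    def __init__(self, clear):
        self.clear = clear
        self.requests, self.offers = [], []

    def submit_request(self, request):
        self.requests.append(request)

    def submit_offer(self, offer):
        self.offers.append(offer)

    def run_period(self):
        # Allocate all bids collected during this period at once ...
        allocation = self.clear(self.requests, self.offers)
        # ... then start a fresh period with empty order books.
        self.requests, self.offers = [], []
        return allocation

# Placeholder clearing rule: pair requests and offers in submission order.
market = PeriodicSealedBidMarket(lambda reqs, offs: list(zip(reqs, offs)))
market.submit_request("J1")
market.submit_offer("N1")
allocation = market.run_period()  # pairs J1 with N1
```

Any clearing function with the signature `clear(requests, offers)`, such as a greedy scheduling heuristic, can be plugged in without changing the collection logic.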
2.1 Economic Requirements

The mechanism is intended to perform job scheduling in a distributed computing environment with heterogeneous and selfish users. There are a number of economic requirements which a market-based scheduling mechanism should ideally satisfy in this environment (Schnizler et al. forthcoming):

Allocative efficiency: A mechanism is allocatively efficient if it maximizes the utility across all participating users (welfare), i.e. the sum of the valuations of all winning resource requesters less the sum of the reservation prices of all winning resource providers.

Budget-balance: The mechanism does not need to be subsidized by outside payments; the payments coming from the resource requesters cover the payments made to the resource providers.

Computational tractability: The mechanism can be computed in polynomial run time in the size of its input, i.e. the number of resource requests and offers.

Truthfulness: It is a (weakly) dominant strategy for users to reveal their true valuations to the mechanism.

Individual rationality: Users cannot suffer a loss in utility from participating in the mechanism, i.e. participation is individually rational.

Budget-balance and individual rationality are hard constraints which the mechanism must satisfy; if these two requirements are not met, the mechanism will not be sustainable. Truthfulness is a desirable feature since it tremendously simplifies the users' strategy space: there is no need to reason about the strategies of other users. Due to the celebrated impossibility result of Myerson and Satterthwaite (1983), this inherently implies that allocative efficiency can only be approximated: there is no exchange which is simultaneously budget-balanced, individually rational, truthful and allocatively efficient in equilibrium.
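The first two requirements can be stated mechanically, which may help fix the definitions. The following toy snippet uses invented numbers and function names; it simply evaluates welfare and (weak) budget-balance for a given outcome.

```python
def welfare(winning_valuations, winning_reservation_prices):
    """Allocative-efficiency objective: sum of winning requesters'
    valuations minus sum of winning providers' reservation prices."""
    return sum(winning_valuations) - sum(winning_reservation_prices)

def is_budget_balanced(payments_from_requesters, payments_to_providers):
    """Budget-balance: requester payments cover provider payments,
    so the mechanism needs no outside subsidy."""
    return sum(payments_from_requesters) >= sum(payments_to_providers)

# Two winning requesters (valuations 100 and 80) matched with two
# winning providers (reservation prices 50 and 40).
w = welfare([100, 80], [50, 40])                  # 180 - 90 = 90
ok = is_budget_balanced([60, 45], [55, 40])       # 105 >= 95, so True
```

Individual rationality would additionally require that each requester pays at most their valuation and each provider receives at least their reservation price, which holds for the numbers above.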

There are three basic components to be designed: the bidding language, which defines how requests and offers are specified; the allocation algorithm, which decides which job is to be executed on which node at what time; and the pricing scheme, which translates the resulting allocation schedule into corresponding monetary transfers. In this section, the bidding language and the winner determination problem will be formalized. The complexity of this scheduling problem forms the rationale for a market-based heuristic. Due to the focus of this paper, the pricing scheme will only be specified for the heuristic.

2.2 Bidding Language

There is on-going research on applying machine learning, regression and filtering techniques to predicting the run times of future jobs (Ali et al. 2004). In many cases a given resource requester will submit jobs to a distributed computing environment which are similar in terms of algorithms, data structures, job size etc. Thus the run time of a new job is likely to follow the run times of similar jobs in the past. Building on this research, we assume that the resources needed for the computation of a job are known to the requester a priori. This comprises the amount of computing power, the amount of memory, and the run time of the job.

A resource requester wanting to compute a job j submits a request (b_j, c_j, m_j, s_j, e_j) to the market mechanism, where b_j ∈ ℝ+ denotes the requester's willingness to pay per unit of computing power and time slot, c_j ∈ ℕ and m_j ∈ ℕ the minimum amounts of computing power and memory, s_j ∈ ℕ the time slot at which the job needs to be started, and e_j ∈ ℕ the time slot until which the job needs to run. A job can only be executed in its entirety (atomicity). It cannot be parallelized across multiple nodes, but it can migrate across different nodes over time. Note that job migration is one of the main features which distinguish Grid OS in general, and MOSIX in particular, from traditional machine scheduling domains.
A resource offer for node n consists of a tuple (r_n, c_n, m_n, ε_n, λ_n) where r_n ∈ ℝ+ denotes the reservation price per unit of computing power and time slot, c_n ∈ ℕ and m_n ∈ ℕ the maximum amount of computing power and memory, and ε_n ∈ ℕ and λ_n ∈ ℕ the earliest and the latest time slot at which the resources are available. Contrary to atomic resource requests, the resources offered by a node are divisible and, moreover, freely disposable.

Example: Seven resource requests (jobs J1 to J7) and three offers (nodes N1 to N3) have been submitted to the system; Table 1 lists their parameters (b_j, c_j, m_j, s_j, e_j) and (r_n, c_n, m_n, ε_n, λ_n).

Table 1. Sample resource requests and offers.

Job J1 requests to be run in time slots 2 and 3 and requires at least 60 units of computing power and 50 units of memory in each time slot. The resource requester for job J1 is willing to pay up to $11 per unit of computing power and time slot, i.e. $11 * 60 * 2 = $1,320 in total. The provider of node N1 offers at most 150 units of computing power and 100 units of memory in time slots 1 to 5 and requires a minimum payment of at least $5 per unit of computing power per time slot.

To simplify notation, in the following we assume that each request and each offer has been submitted by a separate user, and we use the terms request and job (offer and node) interchangeably. In contrast to Schnizler et al. (forthcoming), this bidding language does not support a combinatorial exchange in which users can request and offer arbitrary sets of goods and possibly link multiple requests and offers by means of logical operators. In the scenario at hand, each request and offer specifies one and the same bundle of computing resources and memory and thus rather implements a multi-attribute exchange (Bichler et al. 1999) by allowing the specification of multi-unit bundles, e.g. 100 units of computing power and 50 units of memory.
Note, however, that while the bidding language is not combinatorial, the scheduling problem of allocating jobs to nodes remains a combinatorial assignment problem, as will be seen below.
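The request and offer tuples, together with the feasibility conditions a node must meet to host a job, can be captured directly in code. The following Python sketch is our own illustration of the bidding language (the class and function names are not from the paper):

```python
from dataclasses import dataclass

@dataclass
class Request:   # (b_j, c_j, m_j, s_j, e_j)
    b: float     # willingness to pay per unit of computing power and time slot
    c: int       # minimum computing power per slot
    m: int       # minimum memory per slot
    s: int       # first required time slot
    e: int       # last required time slot

@dataclass
class Offer:     # (r_n, c_n, m_n, ε_n, λ_n)
    r: float     # reservation price per unit of computing power and time slot
    c: int       # maximum computing power per slot
    m: int       # maximum memory per slot
    eps: int     # earliest available time slot
    lam: int     # latest available time slot

def can_host(job: Request, node: Offer) -> bool:
    """Node n can potentially execute job j, i.e. n ∈ N(j): the requested
    slots lie within the node's availability window and the node provides
    enough computing power and memory."""
    return (node.eps <= job.s and job.e <= node.lam
            and job.c <= node.c and job.m <= node.m)

# Job J1 and node N1 from the running example:
j1 = Request(b=11, c=60, m=50, s=2, e=3)
n1 = Offer(r=5, c=150, m=100, eps=1, lam=5)
assert can_host(j1, n1)
```

Note that `can_host` only captures the set N(j); the price condition b_j ≥ r_n is a separate constraint of the allocation problem.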

2.3 Exact Scheduling

Exact mechanisms such as the well-known Vickrey-Clarke-Groves (VCG) mechanism are guaranteed to find an optimal solution to the scheduling problem. Let N be the set of nodes contained in resource offers, J the set of jobs contained in resource requests, T^S(n) := {t ∈ ℕ | ε_n ≤ t ≤ λ_n} the set of time slots in which node n ∈ N is available according to the respective resource offer, T^b(j) := {t ∈ ℕ | s_j ≤ t ≤ e_j} the set of time slots in which job j ∈ J requests to be executed, and N(j) := {n ∈ N | T^b(j) ⊆ T^S(n), c_j ≤ c_n, m_j ≤ m_n} the set of nodes which can potentially execute job j. We introduce the binary decision variable x_jnt ∈ {0,1} with x_jnt = 1 if job j is executed on node n in time slot t and x_jnt = 0 otherwise. The scheduling problem can be formalized as the following mathematical integer programme:

[SP]  max V := Σ_{j∈J} Σ_{t∈T^b(j)} Σ_{n∈N(j)} c_j (b_j − r_n) x_jnt

s.t.  x_jnt ∈ {0,1},  ∀ j ∈ J, t ∈ T^b(j), n ∈ N(j)   (C1)

      Σ_{n∈N(j)} x_jnt ≤ 1,  ∀ j ∈ J, t ∈ T^b(j)   (C2)

      Σ_{j∈J: n∈N(j), t∈T^b(j)} m_j x_jnt ≤ m_n,  ∀ n ∈ N, t ∈ T^S(n)   (C3)

      Σ_{j∈J: n∈N(j), t∈T^b(j)} c_j x_jnt ≤ c_n,  ∀ n ∈ N, t ∈ T^S(n)   (C4)

      Σ_{u∈T^b(j)} Σ_{n∈N(j)} x_jnu = (e_j − s_j + 1) Σ_{n∈N(j)} x_jnt,  ∀ j ∈ J, t ∈ T^b(j)   (C5)

      b_j x_jnt ≥ r_n x_jnt,  ∀ j ∈ J, n ∈ N(j), t ∈ T^b(j)   (C6)

SP assigns jobs to nodes so as to maximize welfare V. The economic scheduling scenario is encoded in the constraints of this combinatorial assignment problem as follows: (C1) introduces x_jnt as a binary decision variable; jobs can only be assigned to nodes which provide the necessary resources during the requested time slots. (C2) ensures that a job is allocated to at most one node at a time. (C3) and (C4) specify that, for any given time slot and node, the total resource consumption in terms of memory and computing power cannot exceed the resources provided by this node. (C5) enforces atomicity, i.e. every job is either executed as a whole or it is not executed at all.
(C6) ensures that, for each allocation of a job to a node, the requester's willingness to pay is equal to or greater than the provider's reservation price. The optimal allocation schedule X* := (x*_jnt)_{j∈J, n∈N, t∈T^b(j)} yielding welfare V* can be derived directly from the solution to SP.

Example: For the sample resource requests and offers listed in Table 1, building and solving SP yields the optimal allocation schedule X* shown in Figure 1. In time slot 1, only job J2 runs on node N1. The requester of job J2 is willing to pay up to $10 [per unit of computing power] * 80 [units of computing power] = $800 per time slot. The provider of node N1 requires a minimum payment of $5 per unit of computing power and time slot and receives, depending on the pricing which will be elaborated below, at least $5 * 80 = $400. The allocation of job J2 to node N1 in time slot 1 thus yields a welfare of $800 − $400 = $400. Across all time slots, X* generates a welfare of V* = $3,420.

Figure 1. Optimal allocation schedule X*: node N1 executes J2 (slots 1 to 3), J1 (slots 2 and 3) and J4 (slots 4 and 5); node N2 executes J3 (slots 2 to 5); node N3 is not used.
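For very small instances, SP can be solved by exhaustive search. The following Python sketch is a simplified illustration of exact scheduling which, unlike SP, ignores job migration (each job runs on one node for its whole window); the instance data are our own illustrative numbers, not the paper's Table 1:

```python
from itertools import product

# Simplified exhaustive solver for the scheduling problem [SP], assuming no
# job migration (each job runs on a single node for its whole window).
# All data are illustrative, not the paper's Table 1.

def welfare(jobs, nodes, assignment):
    """Welfare of an assignment {job_id: node_id or None}; returns None if
    a node's computing power or memory is exceeded in some slot (C3)/(C4)."""
    for nid, node in nodes.items():
        for t in range(node["eps"], node["lam"] + 1):
            used_c = sum(j["c"] for jid, j in jobs.items()
                         if assignment[jid] == nid and j["s"] <= t <= j["e"])
            used_m = sum(j["m"] for jid, j in jobs.items()
                         if assignment[jid] == nid and j["s"] <= t <= j["e"])
            if used_c > node["c"] or used_m > node["m"]:
                return None
    total = 0.0
    for jid, nid in assignment.items():
        if nid is None:
            continue
        j, n = jobs[jid], nodes[nid]
        if j["b"] < n["r"]:            # price condition (C6)
            return None
        slots = j["e"] - j["s"] + 1    # atomicity (C5): all slots or none
        total += j["c"] * (j["b"] - n["r"]) * slots
    return total

def solve_exact(jobs, nodes):
    """Enumerate all feasible job-to-node assignments and keep the best."""
    best, best_assign = 0.0, {jid: None for jid in jobs}
    options = {jid: [None] + [nid for nid, n in nodes.items()
                              if n["eps"] <= j["s"] and j["e"] <= n["lam"]
                              and j["c"] <= n["c"] and j["m"] <= n["m"]]
               for jid, j in jobs.items()}
    for choice in product(*options.values()):
        assignment = dict(zip(options.keys(), choice))
        v = welfare(jobs, nodes, assignment)
        if v is not None and v > best:
            best, best_assign = v, assignment
    return best, best_assign

jobs = {"J1": {"b": 11, "c": 60, "m": 50, "s": 2, "e": 3},
        "J2": {"b": 10, "c": 80, "m": 60, "s": 1, "e": 3}}
nodes = {"N1": {"r": 5, "c": 150, "m": 100, "eps": 1, "lam": 5}}
v, x = solve_exact(jobs, nodes)
```

With these two jobs competing for node N1's memory in slots 2 and 3, the welfare-maximizing choice is to run J2 alone, yielding V = 80 · (10 − 5) · 3 = 1,200.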

While this exact mechanism generates an optimal allocation schedule in terms of economic efficiency, this optimality comes at the expense of computational efficiency: SP is an instance of a multiple knapsack problem (Ferrari 1994) and thus NP-hard. The computational cost of determining the allocation and the pricing may become a problem in dynamic network-centric Grid OS with a large number of resource requests and offers (Waldspurger et al. 1992). In these settings, a trade-off between efficiency and computational complexity may be desirable to support interactive applications.

3 TRUTHFUL SCHEDULING HEURISTICS

Heuristics aim at obtaining a good but generally suboptimal (in terms of welfare) allocation fast. Relaxing allocative efficiency not only impacts the outcome determination but also the pricing scheme: compromising a mechanism in terms of allocative efficiency generally implies compromising truthfulness as well (Mu'alem and Nisan 2002). Lehmann et al. (2002) and Mu'alem and Nisan (2002), however, elaborate necessary and sufficient conditions that do yield truthful, approximating mechanisms for a restricted class of users: known single-minded bidders. Informally, a bidder (user) is known single-minded if she only desires one specific set of goods and this set is known to the mechanism. Applying these conditions to the scenario above in order to design a truthful mechanism suggests itself: in a non-combinatorial multi-attribute exchange such as a market-based scheduler for Grid OS, resource requesters and providers are only allowed to request and offer one single bundle of resources corresponding to the bundle of attributes; in the scenario at hand, this is computing power and memory.

3.1 A Greedy Scheduling Heuristic

Lehmann et al. (2002) and Mu'alem and Nisan (2002) propose greedy heuristics for constructing truthful approximation mechanisms for single-unit combinatorial auctions with one seller and multiple buyers.
We build on this proposal and apply the greedy principle to construct a heuristic for the multi-attribute Grid exchange at hand with multiple sellers and multiple buyers. The heuristic consists of two basic phases:

1. Sort the requests j ∈ J and offers n ∈ N in non-increasing and non-decreasing order, respectively, with respect to some norm η;
2. Sequentially run through the resulting rankings and allocate the requests with the highest ranking to the offers with the highest ranking (cf. pseudo code in Figure 2).

(1) Sort jobs j ∈ J in non-increasing order and nodes n ∈ N in non-decreasing order of η.
(2) Run sequentially through the job ranking, starting with the highest ranked job. For each job j:
(3) Run sequentially through the node ranking, beginning with the highest ranked node, and check whether j can be executed, i.e. whether the conditions (b_j ≥ r_m) ∧ (c_j ≤ c_m) ∧ (m_j ≤ m_m) are satisfied for all m ∈ {n_1,...,n_j} ⊆ N, i.e. the highest ranked nodes which can together accommodate job j in the time slots t ∈ T^b(j).
(4) If so, allocate j to {n_1,...,n_j}; update the residual capacities of the nodes m ∈ {n_1,...,n_j}, i.e. subtract m_j from m_m and c_j from c_m in the time slots t ∈ T^b(j).
(5) Continue at (2).

Figure 2. Heuristic for obtaining the initial allocation.

The allocative efficiency of this greedy heuristic essentially hinges on the norm η used in the sorting phase (Lehmann et al. 2002). This norm must lead to efficient rankings in the sense that the heuristic can create an approximately efficient allocation based on them. A straightforward choice is to use b_j and r_n, respectively, as these set the valuation for a job (node) in relation to the amount of resources requested (offered). In conjunction with the sequential allocation rule in phase two, the resulting heuristic truly implements a greedy allocation algorithm: in each allocation step, it intends to maximize the term b_j − r_n from the objective function of SP above.
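A minimal Python sketch of the allocation phase in Figure 2, under the simplifying assumption that each job is placed on a single node for its whole window (no migration), with η = b_j for jobs and η = r_n for nodes as suggested above (all names and instance data are illustrative):

```python
# Sketch of the greedy allocation phase (Figure 2), simplified to assign each
# job to a single node for its whole window (no migration). Data illustrative.

def greedy_allocate(jobs, nodes):
    # Residual capacity per node and time slot.
    residual = {nid: {t: {"c": n["c"], "m": n["m"]}
                      for t in range(n["eps"], n["lam"] + 1)}
                for nid, n in nodes.items()}
    ranking_jobs = sorted(jobs, key=lambda jid: -jobs[jid]["b"])   # η = b_j
    ranking_nodes = sorted(nodes, key=lambda nid: nodes[nid]["r"]) # η = r_n
    allocation = {}
    for jid in ranking_jobs:
        j = jobs[jid]
        for nid in ranking_nodes:
            n = nodes[nid]
            slots = range(j["s"], j["e"] + 1)
            # Conditions (b_j >= r_n), (c_j <= c_n), (m_j <= m_n) against
            # the node's residual capacities in all requested slots:
            fits = (j["b"] >= n["r"]
                    and all(t in residual[nid]
                            and residual[nid][t]["c"] >= j["c"]
                            and residual[nid][t]["m"] >= j["m"]
                            for t in slots))
            if fits:
                allocation[jid] = nid
                for t in slots:   # update residual capacities
                    residual[nid][t]["c"] -= j["c"]
                    residual[nid][t]["m"] -= j["m"]
                break
    return allocation

jobs = {"J1": {"b": 11, "c": 60, "m": 50, "s": 2, "e": 3},
        "J2": {"b": 10, "c": 80, "m": 60, "s": 1, "e": 3}}
nodes = {"N1": {"r": 5, "c": 150, "m": 100, "eps": 1, "lam": 5},
         "N2": {"r": 7, "c": 100, "m": 100, "eps": 1, "lam": 5}}
alloc = greedy_allocate(jobs, nodes)
```

In this instance, J1 (highest bid) is placed on the cheapest node N1; J2 no longer fits on N1 in slots 2 and 3 and is therefore placed on N2.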

Example: Assume η = b_j for all jobs j ∈ J and η = r_n for all nodes n ∈ N, applied to the requests and offers of Table 1.

Figure 3. Example of the greedy heuristic (requests and offers as in Table 1).

Jobs J1 and J2 can be fully executed on node N1. In time slots 2 and 3, job J3 does not fit on N1 but only on N2, which is next in the ranking. In time slots 4 and 5, J3 does fit on N1. Since J3 is allocated to N1 in time slots 4 and 5, J4 no longer fits on N1 but only on N3, in contrast to the optimal allocation X*. In total, X_greedy yields a welfare of V_greedy = $3,000 and an approximation ratio of V_greedy / V* = $3,000 / $3,420 ≈ 88%. The greedy heuristic generates the allocation schedule X_greedy shown in Figure 4.

Figure 4. Greedy allocation schedule X_greedy: node N1 executes J2 (slots 1 to 3), J1 (slots 2 and 3) and J3 (slots 4 and 5); node N2 executes J3 (slots 2 and 3); node N3 executes J7 and J4.

The question now is: How can these allocations of requests to resources (and vice versa) be translated into corresponding monetary transfers (prices and payments) according to the desired design criteria introduced above? Mu'alem and Nisan (2002) and Lehmann et al. (2002) propose a pricing scheme which generates truthful prices in restricted one-sided combinatorial auctions with one provider and multiple requesters. In the remainder, these theoretical results will be applied to the Grid OS context to generate truthful prices for resource requesters. In contrast to the setting of Mu'alem and Nisan (2002) and Lehmann et al. (2002), the Grid OS market consists of multiple requesters and multiple providers. An algorithm will hence be proposed which yields approximately truthful payments to resource providers.

3.2 Critical Value-based Pricing of Resource Requests

Following the spirit of the truthful Vickrey auction, the payment of job j corresponds to its critical value given the sets J and N of jobs and nodes (Lehmann et al. 2002, Mu'alem and Nisan 2002). The critical value φ_j is the minimal valuation that j has to report in order to remain in the allocation schedule, keeping all other requests and offers unchanged. Note that it is not possible to simply take the valuation of the highest ranked job which does not get executed, since this job may not be executable at all due to capacity constraints. Moreover, removing j from the winning allocation might also change the allocation of other jobs within the winning allocation. To determine the critical value for each winning job j, we therefore determine the allocation without j: we successively allocate all other jobs from the ranking and, after having allocated a job, check whether j can still be accommodated (cf. pseudo code in Figure 5).

(1) Sort jobs k ∈ J, k ≠ j, in non-increasing order and nodes n ∈ N in non-decreasing order of η.
(2) Is any job k ≠ j left in the ranking? If not, set φ_j = max_{m∈{n_1,...,n_j}} r_m, where {n_1,...,n_j} are the highest ranked nodes which can together accommodate j, and finish. If so, select the job k with the highest ranking.
(3) Can k be accommodated? If not, continue at (2). If so, allocate k to the highest ranked nodes {n_1,...,n_k} ⊆ N that can accommodate k, and update the residual capacities of the nodes m ∈ {n_1,...,n_k}.
(4) Can j still be accommodated on some nodes {n_1,...,n_j} ⊆ N? If not, set φ_j = b_k and finish.
(5) If so, set r := max_{m∈{n_1,...,n_j}} r_m. Either r < b_k, then continue at (2); or r ≥ b_k, then it is cheaper for j to push away k than to take the next available nodes {n_1,...,n_j}: set φ_j = b_k and finish.

Figure 5. Determining the critical value of job j and refining the initial allocation.

The critical value must be computed once for each winning job j. However, compared to the computational intractability of the VCG mechanism, this computation runs in polynomial time. Every winning job j has to pay p_greedy,j = (e_j − s_j + 1) c_j φ_j, whereas every rejected job pays nothing.

Example: Applying the algorithm to the example above, we get p_greedy,J1 = (e_J1 − s_J1 + 1) c_J1 φ_J1 = 120 · b_J7 = 120 · r_N1 = $600, i.e. the cheapest option for job J1 is to push away J7 by bidding $5 (plus some ε), which happens to equal the minimum reservation price of node N1. In total, the mechanism collects a revenue R of $6,400 from the resource requesters.

Job j:        J1    J2      J3      J4
p_greedy,j:   $600  $1,200  $2,520  $2,080

Table 2. Payments of resource requesters (the critical values φ_j translate into payments via p_greedy,j = (e_j − s_j + 1) c_j φ_j).

The greedy allocation scheme combined with critical value-based pricing creates a truthful scheduling heuristic with respect to resource requesters: it is a weakly dominant strategy for resource requesters to report their true valuation for a job (the required resources).
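The critical-value computation of Figure 5 and the resulting payment can be sketched as follows, again in a simplified single-node, no-migration setting (names and instance data are our own, not the paper's):

```python
# Sketch of critical-value pricing for a winning job j (Figure 5), in a
# simplified single-node/no-migration setting. Instance data illustrative.

def _fits(job, nid, nodes, residual, ignore_bid=False):
    n = nodes[nid]
    slots = range(job["s"], job["e"] + 1)
    price_ok = ignore_bid or job["b"] >= n["r"]
    return price_ok and all(t in residual[nid]
                            and residual[nid][t]["c"] >= job["c"]
                            and residual[nid][t]["m"] >= job["m"]
                            for t in slots)

def _allocate(job, nid, residual):
    for t in range(job["s"], job["e"] + 1):
        residual[nid][t]["c"] -= job["c"]
        residual[nid][t]["m"] -= job["m"]

def critical_value(jid, jobs, nodes):
    """Minimal per-unit bid with which job j would still be allocated,
    keeping all other requests and offers unchanged."""
    residual = {nid: {t: {"c": n["c"], "m": n["m"]}
                      for t in range(n["eps"], n["lam"] + 1)}
                for nid, n in nodes.items()}
    node_rank = sorted(nodes, key=lambda nid: nodes[nid]["r"])
    j = jobs[jid]

    def cheapest_fit():  # cheapest node where j still fits, ignoring j's bid
        return next((m for m in node_rank
                     if _fits(j, m, nodes, residual, ignore_bid=True)), None)

    # Allocate all other jobs in non-increasing order of their bids, checking
    # after each step whether j can still be accommodated (steps (2)-(5)).
    for kid in sorted((k for k in jobs if k != jid),
                      key=lambda k: -jobs[k]["b"]):
        k = jobs[kid]
        nid = next((m for m in node_rank if _fits(k, m, nodes, residual)), None)
        if nid is None:
            continue
        _allocate(k, nid, residual)
        spot = cheapest_fit()
        if spot is None or nodes[spot]["r"] >= k["b"]:
            return k["b"]   # j must (weakly) outbid k to stay in
    spot = cheapest_fit()
    return nodes[spot]["r"] if spot else None

def payment(jid, jobs, nodes):
    j = jobs[jid]
    return (j["e"] - j["s"] + 1) * j["c"] * critical_value(jid, jobs, nodes)

jobs = {"J1": {"b": 11, "c": 60, "m": 50, "s": 2, "e": 3},
        "J2": {"b": 10, "c": 80, "m": 60, "s": 2, "e": 3}}
nodes = {"N1": {"r": 5, "c": 150, "m": 100, "eps": 1, "lam": 5}}
phi_j1 = critical_value("J1", jobs, nodes)
```

In the embedded two-job instance, J1 can only stay in the schedule by (weakly) outbidding J2, so φ_J1 = b_J2 = 10 and p_greedy,J1 = 2 · 60 · 10 = 1,200; without any competing job, φ_J1 collapses to N1's reservation price.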
The payment depends on the critical value, which is independent of the reported valuation. Assume a risk-neutral requester j has reported her true valuation v_j, i.e. b_j = v_j. Then there are four cases to be considered:

State \ Action   | Decrease b_j | Increase b_j
j was accepted   | Case 1       | Case 2
j was rejected   | Case 3       | Case 4

Table 3. Proof sketch.

Suppose j has won. Reporting a lower valuation (Case 1) might have resulted in j not being accepted, while it would not have reduced j's payment. Reporting a higher valuation (Case 2) would not have generated more utility either, since j has been accepted anyway. Now suppose j has been rejected (φ_j > b_j = v_j). Reporting a lower valuation (Case 3) would still have left j outside the winning allocation. Reporting a higher valuation (Case 4) might even have resulted in a loss: if j had been accepted, b_j > φ_j > v_j.

3.3 Budget-balanced, Approximately Truthful Payments to Resource Providers

The concept of critical value-based pricing cannot be applied when determining the payments for divisible resource offers. The concept requires a binary decision in the sense that an offer is either fully executed or not executed at all. We, however, allow divisibility of offers, i.e. not all offered resources need to be used; an offer can be executed partially.

The algorithm for determining the critical values ensures that each winning job pays at least the reservation price of the node it is allocated to. If there is competition from another job for the same slot, the critical value will exceed the reservation price. The question is: How can the revenue R be distributed to the resource providers so as to ensure budget-balance and individual rationality and to approximate truthfulness? The VCG mechanism is generally not budget-balanced (Schnizler et al. forthcoming) and, combined with the heuristic above, not truthful (Mu'alem and Nisan 2002). A straightforward approach is to adapt the algorithm of Parkes et al. (2002) to our greedy heuristic. The basic idea of their algorithm is to approximate the VCG mechanism's truthfulness by computing discounts Δ_parkes that minimize the distance to the VCG discounts Δ_vick while ensuring budget-balance. In our algorithm, we first compute Δ_temp,n := V_greedy − (V_−n)_greedy for each winning node n, where (V_−n)_greedy denotes the welfare of the greedy allocation without node n. The underlying assumption is that these greedy discounts Δ_temp,n (which may in total exceed the surplus R − Σ_{n∈N} v̂_n(X_greedy) from the request side) approximate the truthful VCG discounts Δ_vick,n. We then solve the mathematical programme

[BB_greedy]  min_{Δ_greedy} L(Δ_temp, Δ_greedy)

s.t.  Σ_{n∈N} Δ_greedy,n = R − Σ_{n∈N} v̂_n(X_greedy)   (C7)

      0 ≤ Δ_greedy,n ≤ Δ_temp,n,  ∀ n ∈ N   (C8)

The aim of BB_greedy is to compute discounts Δ_greedy,n which in turn approximate Δ_temp,n while ensuring (strong) budget-balance by distributing exactly the surplus R − Σ_{n∈N} v̂_n(X_greedy) to the resource providers (C7). Parkes et al. (2002) suggest and examine various distance functions analytically and numerically and recommend the threshold function L_2(Δ_temp, Δ_greedy) := Σ_{n∈N} (Δ_temp,n − Δ_greedy,n)², as it minimizes the residual degree of manipulation freedom (Parkes et al. 2002), that is, the maximum amount of utility a user can gain from reporting untruthfully. Analogously to the algorithm of Parkes et al. (2002), the optimal Δ_greedy,n can be computed without explicitly having to solve the non-linear programme BB_greedy. Let ∞ = Δ_temp,0 ≥ Δ_temp,1 ≥ Δ_temp,2 ≥ … ≥ Δ_temp,|N| ≥ Δ_temp,|N|+1 = 0 be the ordering of the temporary greedy discounts of the winning nodes. Then, for the threshold function L_2, Parkes et al. (2002) show that there is an index K and a unique threshold C_t* in the interval [Δ_temp,K+1, Δ_temp,K] such that

C_t* = [ Σ_{i=1}^{K} Δ_temp,i − (R − Σ_{n∈N} v̂_n(X_greedy)) ] / K,  with  Δ_temp,K+1 ≤ C_t* ≤ Δ_temp,K   (C9)

and that Δ_greedy,n = max{0, Δ_temp,n − C_t*} solves BB_greedy. A node n receives a positive discount if its temporary greedy discount exceeds the threshold C_t*; otherwise it does not receive any discount. C_t* can be computed by running sequentially through the ordered temporary discounts and checking whether condition (C9) is satisfied.

Example: In the example, R = $6,400 > $6,040 = Σ_{n∈N} v̂_n(X_greedy), i.e. there is a surplus of $360 above the reservation prices which is to be distributed so as to approximate truthful payments. Solving BB_greedy with distance function L_2 leads to C_t* = $1,680 and the greedy discounts listed in Table 4. The total surplus of $360 is allocated to node N1; nodes N2 and N3 only receive payments amounting to their reservation prices.

Node n | v̂_n(X_greedy) | (V_−n)_greedy | p_temp,n | Δ_temp,n | p_greedy,n | Δ_greedy,n
N1     | −2,700        | 960           | −4,740   | 2,040    | −3,060     | 360
N2     | −1,260        | 2,820         | −1,440   | 180      | −1,260     | 0
N3     | −2,080        | 3,000         | −2,080   | 0        | −2,080     | 0
Σ      | −6,040        |               | −8,260   | 2,220    | −6,400     | 360

Table 4. Approximately truthful greedy payments (negative amounts denote payments made to the respective node).
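Condition (C9) amounts to a threshold (water-filling) rule: lower a single threshold C_t* until the discounts max{0, Δ_temp,n − C_t*} exactly exhaust the request-side surplus. The following Python sketch is our own illustration; only Δ_temp,N1 = $2,040 and the surplus of $360 are taken from the example, the remaining Δ_temp values are invented:

```python
# Sketch of the threshold rule (Parkes et al. 2002): find C* such that
# sum(max(0, Δ_temp,n − C*)) equals the request-side surplus, then pay each
# node Δ_greedy,n = max(0, Δ_temp,n − C*). Numbers partly illustrative.

def threshold_discounts(delta_temp, surplus):
    """Return budget-balanced discounts Δ_greedy,n for a given surplus."""
    assert 0 <= surplus <= sum(delta_temp.values())
    # Sort temporary discounts in non-increasing order; try K = 1, 2, ...
    items = sorted(delta_temp.items(), key=lambda kv: -kv[1])
    values = [d for _, d in items] + [0.0]
    for k in range(1, len(values)):
        # With C* in [values[k], values[k-1]], exactly the k largest
        # discounts stay positive: sum_{i<k}(values[i] - C*) = surplus.
        c_star = (sum(values[:k]) - surplus) / k
        if values[k] <= c_star <= values[k - 1]:
            return {n: max(0.0, d - c_star) for n, d in delta_temp.items()}
    return dict(delta_temp)  # surplus covers all discounts in full

# Δ_temp,N1 = 2,040 and the surplus of 360 follow the paper's example;
# the values for N2 and N3 are invented small discounts below C*.
delta_temp = {"N1": 2040.0, "N2": 60.0, "N3": 80.0}
discounts = threshold_discounts(delta_temp, 360.0)
```

Here K = 1 and C* = 2,040 − 360 = 1,680, matching the example's C_t* = $1,680: only N1's discount exceeds the threshold, so N1 receives the whole surplus of $360.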

3.4 Evaluation of the Heuristic

The greedy heuristic is designed to obtain approximately efficient allocations fast. It runs in polynomial time in the number of resource requests and offers: in the first phase, requests and offers are sorted in O(|J| log |J|) and O(|N| log |N|), respectively. In the second phase, the greedy allocation scheme sequentially runs through the |J| sorted requests and, for each job j, tries to allocate j to one of the |N| sorted offers. Hence the allocation phase runs in O(|J| · |N|). Note that the feasibility of allocating a specific job to a specific node can be tested in O(1), since the number of attributes which need to be checked is constant. Inherently, this speed is only achieved at the expense of allocative efficiency compared to exact mechanisms: in a theoretical worst-case analysis, the approximation of the efficient allocation can be made arbitrarily bad with respect to the norm b_j, r_n suggested above by means of simple examples.

The pricing scheme of the greedy mechanism consists of two parts: a truthful pricing scheme for resource requesters and an approximately truthful payment scheme for resource owners. BB_greedy links these two sides so as to ensure budget-balance. The resulting prices are individually rational: a user reporting her true valuation will not have to pay more than her reported valuation if her request gets accepted, and she will not have to pay anything if she does not obtain the resources; the same holds accordingly for resource owners.

4 RELATED WORK

The study of market mechanisms for the Grid and the implementation of running Grid market prototypes have received significant attention in the past. SPAWN (Waldspurger et al. 1992) implements a market for computing resources in which each workstation auctions off idle computing time to multiple applications by means of a Vickrey auction. All resources are allocated to at most one application at a time, regardless of this application's actual demand, which yields low resource utilization.
Chun and Culler (1999) realized a prototypical market for computing time in which one resource provider sells computing time to multiple requesters. Resource requesters are allotted computing time proportional to their share of the total reported valuation across all requesters. The POPCORN market (Regev and Nisan 1998) is an online market for computing power which implements a Vickrey auction as well as two double auctions. All of these approaches share two major drawbacks. First, they allow the specification and trading of computing power only, whereas requesters require a bundle of resources such as computing power, memory, and bandwidth. On the one hand, these approaches thus lead to inefficient allocations, since requests with the same demand for computing power but different memory requirements, for instance, are treated identically. On the other hand, requesters are exposed to the risk of only being able to obtain one leg of the bundle of required resources without the other ("exposure risk", Schnizler et al. forthcoming). A second limitation of these approaches is that they do not support advance reservations of resources, which are essential for Quality of Service assertions.

Schnizler et al. (forthcoming) propose a comprehensive model that targets these deficiencies. They suggest the use of a multi-attribute combinatorial exchange (MACE). Users are allowed to request and offer arbitrary bundles of grid resources and can specify quality attributes on these resources. MACE implements an exact mechanism. The scheduling problem in this combinatorial setting is NP-hard; the pricing scheme of MACE yields approximately truthful prices. With truthful prices, strategic users do not have an incentive to report any valuation other than their true valuation. Due to the NP-hardness of the problem, the mechanism is adequate for batch applications; its use for interactive applications is rather limited.

The work of Bapna et al. (forthcoming) is most relevant to the work presented in this paper. In their model, multiple requesters and providers can trade both computing power and memory for a sequence of time slots. First, Bapna et al. (forthcoming) introduce an exact mechanism. By introducing fairness constraints and imposing one common valuation across all resource providers, they structure the search space of the underlying combinatorial allocation problem so as to establish one common, truthful price per time slot for all accepted requests. Additionally, this introduces a linear ordering across all jobs and time slots which reduces the complexity of the allocation problem, which, however, still remains NP-hard. To mitigate this computational complexity, they thus propose a fast, greedy heuristic at the expense of both truthfulness and efficiency. Archer et al. (2003) also design a heuristic on the basis of Mu'alem and Nisan (2002) and Lehmann et al. (2002) which, however, cannot be applied to the Grid OS context, since multiple units of resources are traded, such as x units of computing power, and the truthfulness of their mechanism only holds for a small number of items.

5 CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS

In Section 1, we argued that a network-centric Grid OS coupled with market-based scheduling can increase efficiency in Grid and cluster environments by adequately allocating all available resources. This distinguishes network-centric Grid OS from state-of-the-art Grid middleware such as Globus, gLite, and UNICORE, which rely on batch processing of idle resources only. While highly sophisticated market mechanisms are available for Grid middleware (e.g. MACE), there are no mechanisms available for network-centric Grid OS, which rely on interactive application processing. In Section 2, we formalized the market mechanism as a multi-attribute exchange in which resource owners and consumers can publish their demand and supply. Exact market-based scheduling mechanisms share one deficiency: they rely on solving an NP-hard allocation problem. While this may not be a problem in a cluster setting with a small number of users, it may prove crucial in an interactive, large-scale Grid OS environment. We thus proposed a greedy heuristic in Section 3 which performs fast scheduling while preserving truthfulness on the request side and approximating truthfulness on the provisioning side of the market. In Section 4, we presented related work on market-based resource allocation in Grids.

In essence, our market mechanism suggests several intriguing research avenues. We intend to implement a prototype of the heuristic for dynamic scheduling in MOSIX, the state-of-the-art Grid OS presented in Section 1.
Numerical analyses need to be performed to further analyze the heuristic's properties with respect to the economic and technical requirements in Grid OS. The approximation of efficiency needs to be compared to the optimal allocation achieved by exact mechanisms. This ratio essentially depends on the norm used in the ranking phase of the heuristic; more sophisticated norms will be developed and analyzed. The run time of the presented scheduling mechanisms needs to be evaluated to determine the critical number of requests and offers at which an exact mechanism becomes infeasible, and to determine the speedup achieved by the heuristic. Finally, the approximation of truthfulness on the provisioning side of the market needs to be analyzed. It is, moreover, desirable to allow users to specify dependencies between resources, such as the substitutability of computing power for memory. The model in this paper hence needs to be expanded to allow for such extended bidding logics.

Acknowledgement

This work has been partially funded by the EU IST programme under grant SORMA (Self-Organizing ICT Resource Management).

References

Ali, A., A. Anjum, J. Bunn, R. Cavanaugh, F. van Lingen, R. McClatchey, M. Atif Mehmood, H. Newman, C. Steenberg, M. Thomas and I. Willers (2004). Predicting the Resource Requirements of a Job Submission. Proceedings of Computing for High Energy Physics, Interlaken, Switzerland.

Archer, A., C. Papadimitriou, K. Talwar and E. Tardos (2003). An Approximate Truthful Mechanism for Combinatorial Auctions with Single Parameter Agents. Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, Baltimore, Maryland.

Bapna, R., S. Das, R. Garfinkel and J. Stallaert (forthcoming). A Market Design for Grid Computing. INFORMS Journal on Computing.

Barak, A., A. Shiloh and L. Amar (2005). An Organizational Grid of Federated MOSIX Clusters. Proceedings of the 5th IEEE International Symposium on Cluster Computing and the Grid (CCGrid), Cardiff, Wales.

Berlich, R., M. Kunze and K. Schwarz (2005). Grid Computing in Europe: From Research to Deployment. Proceedings of the 2005 Australasian Workshop on Grid Computing and e-Research, Newcastle, Australia, 44.

Bichler, M., M. Kaukal and A. Segev (1999). Multi-attribute Auctions for Electronic Procurement. Proceedings of the 1st IBM IAC Workshop on Internet Based Negotiation, Yorktown Heights, NY.

Buyya, R., D. Abramson and J. Giddy (2000). Nimrod/G: An Architecture for a Resource Management and Scheduling System in a Global Computational Grid. Proceedings of HPC ASIA 2000, the 4th International Conference on High Performance Computing in the Asia-Pacific Region, Beijing, China.

Chun, B. and D.E. Culler (1999). Market-based Proportional Resource Sharing for Clusters. Millennium Project Research Report.

Cirne, W., F. Brasileiro, N. Andrade, R. Santos, A. Andrade, R. Novaes and M. Mowbray (2006). Labs of the World, Unite! Journal of Grid Computing, 4(3).

Ferrari, C.E. (1994). On Combinatorial Optimization Problems Arising in Computer Systems Design. Ph.D. Thesis, Technische Universität Berlin.

Foster, I. and C. Kesselman (2004). The Grid: Blueprint for a New Computing Infrastructure. 2nd edition, Morgan Kaufmann, San Francisco.

Gorlatch, S. and J. Müller (2005). From Interactive Applications Toward Network-Centric Operating Systems. Workshop on Network-Centric OS, Brussels.

Lai, K., B.A. Huberman and L. Fine (2004). Tycoon: A Distributed Market-based Resource Allocation System. Technical Report, Hewlett Packard.

Lehmann, D., L.I. O'Callaghan and Y. Shoham (2002). Truth Revelation in Approximately Efficient Combinatorial Auctions. Journal of the ACM, 49(5).

Minoli, D. (2004). A Networking Approach to Grid Computing. John Wiley & Sons, Hoboken, New Jersey.

Mu'alem, A. and N. Nisan (2002). Truthful Approximation Mechanisms for Restricted Combinatorial Auctions. In AAAI (poster); also presented at the Dagstuhl Workshop on Electronic Market Design.

Myerson, R.B. and M.A. Satterthwaite (1983). Efficient Mechanisms for Bilateral Trading. Journal of Economic Theory, 28.

Padala, P. and J.N. Wilson (2003). GridOS: Operating System Services for Grid Architectures. LNCS 2913, Springer Verlag, Berlin.

Parkes, D.C., J. Kalagnanam and M. Eso (2002). Achieving Budget-Balance with Vickrey-based Payment Schemes in Exchanges. IBM Research Report, draft, March 13, 2002.

Regev, O. and N. Nisan (1998). The POPCORN Market: An Online Market for Computational Resources. Proceedings of the 1st International Conference on Information and Computation Economies, New York, NY.

Schnizler, B., D. Neumann, D. Veit and C. Weinhardt (forthcoming). Trading Grid Services: A Multi-attribute Combinatorial Approach. European Journal of Operational Research.

Sutherland, I.E. (1968). A Futures Market in Computer Time. Communications of the ACM, 11(6).

Waldspurger, C., T. Hogg, B.A. Huberman, J.O. Kephart and W.S. Stornetta (1992). Spawn: A Distributed Computational Economy. IEEE Transactions on Software Engineering, 18(2).

Economically Enhanced MOSIX for Market-based Scheduling in Grid OS

Lior Amar, Jochen Stößer, Amnon Barak, Dirk Neumann

Institute of Computer Science, The Hebrew University of Jerusalem, Jerusalem, Israel
Institute of Information Systems and Management, Universität Karlsruhe (TH), Englerstr. 14, Karlsruhe, Germany

Abstract

Applying economic principles to grids is deemed promising to improve the overall value provided by such systems. By introducing market mechanisms, end users can influence the allocation of resources by reporting valuations for these resources. Current market-based schedulers, however, are static, assume the availability of complete information about jobs (in particular with respect to processing times), and do not make use of the flexibility offered by computing systems. In this paper, we present our progress in implementing a novel market mechanism for MOSIX, a state-of-the-art management system for computing clusters and organizational grids. The market mechanism is designed so as to work in large-scale settings with selfish agents. Facing incomplete information about job characteristics, it dynamically allocates jobs to computing nodes by leveraging preemption and process migration, two distinctive features offered by the MOSIX system.

1. Introduction

Grid computing denotes a computing paradigm in which computer resources are shared across administrative boundaries in order to efficiently serve compute-intensive applications at low cost and to accommodate peak loads [6]. So-called Grid Operating Systems (Grid OS) are a revolutionary approach to grid computing [13]. A Grid OS encapsulates the access to and management of grids already on the operating system layer. Applications can be left unchanged, leading to ease of use and hence increased acceptance of grid technologies. Moreover, instead of sharing only idle resources, Grid OSs can be leveraged to efficiently share all resources.
MOSIX¹ is a state-of-the-art grid OS [1] that makes an x86-based Linux cluster and an organizational grid perform like a single computer with multiple processors. A production organizational grid with 15 MOSIX clusters (about 650 CPUs) in different departments is operational at the Hebrew University. The features of MOSIX allow better utilization of grid resources by users who need to run demanding applications but cannot afford such a large cluster. (¹ MOSIX® is a registered trademark of A. Barak and A. Shiloh.)

A key issue in inter-organizational grids is the decision which resource is allocated to whom. Current technical schedulers are designed to maximize the resource utilization or to balance the overall system load. In case of excess demand, such schedulers do not maximize the overall value of the system, since users do not have means to report their valuations for resources. Market mechanisms are thus deemed promising to improve resource allocation in grids, and thus the system's value, by explicitly involving users in the allocation process. Users report their valuations for resources to the system and are allocated resources and computational tasks based on these reports. Corresponding monetary transfers between resource requesters and providers serve to induce users to report truthfully about resource demand and supply, and to provide incentives to contribute resources to the system.

Applying market mechanisms to resource allocation in computing systems is not a new idea [16, 3, 12]. However, this previous work is not tailored towards the use in grid OS for three reasons: (1) The Bellagio system [3] and the OCEAN system [12] are both located above the OS layer and interoperate with existing grid middleware such as Globus [5]. (2) From a market mechanism perspective, the employed matching algorithms are not suited for grid OS.
The combinatorial allocation scheme of Bellagio and its computationally hard pricing scheme make this mechanism infeasible for settings with potentially large numbers of users and where immediacy of the resource allocation is crucial. The same holds for the negotiation protocol employed in OCEAN, which places significant communicational load on the system. (3) These existing market mechanisms assume complete information about resource requests and offers, in particular concerning processing times, and determine static allocations which do not dynamically adapt as the status of the system and its entities changes over time.

The contribution of this paper is twofold:

We propose a market mechanism which is specifically tailored towards the needs and technical features of MOSIX so as to generate highly efficient resource allocations in settings with large numbers of selfish agents; and

We present a first implementation of the proposed market mechanism in MOSIX which serves as a proof of our concept.

The paper is organized as follows. The next section presents an overview of MOSIX and its main relevant features. Section 3 presents the proposed market mechanism we use to enhance the resource management of MOSIX. In Section 4, we present the design and the pilot implementation of the MOSIX Economic Infrastructure (MEI). Section 5 concludes the paper and points to future work.

2. MOSIX Background

MOSIX is a management system for clusters and organizational grids [1]. Its main feature is to make all the participating nodes perform like a single computer with multiple processors, almost like an SMP. In a MOSIX system, users can run applications by creating multiple processes, then let MOSIX seek resources and automatically migrate processes among nodes to improve the overall performance, without changing the run-time environment of migrated processes. The goal of MOSIX is to allow owners of nodes to share their computational resources from time to time, while still preserving the autonomy of the owners, allowing them to disconnect their nodes from the grid at any time without sacrificing guest processes from other nodes. MOSIX is implemented as a software layer that provides applications with an unmodified Linux run-time environment.
Therefore, there is no need to change or even link applications with any special library. Moreover, MOSIX supports most additional Linux features that are relevant to ordinary, non-threaded Linux applications, so that most Linux programs can run unchanged (and be migrated). To submit processes via MOSIX, users use the program mosrun. For example, the command mosrun -m450 myprog myprog-args starts and later may migrate the program myprog only to nodes with at least 450 MB of free memory.

The relevant features of MOSIX for the work presented in this paper are the automatic resource discovery, the process migration and the freezing mechanisms. The following subsections provide some details about these features. Other features of MOSIX include a secure run-time environment (sandbox), which prevents guest processes from accessing local resources in hosting nodes; live-queuing, which preserves the full generic Linux environment of queued jobs; gradual release of queued jobs, to prevent flooding of the grid or any cluster; checkpoint and recovery; and support of batch jobs.

2.1 Automatic Resource Discovery

Resource discovery is performed by an on-line information dissemination algorithm [2], providing each node with the latest information about the availability and state of grid resources. The dissemination is based on a randomized gossip algorithm, in which each node regularly monitors the state of its resources, including the CPU speed, current load, free and used memory, etc. This information, along with similar information that has been recently received by that node, is routinely sent to a randomly chosen node, where a higher probability is given to choosing target nodes in the local cluster. The outcome of this scheme is that each node maintains a local information-vector with information about all active nodes in the local cluster and the grid.
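The dissemination scheme just described can be illustrated with a minimal simulation. This is only a sketch of the general idea, not the MOSIX implementation: the entry ages, the cluster layout, and the local-cluster bias value 0.7 are illustrative assumptions.

```python
import random

def gossip_round(vectors, clusters, rng, local_bias=0.7):
    """One round of randomized gossip: every node refreshes the entry
    about its own resources, ages the rest of its information-vector,
    and sends the vector to one randomly chosen node, preferring
    targets in its local cluster (assumed bias value)."""
    nodes = list(vectors)
    for node in nodes:
        for other in vectors[node]:
            vectors[node][other] += 1        # stored information ages
        vectors[node][node] = 0              # own state is always fresh
        local = [n for n in clusters[node] if n != node]
        remote = [n for n in nodes if n not in clusters[node]]
        if local and (not remote or rng.random() < local_bias):
            target = rng.choice(local)
        else:
            target = rng.choice(remote)
        for other, age in vectors[node].items():
            # the receiver keeps the fresher of the two entries
            if age < vectors[target].get(other, float("inf")):
                vectors[target][other] = age

# Two illustrative clusters; initially each node only knows itself.
clusters = {n: ["a1", "a2", "a3"] for n in ("a1", "a2", "a3")}
clusters.update({n: ["b1", "b2"] for n in ("b1", "b2")})
vectors = {n: {n: 0} for n in clusters}
rng = random.Random(7)
for _ in range(50):
    gossip_round(vectors, clusters, rng)
```

After enough rounds, each node's information-vector covers all active nodes in the grid, mirroring the behaviour described above.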
Any client requiring information about cluster or grid nodes can simply access the local node's information-vector and use the stored information. Information about newly available resources, e.g., nodes that have just joined the grid, is gradually disseminated across the active nodes, while information about nodes in disconnected clusters is quickly phased out.

2.2 Process Migration and Scheduling

MOSIX supports cluster and grid-wide (preemptive) process migration [4]. Process migration can be done either automatically or manually. The migration itself amounts to copying the memory image of the process and setting its run-time environment. In MOSIX, the node where the process was initially started is referred to as the process's home-node. After a process is migrated out of its home-node, all the system-calls of that process are forwarded to and processed in the home-node. The resulting effect is that there is no need to copy files or libraries to the remote nodes, and a migrated process can be controlled from its home-node as if it was running there.

Automatic migration decisions are based on (run-time) process profiling and the latest information on availability of grid resources, as provided by the information dissemination algorithm. Process profiling is performed by continuously collecting information about a process's characteristics, e.g., size, rates of system-calls, volume of IPC and I/O. This information is then used by competitive on-line algorithms [7] to determine the best location for the process. These algorithms take into account the respective speed and current load of the nodes, the size of the migrated process vs. the free memory available in different nodes, and the characteristics of the processes. This way, when the profile of a process changes or when new resources become available, the algorithm automatically responds by considering reassignment of processes to better locations.

2.3 Freezing Support

In a dynamic grid environment the availability of resources may change over time, e.g., when clusters are connected to or disconnected from the grid. In MOSIX, guest processes running on nodes that are about to be disconnected are migrated to other nodes or back to their home-node. In the latter case, the available memory at the home-node may not be sufficient to receive all returning processes. To prevent the loss of running processes, MOSIX has a freezing mechanism that can take any running MOSIX process, suspend it, and store its memory image in a regular file (on any accessible filesystem). This mechanism can be activated automatically when a high load is detected, or manually upon specific request. The freezing mechanism ensures that a large number of processes can be handled without exhausting CPU and memory. Automatically frozen processes are reactivated in a circular fashion in order to allow some work to be done without overloading the owner's nodes. Later, when more resources become available, the load-balancing algorithm migrates running processes away, thus allowing reactivation of additional frozen processes.

3. A Market Mechanism for MOSIX

In order to enhance the economic value provided by the system and to provide incentives to contribute resources to the grid, we implemented a market mechanism in the MOSIX system.
In this section, we will specify our first approach towards designing such a market mechanism for MOSIX, which accounts for its requirements, in particular online scheduling without knowing processing times in advance, while at the same time leveraging its distinct features such as migration and freezing.

3.1 Requirements for Market-based Scheduling

Scheduling in MOSIX gives rise to a number of requirements towards the market mechanism (e.g. [14]). The most prominent requirements are:

Immediacy: The timing of the allocation is crucial. The allocation scheme must be scalable in order to be able to allocate large numbers of resource requests and offers (thousands of jobs and CPUs).

Allocative efficiency: There shall be no waste of resources; the system is supposed to make optimal use of its resources and thus maximize the system's overall value which is returned to its users.

Truthfulness: Participants cannot benefit from cheating the mechanism, i.e. from reporting any valuation for a resource other than their true valuation.

A market mechanism consists of three elementary components: a bidding language, which specifies what information is exchanged between the mechanism and its users (Subsection 3.2); an allocation scheme, which determines the assignment of offered resources to requests (Subsection 3.3); and a pricing scheme, which determines corresponding monetary transfers between the market mechanism and its participating users (Subsection 3.4).

3.2 The Model

Let J be the set of resource requests. We assume that request (or "job") j ∈ J is fully described by the tuple (v_j, c_j, m_j, s_j, l_j). Computing power is the central scarce resource in the system. Consequently, v_j ∈ R+ denotes j's valuation (or "maximum willingness to pay") per unit of computing power and time (e.g. per standard processor and hour of processing time), c_j ∈ N and m_j ∈ N are j's minimum required amount of computing power (e.g. number of processors in case of a parallelizable job) and memory (e.g.
in MB) respectively, s_j ∈ R+ denotes j's release time (the point in time at which j is reported to the mechanism), and l_j ∈ R+ is j's processing time (or "length", e.g. in hours). However, with processes being located on the operating system level, users typically do not know the length of their jobs before these jobs are actually completed. Consequently, we assume that resource requesters submit the tuple (v_j, c_j, m_j) to the mechanism at time s_j.

Moreover, let N be the set of resource offers. We assume that offer (or "node") n ∈ N is fully described by the tuple (r_n, c_n, m_n, ε_n, λ_n), where r_n ∈ R+ denotes n's valuation (or "reservation price") per unit of computing power and time, c_n ∈ N and m_n ∈ N are the maximum available amount of computing power and memory respectively, and ε_n ∈ R+ and λ_n ∈ R+ specify the timeframe during which these resources are available. For now, we further assume that nodes can potentially execute multiple jobs in parallel but that one job can only be executed by one node at a time (in future extensions of this basic model we plan to also support parallelism). Again, we assume that resource providers do not know how long they can contribute their resources, e.g. because they

do not know when these resources will be required by a local process, and thus submit the tuple (r_n, c_n, m_n) to the mechanism at time ε_n. This setting is oftentimes referred to as an online setting, where information is gradually being released to the mechanism, as opposed to the offline setting where all information is known to the mechanism at the time it makes its allocation and pricing decisions. Furthermore, we are facing a so-called non-clairvoyant scheduling problem where processing lengths are unknown [10].

Example: Assume at time t, when the market mechanism is to make its allocation and pricing decisions ("clearing"), the sample resource requests and offers listed in Table 1 have been collected. Job J1 requires one processor and 200 MB of memory. It is willing to pay up to $1.8 per processor (and thus in total) per time unit. Node N2 is offering two processors and 240 MB of memory while requiring at least $1.0 per processor (and thus $2.0 in total) per time unit.

Table 1. Sample resource requests and offers

3.3 The Allocation Scheme

The mechanism clears at discrete points in time t ∈ R+. At time t, the requests J(t) = {j ∈ J | s_j ≤ t ≤ s_j + l_j} and the offers N(t) = {n ∈ N | ε_n ≤ t ≤ ε_n + λ_n} have been collected by the market mechanism. As introduced in Subsection 3.1, we intend to design and implement an allocation scheme with the aim of maximizing allocative efficiency (i.e. welfare), where allocative efficiency is the sum over all users' utility resulting from a specific market outcome (allocations and prices) and may also be interpreted as overall "happiness". Let V(t) be the resulting welfare at time t and V = Σ_{t∈R+} V(t) the overall welfare generated by the mechanism. Since we are facing an online setting, in order to maximize V, we must maximize V(t) for all t. This scheduling problem can be formalized as the following integer program:

max_{X(t)} V(t) = Σ_{j∈J(t)} c_j Σ_{n∈N(t)} x_{jnt} (v_j − r_n)

s.t.
x_{jnt} ∈ {0, 1} and x_{jnt} = 0 if v_j < r_n, ∀ j ∈ J(t), n ∈ N(t)   (1)
Σ_{n∈N(t)} x_{jnt} ≤ 1, ∀ j ∈ J(t)   (2)
Σ_{j∈J(t)} x_{jnt} c_j ≤ c_n, ∀ n ∈ N(t)   (3)
Σ_{j∈J(t)} x_{jnt} m_j ≤ m_n, ∀ n ∈ N(t)   (4)

X(t) gives the (optimal) allocation at time t. Constraint (1) introduces the binary decision variable x_{jnt}, with x_{jnt} = 1 if job j ∈ J(t) is allocated to node n ∈ N(t) at time t and x_{jnt} = 0 else. A job can only be allocated to a node whose reservation price (its minimal required payment) does not exceed this job's maximum willingness to pay. Furthermore, a job can only be allocated to one node at a time (Constraint (2)). Constraints (3) and (4) enforce that the resource requirements of all jobs allocated to one specific node at a time do not exceed the amount of resources available on this node.

This program is an instance of a Generalized Assignment Problem and thus NP-hard [9]. But in settings with large numbers of resource requests and offers, we must still be able to determine allocations quickly. Heuristics are a promising approach towards tackling this problem. They are very speedy while at the same time being simple to understand, implement, and extend. We present the pseudocode for a greedy heuristic in Algorithm 1.

Algorithm 1 Greedy heuristic
Require: A set J(t) of resource requests and a set N(t) of resource offers.
Ensure: A feasible allocation X(t) = (x_{jnt}) of requests j ∈ J(t) to offers n ∈ N(t).
1: init x_{jnt} = 0 for all j ∈ J(t), n ∈ N(t)
2: while J(t) ≠ ∅ do
3:   Select and remove j ∈ argmax_{k∈J(t)} v_k
4:   N′(t) = ∅
5:   while N(t) ≠ ∅ do
6:     Select and remove n ∈ argmin_{k∈N(t)} r_k
7:     if c_j ≤ c_n and m_j ≤ m_n and r_n ≤ v_j then
8:       x_{jnt} = 1
9:       c_n = c_n − c_j and m_n = m_n − m_j
10:      N′(t) ← N′(t) ∪ {n} and break
11:    else
12:      N′(t) ← N′(t) ∪ {n}
13:    end if
14:  end while
15:  N(t) ← N(t) ∪ N′(t)
16: end while
17: return X(t)

We successively construct a feasible allocation request-by-request by employing a single-pass priority-rule-based scheduling heuristic, where the priority of requests and offers is determined by their reported valuation.
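A minimal Python sketch of Algorithm 1 follows. The figures for J1 and N2 are those given in the example; the remaining entries (job J4's bid and the memory columns in particular) are illustrative assumptions chosen to be consistent with the worked example.

```python
def greedy_allocate(jobs, nodes):
    """Single-pass priority-rule heuristic of Algorithm 1: requests in
    non-ascending order of valuation, each matched to the cheapest
    feasible offer. Returns the allocation and welfare per time unit."""
    free = {n: (c, m) for n, (r, c, m) in nodes.items()}
    allocation, welfare = {}, 0.0
    for j, (v, c, m) in sorted(jobs.items(), key=lambda kv: -kv[1][0]):
        for n, (r, _, _) in sorted(nodes.items(), key=lambda kv: kv[1][0]):
            free_c, free_m = free[n]
            if c <= free_c and m <= free_m and r <= v:
                allocation[j] = n
                free[n] = (free_c - c, free_m - m)
                welfare += c * (v - r)
                break
    return allocation, welfare

# J1 and N2 as described in the example; the other values are assumed.
jobs = {   # job: (valuation v_j, CPUs c_j, memory m_j in MB)
    "J1": (1.8, 1, 200), "J2": (2.2, 2, 100),
    "J3": (2.0, 1, 250), "J4": (1.0, 1, 100),
}
nodes = {  # node: (reservation price r_n, CPUs c_n, memory m_n in MB)
    "N1": (1.4, 1, 250), "N2": (1.0, 2, 240),
}
allocation, welfare = greedy_allocate(jobs, nodes)
```

With these numbers, J2 is allocated to N2 and J3 to N1, yielding a welfare of $3 per time unit.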
By running through the list J(t) of requests in non-ascending order of v_j and trying to allocate each request to the cheapest possible offer n ∈ N(t), we greedily maximize the term v_j − r_n in the objective function of the integer program introduced above. (Ties may be broken randomly or by considering the memory requirements, for instance.)

Example: Job J2 has the highest ranking and can only be allocated to node N2. Consequently, there is only capacity left on node N1, which is then assigned to job J3, the job considered next in the job ranking. This allocation generates welfare amounting to v_{J2} c_{J2} + v_{J3} c_{J3} − r_{N2} c_{N2} − r_{N1} c_{N1} = $4.4 + $2.0 − $1.0 · 2 − $1.4 = $3 per time unit.

This heuristic generates a feasible and approximately efficient allocation at low computational cost. The lists J(t) and N(t) can be sorted in O(|J(t)| log |J(t)|) and O(|N(t)| log |N(t)|) respectively. The actual allocation phase then runs in O(|J(t)| · |N(t)|). In our example, the generated welfare happens to correspond to the optimal welfare produced by solving the scheduling problem exactly. However, in general the computational speed of the heuristic will only come at the expense of allocative efficiency [15]. This is exacerbated by the fact that we are facing a non-clairvoyant online setting in which earlier allocation decisions may prove unfortunate (in terms of allocative efficiency) when new requests and offers (and hence new information) arrive in the system. We thus make use of the unique features provided by MOSIX to improve upon such unfortunate decisions: If the allocation scheme determines that a specific job j ∈ J(t) is allocated to some resource n ∈ N(t) at time t (x_{jnt} = 1), in the next run at time t′ the allocation scheme may decide to preempt this currently running job in case a job with a higher willingness to pay enters the system.
This preempted job may then either be migrated to another node, in case the allocation scheme decides to keep this job in the allocation (x_{jmt′} = 1 for some m ∈ N(t′), m ≠ n), or it may be suspended and sent back to its home-node, in case the scheme decides not to keep the job in the allocation (Σ_{n∈N(t′)} x_{jnt′} = 0). The implementation of this logic will be explained in detail in Section 4.

3.4 The Pricing Scheme

In inter-organizational grid settings, we assume users to be selfish and rational, meaning both resource requesters and providers try to maximize their individual benefit from participating in the system. Resource requesters, for instance, may benefit from understating their valuation, thus potentially lowering their price. We assume that resource requesters can only misreport their valuations (ṽ_j ≠ v_j) and their release time (s̃_j ≠ s_j). Overstating resource requirements can only increase the risk of not being allocated or having to pay a higher price. Understating resource requirements can easily be detected and punished by the system, e.g. by killing the job. We further assume that resource providers do not have an incentive to misreport their available resources. Again, overstating can easily be detected and punished by the system, whereas providers obviously do not have an incentive to understate, as this cannot increase their resulting allocation and thus their payment. However, they can misreport their reservation prices (r̃_n ≠ r_n). Consequently, the users' actions will generally not work towards the social optimum and we need to complement the allocation scheme with an adequate pricing scheme. With truthful prices (in dominant strategies), users cannot benefit from misreporting their valuations, independently of the other resource requests and offers in the system.
Building on the concept of critical-value-based prices as elaborated in [8] and [11], we will first construct such a scheme for determining truthful prices of resource requests and will then complement this scheme so as to distribute the resulting payments to the supply side of the market.

Critical-value-based Pricing of Resource Requests

Losing requests j ∈ J(t) (i.e. requests which have not been accepted) are not required to pay anything, so we set the price of these requests at time t to p_j(t) = 0. For accepted requests, we set p_j(t) = φ_j(t) · c_j, where φ_j(t) ∈ R+ denotes j's critical value at time t. This critical value equals the minimum valuation j would have had to report to the system in order to remain in the allocation X(t).

The strongest plus of critical-value-based pricing is that it generates truthful prices for resource requests [8, 11, 14]. Truthfulness is a powerful concept. With truthful prices, resource requesters do not need to reason about other users' bidding strategies; instead they can simply report their true valuation to the system. They cannot benefit from trying to cheat the mechanism. The drawback of critical-value-based pricing is the additional computational burden it adds to the mechanism. The allocation is not independent of a specific request, meaning the allocation of the other requests might change in case a winning request is removed from the allocation. Consequently, in order to determine the critical value φ_j(t) of a specific request j at time t, we need to determine the allocation without j.

Critical-value-based prices guarantee payments from resource requesters which at least cover the reservation prices of resource providers and potentially generate a surplus exceeding these reservation prices. Intuitively, we will want to distribute this surplus so as to also induce resource providers to truthful reporting of their valuations.
This, however, is impossible, as analytically shown in [8] and [15], and we can thus only aim to produce approximately truthful payments.
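Critical values can be sketched by re-running the greedy allocation with the probed request's valuation lowered until it drops out of the allocation; the allocation can only change at the other reported valuations and at the reservation prices, so those are the candidate thresholds. The sketch below uses illustrative values consistent with the example in the text (the memory figures and job J4 are assumptions), with ties broken in favour of the probed job.

```python
def greedy(jobs, nodes, favour=None):
    """Algorithm 1's heuristic; ties in the job ranking are broken
    in favour of the job named `favour`."""
    free = {n: (c, m) for n, (r, c, m) in nodes.items()}
    alloc = {}
    ranked = sorted(jobs.items(), key=lambda kv: (-kv[1][0], kv[0] != favour))
    for j, (v, c, m) in ranked:
        for n, (r, _, _) in sorted(nodes.items(), key=lambda kv: kv[1][0]):
            free_c, free_m = free[n]
            if c <= free_c and m <= free_m and r <= v:
                alloc[j] = n
                free[n] = (free_c - c, free_m - m)
                break
    return alloc

def critical_value(j, jobs, nodes):
    """Smallest reported valuation at which request j stays allocated,
    probed over the other valuations and the reservation prices."""
    candidates = sorted({v for k, (v, _, _) in jobs.items() if k != j}
                        | {r for (r, _, _) in nodes.values()})
    v_j, c_j, m_j = jobs[j]
    for candidate in candidates:          # ascending: first winner wins
        probe = dict(jobs)
        probe[j] = (candidate, c_j, m_j)
        if j in greedy(probe, nodes, favour=j):
            return candidate
    return v_j

# Illustrative data consistent with the example in the text.
jobs = {"J1": (1.8, 1, 200), "J2": (2.2, 2, 100),
        "J3": (2.0, 1, 250), "J4": (1.0, 1, 100)}
nodes = {"N1": (1.4, 1, 250), "N2": (1.0, 2, 240)}
prices = {j: critical_value(j, jobs, nodes) * jobs[j][1]
          for j in greedy(jobs, nodes)}
```

This reproduces the prices from the example below: J2 pays 2 · $1.8 = $3.6 and J3 pays $1.8, while losing requests pay nothing.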

Proportional Payments to Resource Providers

Proportional payments distribute the surplus generated by critical-value-based pricing to resource providers according to these providers' contribution of computing power as compared to the overall contribution. First numerical experiments indicate that this approach exhibits desirable strategic properties while only adding marginal computational burden on the mechanism [15]. More formally, let

S(t) = Σ_{j∈J(t)} p_j(t) − Σ_{n∈N(t)} r_n Σ_{j∈J(t)} x_{jnt} c_j

be the surplus at time t, i.e. overall payments by resource requesters less overall reservation prices. Then resource provider n receives a payment of

p_n(t) = r_n Σ_{j∈J(t)} x_{jnt} c_j + S(t) · (Σ_{j∈J(t)} x_{jnt} c_j) / (Σ_{m∈N(t)} Σ_{j∈J(t)} x_{jmt} c_j)

per time unit the current allocation is valid.

Example: Assume that ties between J2 and J1 and between J3 and J1 are broken in favor of J2 and J3. Then the winning jobs J2 and J3 both need to outbid J1 in order to remain in the allocation (φ_{J2} = φ_{J3} = $1.8). Consequently, J2 has to pay 2 · $1.8 = $3.6, J3 has to pay $1.8, and J1 and J4 do not have to pay anything. This leaves a surplus of S(t) = $5.4 − $3.4 = $2.0. Node N2 receives $1.0 · 2 + (2/3) · $2.0 ≈ $3.33, while N1 receives $1.4 + (1/3) · $2.0 ≈ $2.07.

4. Economically Enhanced MOSIX

In this section we describe the enhancement of MOSIX to support economy-aware scheduling using the market mechanism presented in the previous section. The MOSIX features used were the automatic resource discovery, manual process migration and manual freezing (the automatic process migration and freezing were disabled). A layer called the MOSIX Economy Infrastructure (MEI) was built on top of the existing MOSIX system. The MEI is composed of three main components: the provider component, the client component and the market manager component. Figure 1 shows a schematic view of these components.

4.1 The Provider Component

In our implementation each provider runs a daemon called providerd which is responsible for managing the economic properties of that provider.
The economic properties currently include the provider's market status, its reservation price and its current payment (see Figure 1). The market status may be ON or OFF, where ON indicates the provider is currently participating in the market. If the market status is ON, the amount of time the provider is expected to stay ON is also reported. This amount of time is only used as a hint and the provider may join or leave the market at any given moment. For example, student farm nodes can join the market from 22:00 until 8:00 the morning after (during this time the farm is closed for students). The reservation price specifies the minimal price the provider is willing to accept in order to run processes, and the current payment indicates how much the provider is currently getting paid (if it is used by a job).

Figure 1. MEI components

These new economic properties were added to the information system of MOSIX and are constantly circulated among the cluster nodes. This way the market manager (see below) can easily obtain information about all the available providers by querying one node in the cluster and obtaining its information vector. An example of the relevant information which each provider supplies is presented in Figure 2. A command line tool called provider-ctl enables owners to modify the economic properties of the provider online. For example, an owner may set different reservation prices for different nodes, or can set different time windows for participating in the market for each node.

4.2 The Client Component

The program erun (see Figure 4) allows the user to specify economic parameters in addition to the MOSIX standard parameters. The economic parameters include the

Figure 2. Sample provider XML information

job's maximal price and its budget. The standard MOSIX (non-economic) parameters are forwarded to the mosrun program. For example, the command erun -p40 -m180 myprog myprog-args submits a job with a maximal price of 40 and a memory requirement of 180 MB. Once an erun program is launched, it connects to a daemon called assignd on the local host, which is responsible for managing all the economy-aware processes on each node. The assignd keeps track of all the erun programs and reports to the market manager about newborn erun processes as well as about erun processes which have finished running. Figure 3 shows an example of a message the assignd sends to the market manager when a new job is submitted by the user.

Figure 3. Sample job XML information

As the figure shows, each job has a unique jobid which is used to identify the job in the cluster. The user may specify the maximal amount of memory the job may need. This property helps the system to avoid assigning the job to nodes without enough free memory. The cpunum property specifies how many CPUs this job requires, enabling the user to submit parallel jobs which need more than one CPU to run. The max-pay property stands for the maximal price (valuation) the user is willing to pay for running the job on a standard CPU for one hour. Finally, the max-budget property lets the user assign a limited budget to this job. Once the job consumes this budget, the job will be suspended until the budget is increased again.

The assignd is also responsible for receiving instructions regarding the allocation of erun processes from the market manager. Once such instructions arrive, the assignd uses the underlying MOSIX system to enforce the market decision. For example, if the central market instructs the assignd to move a given erun process from one provider to another, the assignd uses the manual migration capabilities of MOSIX to send the erun to the new location.
If, on the other hand, the assignd is ordered to suspend a process, then the process is frozen to the local disk using the MOSIX freezing mechanism.

4.3 The Market Manager Component

The market manager component is responsible for making the economic scheduling decisions. It consists of a daemon called marketd and a market solver program (which we refer to as the solver). The marketd receives job requests from client nodes (via the assignds) and collects information about the available providers using the MOSIX automatic resource discovery. Once there are jobs in the system as well as available providers, the marketd passes this information to the solver and waits for the solver to output a scheduling decision. The solver is implemented as shown in Section 3. Once the schedule is computed, the solver sends to the marketd a scheduling decision for each job in the system. The marketd then forwards the scheduling decision of each job to the assignd responsible for that job (which carries out the decision). There are two possibilities for running the solver: periodically, or upon arrival of a new process. Currently, the periodic approach is used.

Besides changing the status of the job, the market mechanism also specifies the payment of the job (in case it is in a running state). This payment is used to update the total-payment property of the job, so for each job we always know how much money it has spent so far.

Figure 4 shows two examples of scheduling decisions the solver may output. The first is for a job which is scheduled to run on a provider at a price of 10. In this case, assuming the job is already running on another provider, the assignd responsible for the job will migrate the job to the new provider. The second schedule is for job 1233, which is suspended. Here the assignd responsible for the job will use the freezing mechanism of MOSIX and will freeze (suspend) the job in its home-node.
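The decision handling just described can be sketched as follows. This is only an illustration of the control flow, not the actual daemon code: the decision message fields, the helper names (apply_decision, migrate, freeze), and the job id 1232 are assumptions, with the MOSIX migration and freezing mechanisms replaced by callbacks.

```python
from typing import Callable, Optional

class EconJob:
    """Minimal stand-in for an economy-aware erun process; the fields
    mirror the job properties described in the text."""
    def __init__(self, job_id: int):
        self.job_id = job_id
        self.provider: Optional[str] = None   # None = not running
        self.total_payment = 0.0

def apply_decision(job: EconJob, decision: dict,
                   migrate: Callable[[int, str], None],
                   freeze: Callable[[int], None]) -> None:
    """Enforce one scheduling decision from the market manager, the
    way an assignd-like daemon could."""
    if decision["state"] == "run":
        target = decision["provider"]
        if job.provider != target:
            migrate(job.job_id, target)       # manual MOSIX migration
            job.provider = target
        # accumulate what the job has paid so far (total-payment)
        job.total_payment += decision.get("payment", 0.0)
    elif decision["state"] == "suspend":
        freeze(job.job_id)                    # frozen to local disk
        job.provider = None

# Example mirroring the two decisions discussed above: one job is
# (re)assigned at a price of 10, job 1233 is suspended.
events = []
job_a, job_b = EconJob(1232), EconJob(1233)
job_b.provider = "node-7"  # assumed current location
apply_decision(job_a, {"state": "run", "provider": "node-3", "payment": 10.0},
               lambda j, n: events.append(("migrate", j, n)),
               lambda j: events.append(("freeze", j)))
apply_decision(job_b, {"state": "suspend"},
               lambda j, n: events.append(("migrate", j, n)),
               lambda j: events.append(("freeze", j)))
```

In the real system the callbacks would invoke the MOSIX manual migration and freezing facilities instead of recording events.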
5. Conclusion & Future Work

We have presented an enhancement to the MOSIX system which performs market-based scheduling instead of the existing load-balancing-based scheduling. The process migration and freezing capabilities allow our system to dynamically allocate and reallocate jobs to any provider node, not only to idle nodes. The market mechanism we propose is specifically tailored towards the needs and technical features of grid OS so as to generate highly efficient resource allocations in large-scale settings. We presented a pilot implementation of the economic enhancement of MOSIX.

Figure 4. Sample scheduling decisions

Our work suggests several intriguing avenues for future research. First, we plan to perform real runs of the system with thousands of jobs and hundreds of providers. We will research how to keep the scheduling period as short as possible (several seconds) in order to keep the system responsive. An interesting issue is supporting short jobs with run times smaller than the clearing period of the market. To do so, we need to assign these jobs as soon as they arrive. Here we plan to run a fast version of the market solver each time a new job arrives, while running the full version periodically. Another important goal is to introduce technical limitations to the market in order to take into account both performance and economic issues. For example, in the current market, the migration cost of a job is not yet taken into account. This can lead to a scenario where each new schedule causes a large number of concurrent migrations, which in turn can overload the network. We plan to enhance the market by taking the cost of migration into account as well as by limiting the number of concurrent migrations. This will make the system more stable and usable in practice.

Acknowledgements

We wish to thank Amnon Shiloh for his contribution to the development of the erun and assignd programs, and for his valuable ideas. This research was supported in part by grants from the EU IST programme under the project SORMA (Self-Organizing ICT Resource Management).

References

[1] MOSIX. www.mosix.org.
[2] L. Amar, A. Barak, Z. Drezner, and I. Peer. Gossip algorithms for maintaining a distributed bulletin board with guaranteed age properties. Submitted for publication.
[3] A. AuYoung, B. Chun, A. Snoeren, and A. Vahdat. Resource allocation in federated distributed computing infrastructures. In Proceedings of the 1st Workshop on Operating System and Architectural Support for the On-demand IT InfraStructure.
[4] A. Barak, A. Shiloh, and L. Amar. An organizational grid of federated MOSIX clusters. In Proceedings of the 5th IEEE International Symposium on Cluster Computing and the Grid (CCGrid05), May 2005.
[5] I. Foster and C. Kesselman. Globus: a Metacomputing Infrastructure Toolkit. International Journal of High Performance Computing Applications, 11(2):115.
[6] I. Foster, C. Kesselman, and S. Tuecke. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications, 15(3):200.
[7] A. Keren and A. Barak. Opportunity cost algorithms for reduction of I/O and interprocess communication overhead in a computing cluster. IEEE Transactions on Parallel and Distributed Systems, 14(1):39–50.
[8] D. Lehmann, L. O'Callaghan, and Y. Shoham. Truth revelation in approximately efficient combinatorial auctions. Journal of the ACM (JACM), 49(5).
[9] S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons, Inc., New York, NY, USA.
[10] R. Motwani, S. Phillips, and E. Torng. Non-clairvoyant scheduling. Theoretical Computer Science, 130(1):17–47.
[11] A. Mu'alem and N. Nisan. Truthful approximation mechanisms for restricted combinatorial auctions. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI-02).
[12] P. Padala, C. Harrison, N. Pelfort, E. Jansen, M. Frank, and C. Chokkareddy. OCEAN: the Open Computation Exchange and Arbitration Network, a market approach to meta computing. In Proceedings of the Second International Symposium on Parallel and Distributed Computing.
[13] P. Padala and J. Wilson. GridOS: Operating System Services for Grid Architectures. In Proceedings of the International Conference on High Performance Computing.
[14] J. Stößer, D. Neumann, and A. Anandasivam. A Truthful Heuristic for Efficient Scheduling in Network-Centric Grid OS. In Proceedings of the European Conference on Information Systems (ECIS), 2007, to appear.
[15] J. Stößer, D. Neumann, and C. Weinhardt. Market-based Pricing in Grids: On Strategic Manipulation and Computational Cost. Submitted for publication.
[16] I. Sutherland. A futures market in computer time. Communications of the ACM, 11(6), 1968.

GreedEx – A Scalable Clearing Mechanism for Utility Computing

Authors: Jochen Stoesser, Dirk Neumann
Institute of Information Systems and Management (IISM)
Universität Karlsruhe (TH)
Englerstr. 14, Karlsruhe, Germany
{stoesser,neumann}@iism.uni-karlsruhe.de

Corresponding author: Jochen Stoesser
Institute of Information Systems and Management (IISM)
Universität Karlsruhe (TH)
Englerstr. 14, Karlsruhe, Germany
stoesser@iism.uni-karlsruhe.de

The authors have been partially funded by the EU IST programme under grant SORMA (Self-Organizing ICT Resource Management).

Abstract

Scheduling becomes key in dynamic and heterogeneous utility computing settings. Market-based scheduling promises to increase the efficiency of resource allocation and provides incentives to offer computer resources and services. Current market mechanisms, however, are inefficient and computationally intractable in large-scale settings. The contribution of this paper is the proposal as well as the analytical and numerical evaluation of GreedEx, an exchange for clearing utility computing markets based on a greedy heuristic. GreedEx achieves a distinct trade-off: it obtains fast and near-optimal resource allocations while generating prices that are truthful on the demand side and approximately truthful on the supply side.

Keywords: Market-based Scheduling, Scalable Heuristic, Truthfulness

In current business domains, information technology (IT) departments are forced to cut down on IT expenses but are at the same time required to provide the business side with systems that can flexibly be adjusted to new business requirements and processes. This tension is reflected in the vision of utility computing, where computer resources can be accessed on demand, in analogy to electricity and water: "Utility computing is the on-demand delivery of infrastructure, applications, and business processes in a security-rich, shared, scalable, and standards-based computer environment over the Internet for a fee. Customers will tap into IT resources and pay for them as easily as they now get their electricity or water" [Rappa, 1].

According to Rappa, utilities are characterized by necessity, reliability, ease of use, fluctuating utilization patterns, and economies of scale [Rappa, 1]. Computer resources match this profile very well. Rather than being a strategic asset, IT has become a basic necessity for almost all businesses [Carr, 2]. If IT systems fail, this can result in unsatisfied customers and severe contractual penalties. Moreover, IT systems must be easy to use for customers and employees and must adapt and connect to heterogeneous systems. However, especially small and medium-sized companies will not constantly require massive amounts of computer resources. Instead, these resources will be required only in the design phase of a new product or to create daily/monthly/yearly reports, leading to fairly dynamic and (from the provider's point of view) unpredictable demand.

The validity of utility computing as a business model mainly stems from its immense economies of scale. Instead of each company and research center maintaining its own costly computing and data centers, utility computing providers benefit from economies of scale and offer these resources on demand.
Setting up computing and data centers for the first customer incurs tremendous fixed costs, but serving an incremental customer or request with these existing resources requires only (comparably) minor effort. Sun's One-Dollar-Per-CPU-Hour and Amazon's Elastic Compute Cloud are prominent examples of the industry take-up of utility computing offerings [Sun, 3; Amazon, 4].

Rappa suggests basing pricing in utility computing on metered usage (also coined "pay-what-you-use" or "pay-as-you-go"), as is the case with classic utilities such as water, telephone, and Internet access [Rappa, 1]. This approach has also been adopted by Sun ($1/CPU-hour) and Amazon (e.g. $0.20 per GB of data transfer). However, even in this metered model, prices are temporarily static and do not fully reflect the dynamics of demand and supply. Moreover, setting appropriate prices is a complex task for utility computing providers.

The use of market mechanisms has often been suggested in situations in which allocations of and prices for scarce resources need to be established. The prices being paid reflect the resources' scarcity. Consequently, dynamic market prices on the one hand support resource requesters in coordinating and distributing their demand over time, thus balancing the system load. On the other hand, utility computing providers have an incentive to offer resources in return for the payment of the market price. In dynamic, large-scale utility computing settings, however, the allocation problem of the market is computationally demanding. Market mechanisms must be solvable quickly. Currently, there is no market mechanism which fully satisfies the economic requirements of utility computing markets.
The contribution of this paper is the proposal as well as the analytical and numerical evaluation of GreedEx, an exchange for clearing utility computing markets based on a greedy heuristic. GreedEx achieves a distinct trade-off: it obtains fast and near-optimal resource allocations while generating prices that are truthful on the demand side and approximately truthful on the supply side. GreedEx may be used for two purposes: First, utility computing providers may choose to run a proprietary GreedEx-based market platform to allocate their scarce computing resources to resource requests in an efficient manner and to dynamically price these requests. Second, a GreedEx-based platform may be run as an intermediary market which aggregates demand as well as supply across multiple resource requesters and utility computing providers, thus increasing liquidity and efficiency.

This paper is structured as follows. Section 1 presents related work on market-based scheduling. In Section 2, we elaborate the heuristic with its allocation and pricing schemes. The heuristic's speedup and its efficiency are evaluated numerically in Section 3 before Section 4 concludes the paper and points to promising future research directions.

1 Related Work

There are two main streams of research in market-based allocation of utility resources: mechanisms which consider the trading of one type of resource only, and mechanisms which account for dependencies between multiple utility resources.

Spawn implements a market for computing resources in which each workstation auctions off idle computing time to multiple applications by means of a Vickrey auction [Waldspurger et al., 5]. All resources are allocated to at most one application at a time, regardless of this application's actual demand, which yields low resource utilization. Chun and Culler present a market for computing time in which one resource provider sells computing time to multiple requesters [Chun and Culler, 6]. Resource requesters get allotted computing time proportional to their share in the total reported valuation across all requesters. The Popcorn market is an online market for computing power which implements a Vickrey auction as well as two double auctions [Regev and Nisan, 7]. All of these approaches share two major drawbacks. First, they allow the specification and trading of computing power only.
But requesters require a bundle of resources such as computing power, memory, and bandwidth. On the one hand, these approaches thus lead to inefficient allocations since requests with the same demand for computing power but different memory requirements, for instance, are treated the same. On the other hand, requesters are exposed to the risk of only being able to obtain one leg of the bundle of required resources without the other ("exposure risk"). A second limitation of these approaches is that they do not support reservations of resources in advance, which are essential for Quality of Service assertions [Foster, 8].

Schnizler et al. [Schnizler et al., 9] elaborate a multi-attribute combinatorial exchange (MACE) which targets these deficiencies. Users are allowed to request and offer arbitrary bundles of computer resources and can specify quality attributes on these resources. MACE implements an exact mechanism; the scheduling problem in this combinatorial setting is NP-hard. The pricing scheme of MACE yields approximately truthful prices; with truthful prices, strategic users do not have an incentive to report any valuation other than their true valuation. Due to the NP-hardness of the problem, the mechanism is adequate for batch applications, but its use for interactive applications is rather limited. The Bellagio system also implements an exact combinatorial exchange for computing resources [AuYoung et al., 10]. Its pricing is based on the approximation of the truthful Vickrey-Clarke-Groves prices proposed by Parkes et al. [Parkes et al., 11]. Being an exact mechanism, it shares the computational drawbacks of MACE.

The work of Bapna et al. [Bapna et al., 12] is most relevant to the work presented in this paper. Multiple requesters and providers can trade both computing power and memory for a sequence of time slots. First, an exact mechanism is introduced. By introducing fairness constraints and imposing one common valuation across all resource providers, the search space in the underlying combinatorial allocation problem is structured so as to establish one common, truthful price per time slot for all accepted requests. Additionally, this introduces a linear ordering across all jobs and time slots, which reduces the complexity of the allocation problem; the problem, however, still remains NP-hard. To mitigate this computational complexity, a fast, greedy heuristic is proposed at the expense of both truthfulness and efficiency. However, as shown by Lehmann et al. [Lehmann et al., 13] and Mu'alem and Nisan [Mu'alem and Nisan, 14], heuristics do not necessarily imply a loss of both truthfulness and efficiency. In their work, which is essential to this paper, Mu'alem and Nisan establish a set of necessary and sufficient properties which can be leveraged to indeed design a truthful heuristic.

2 A Truthful Clearing Heuristic

In this section we elaborate GreedEx, a market-based heuristic which obtains fast and near-optimal resource allocations while generating prices that are truthful on the demand side and approximately truthful on the supply side of the utility computing market.
To this end, we will first introduce the bidding language, which defines the content of the messages being exchanged between the market mechanism and its participants, and then present the formal model of the underlying allocation problem. Secondly, we will elaborate an allocation algorithm which achieves a fast approximation of the optimal allocation schedule. Thirdly, pricing schemes will be introduced which translate these resource allocations into corresponding monetary transfers.

2.1 The Model

There are two classes of participants in utility computing markets: requesters, who would like to access computer resources such as processing power and memory, and utility computing providers, who offer these resources in return for the market price. We introduce the term job to refer to a computational problem, and we will call an atomic bundle of resources on which the job or parts of it can be computed a node, e.g. one server within a cluster. If a job gets allocated to a node, we will call this job and node a winning job and node, respectively.

The market mechanism is a centralized entity which clears periodically, meaning it collects resource requests and offers for a period of time before making its allocation and pricing decisions. Larger clearing periods generally allow for more efficient allocation decisions due to additional degrees of freedom when optimizing over a large number of requests and offers [Parkes et al., 11]. Note, however, that the clearing period can be made as small as a couple of seconds in order to support interactive applications [Stoesser et al., 15]. The market mechanism is a sealed-bid mechanism in the sense that both resource requesters and providers do not get to know the other users' requests and offers before market clearing [Parkes et al., 11].

A resource requester j who would like to submit a job to the utility computing system reports the job's characteristics (v_j, c_j, m_j, s_j, e_j) to the market mechanism, where v_j ∈ R_0^+ denotes j's maximum willingness to pay (i.e. j's valuation) per unit of computing power and time slot, c_j ∈ N and m_j ∈ N the minimum required amount of computing power and memory, respectively, and s_j ∈ N and e_j ∈ N the first and last time slot of the job's estimated runtime. In the following we will use the terms resource requester and job interchangeably. There is ongoing research about using historical data and information about jobs' size and application in order to predict resource requirements and runtimes [Degermark et al., 16; Smith et al., 17]. Based on this research, we assume users to be able to report estimates about these characteristics to the system. We require the market mechanism to make atomic allocations in the sense that each job can only be executed if there are sufficient resources available in all requested time slots. Furthermore, jobs can potentially be migrated between several nodes over time, but each job can only be executed on one node at a time.

A utility computing provider n who would like to contribute a node to the utility computing system reports the node's characteristics (r_n, c_n, m_n, ɛ_n, λ_n) to the market mechanism, where r_n ∈ R_0^+ specifies this node's reservation price per unit of computing power and time slot, c_n ∈ N and m_n ∈ N the maximum amount of computing power and memory available on this node, and ɛ_n ∈ N and λ_n ∈ N the time frame during which the node can be accessed.
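As a concrete reading of this bidding language, the request and offer tuples can be written down directly in code. The sketch below is ours, not the authors': it encodes the request of job J1 and the offer of node N1 from the running example that follows, with illustrative variable names, and derives J1's total willingness to pay.

```python
# Job request (v_j, c_j, m_j, s_j, e_j): valuation per unit of computing
# power and time slot, minimum computing power, minimum memory,
# first and last requested time slot.
J1 = (12, 54, 126, 1, 7)

# Node offer (r_n, c_n, m_n, eps_n, lam_n): reservation price per unit of
# computing power and time slot, available computing power and memory,
# first and last time slot of availability.
N1 = (4, 84, 71, 2, 10)

v, c, m, s, e = J1
# Total willingness to pay: price per unit and slot * units * number of slots.
total = v * c * (e - s + 1)
print(total)  # 12 * 54 * 7 = 4536
```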
Given sufficient resources, we assume that each node is able to virtually execute multiple jobs in parallel, for instance by using virtualization middleware.

Example: Assume the resource requests and offers listed in Table 1 have been submitted to the system. Job J1 requests to be run in time slots 1 to 7 and requires a minimum of 54 units of computing power and 126 units of memory in each time slot. J1 is willing to pay up to $12 per unit of computing power and time slot, that is $12 * 54 * 7 = $4,536 in total. Node N1 offers 84 units of computing power and 71 units of memory in time slots 2 to 10 and requires a reservation price of $4 per unit of computing power and time slot.

Table 1: Sample resource requests and offers (columns: job j with v_j, c_j, m_j, s_j, e_j; node n with r_n, c_n, m_n, ɛ_n, λ_n)

In real settings, and with respect to designing usable systems, human users cannot be expected to frequently submit requests and offers manually. Instead, software agents may serve to hide the system's complexity by automatically trading resources based on the current resource consumption of applications and configurable bidding rules which automatically derive corresponding valuations [MacKie-Mason and Wellman, 18; Neumann et al., 19].

2.2 Greedy Allocation Scheme

Let J be the set of resource requests, N the set of resource offers, and

T = {t ∈ N | s_j ≤ t ≤ e_j, j ∈ J} ∪ {t ∈ N | ɛ_n ≤ t ≤ λ_n, n ∈ N}

the set of time slots across all requests and offers, i.e. the allocation problem's time horizon. Then the winner determination problem which solves the allocation problem exactly can be formalized as the following integer program:

max V = Σ_{j ∈ J} Σ_{n ∈ N} Σ_{t ∈ T} c_j · x_{jnt} · (v_j − r_n)

s.t.

(C1) x_{jnt} ∈ {0, 1}, with x_{jnt} = 1 only if s_j ≤ t ≤ e_j, ɛ_n ≤ t ≤ λ_n and v_j ≥ r_n, ∀ j ∈ J, n ∈ N

(C2) Σ_{n ∈ N} x_{jnt} ≤ 1, ∀ s_j ≤ t ≤ e_j, j ∈ J

(C3) Σ_{j ∈ J} x_{jnt} · c_j ≤ c_n, ∀ ɛ_n ≤ t ≤ λ_n, n ∈ N

(C4) Σ_{j ∈ J} x_{jnt} · m_j ≤ m_n, ∀ ɛ_n ≤ t ≤ λ_n, n ∈ N

(C5) Σ_{u=s_j}^{e_j} Σ_{n ∈ N} x_{jnu} = (e_j − s_j + 1) · Σ_{n ∈ N} x_{jnt}, ∀ s_j ≤ t ≤ e_j, j ∈ J

The objective of this integer program is to maximize welfare V, the total difference between the requesters' valuations and the providers' reservation prices across all allocated time slots. Constraint (C1) introduces the binary decision variable x and ensures that a job can only be allocated to a node which is accessible during the requested time slots and whose reservation price does not exceed the job's willingness to pay. Furthermore, a job can be allocated to at most one node at a time (C2). Constraints (C3) and (C4) specify that the jobs allocated to one node at a time are not allowed to consume more resources than are available on this node. Constraint (C5) enforces atomicity, i.e. a job is either fully executed or it is not executed at all.

Example: Figure 1 shows the schedule which results from optimally allocating the sample requests and offers above.

The allocation problem at hand is an instance of a multi-dimensional Generalized Assignment Problem [Martello and Toth, 20] and as such clearly NP-hard. Heuristics have the desirable property of generating suboptimal allocations fast. We define the following greedy heuristic:
1. Sort jobs j ∈ J in non-ascending order of their reported willingness to pay v_j. Sort nodes n ∈ N in non-descending order of their reported reservation prices r_n.
2. Starting with the job j with the highest ranking (i.e. the highest reported willingness to pay), allocate j to the nodes n_1, ..., n_k with the highest ranking (i.e. the lowest reported reservation prices) which can together accommodate j.
3. Repeat the allocation procedure of Step 2 with the next job in the ranking until there are no more jobs which can be allocated to the available nodes.
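The three steps above can be sketched in code. The following Python implementation is an illustrative reading of the heuristic, not the authors' code: jobs and nodes carry the attributes introduced in Section 2.1, a job may be served by different (cheapest-first) nodes in different slots, and it is committed only if every requested slot can be covered, mirroring the atomicity constraint (C5).

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    v: float  # willingness to pay per unit of computing power and slot
    c: int    # minimum required computing power
    m: int    # minimum required memory
    s: int    # first requested time slot
    e: int    # last requested time slot

@dataclass
class Node:
    name: str
    r: float  # reservation price per unit of computing power and slot
    c: int    # available computing power
    m: int    # available memory
    eps: int  # first available time slot
    lam: int  # last available time slot

def greedex_allocate(jobs, nodes):
    """Greedy allocation: jobs sorted by descending v_j, nodes by ascending
    r_n. Returns {job name: {slot: node name}} for the winning jobs."""
    jobs = sorted(jobs, key=lambda j: -j.v)
    nodes = sorted(nodes, key=lambda n: n.r)
    # Residual (computing power, memory) per node and time slot.
    cap = {(n.name, t): (n.c, n.m) for n in nodes for t in range(n.eps, n.lam + 1)}
    schedule = {}
    for j in jobs:
        plan = {}
        for t in range(j.s, j.e + 1):
            for n in nodes:  # cheapest accessible node with enough capacity
                if n.r > j.v or not (n.eps <= t <= n.lam):
                    continue
                rc, rm = cap[(n.name, t)]
                if rc >= j.c and rm >= j.m:
                    plan[t] = n.name
                    break
            else:
                break  # slot t cannot be covered -> job is rejected (atomicity)
        if len(plan) == j.e - j.s + 1:  # all requested slots covered: commit
            for t, name in plan.items():
                rc, rm = cap[(name, t)]
                cap[(name, t)] = (rc - j.c, rm - j.m)
            schedule[j.name] = plan
    return schedule
```

With J1 = Job("J1", 12, 54, 126, 1, 7) and N1 = Node("N1", 4, 84, 71, 2, 10) from the running example, greedex_allocate([J1], [N1]) rejects J1, as in the text: N1 neither covers slot 1 nor offers J1's required 126 units of memory.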

Figure 1: Sample allocation schedule

This heuristic truly implements a greedy allocation scheme: it tries to greedily maximize the term v_j − r_n in the objective function of the exact allocation problem above.

Example: For the sample requests and offers at hand, the greedy heuristic is illustrated in Figure 2.

Figure 2: The greedy heuristic applied to the sample requests and offers

The winning jobs and nodes are highlighted. Job J4 can be allocated to node N1 since N1 offers sufficient resources over all required time slots and its reported reservation price is less than J4's reported willingness to pay. In time slot 1, J3 can be allocated to node N1. However, in time slots 2 to 8 there is not sufficient residual capacity left due to the execution of J4, so J3 is subsequently allocated to the next available node, N2. J1 cannot be executed at all due to its excessive memory requirements. The heuristic proceeds until the ranked list of jobs ends and, for the example at hand, happens to generate the same allocation schedule as the exact mechanism.

2.3 Pricing Schemes

The allocation algorithm of a market mechanism intends to achieve some global or social aim, in this case to maximize social welfare. In achieving this goal, it depends on the resource requesters and providers to report their true valuations and resource characteristics. These participants, however, are assumed to be rational and self-interested agents trying to maximize their individual utility.


Analysis of Cloud Solutions for Asset Management ICT Innovations 2010 Web Proceedings ISSN 1857-7288 345 Analysis of Cloud Solutions for Asset Management Goran Kolevski, Marjan Gusev Institute of Informatics, Faculty of Natural Sciences and Mathematics,

More information

Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load

Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load Pooja.B. Jewargi Prof. Jyoti.Patil Department of computer science and engineering,

More information

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland

The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland The Lattice Project: A Multi-Model Grid Computing System Center for Bioinformatics and Computational Biology University of Maryland Parallel Computing PARALLEL COMPUTING a form of computation in which

More information

Technology Insight Series

Technology Insight Series Evaluating Storage Technologies for Virtual Server Environments Russ Fellows June, 2010 Technology Insight Series Evaluator Group Copyright 2010 Evaluator Group, Inc. All rights reserved Executive Summary

More information

The role of standards in driving cloud computing adoption

The role of standards in driving cloud computing adoption The role of standards in driving cloud computing adoption The emerging era of cloud computing The world of computing is undergoing a radical shift, from a product focus to a service orientation, as companies

More information

Leasing in a Market for Computing Capacity

Leasing in a Market for Computing Capacity Leasing in a Market for Computing Capacity Spyros Lalis and Alexandros Karipidis Computer Science Dept., University of Crete, Hellas {lalis,karipid}@csd.uoc.gr Institute of Computer Science, Foundation

More information

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage

The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage sponsored by Dan Sullivan Chapter 1: Advantages of Hybrid Storage... 1 Overview of Flash Deployment in Hybrid Storage Systems...

More information

Load Balancing in Distributed Data Base and Distributed Computing System

Load Balancing in Distributed Data Base and Distributed Computing System Load Balancing in Distributed Data Base and Distributed Computing System Lovely Arya Research Scholar Dravidian University KUPPAM, ANDHRA PRADESH Abstract With a distributed system, data can be located

More information

The Benefits of Cloud: Maximizing Resources, Maximizing Value

The Benefits of Cloud: Maximizing Resources, Maximizing Value The Benefits of Cloud: Maximizing Resources, Maximizing Value White Paper May 2015, Version 1.0 196 Van Buren Street - -Herndon, Virginia 20170 - -(571) 353-6000 - -(800) 761-9691 - -vion.com The Benefits

More information

VDI Solutions - Advantages of Virtual Desktop Infrastructure

VDI Solutions - Advantages of Virtual Desktop Infrastructure VDI s Fatal Flaw V3 Solves the Latency Bottleneck A V3 Systems White Paper Table of Contents Executive Summary... 2 Section 1: Traditional VDI vs. V3 Systems VDI... 3 1a) Components of a Traditional VDI

More information

Sistemi Operativi e Reti. Cloud Computing

Sistemi Operativi e Reti. Cloud Computing 1 Sistemi Operativi e Reti Cloud Computing Facoltà di Scienze Matematiche Fisiche e Naturali Corso di Laurea Magistrale in Informatica Osvaldo Gervasi ogervasi@computer.org 2 Introduction Technologies

More information

Distributed Systems and Recent Innovations: Challenges and Benefits

Distributed Systems and Recent Innovations: Challenges and Benefits Distributed Systems and Recent Innovations: Challenges and Benefits 1. Introduction Krishna Nadiminti, Marcos Dias de Assunção, and Rajkumar Buyya Grid Computing and Distributed Systems Laboratory Department

More information

Federation of Cloud Computing Infrastructure

Federation of Cloud Computing Infrastructure IJSTE International Journal of Science Technology & Engineering Vol. 1, Issue 1, July 2014 ISSN(online): 2349 784X Federation of Cloud Computing Infrastructure Riddhi Solani Kavita Singh Rathore B. Tech.

More information

PARALLEL PROCESSING AND THE DATA WAREHOUSE

PARALLEL PROCESSING AND THE DATA WAREHOUSE PARALLEL PROCESSING AND THE DATA WAREHOUSE BY W. H. Inmon One of the essences of the data warehouse environment is the accumulation of and the management of large amounts of data. Indeed, it is said that

More information

Private Cloud for the Enterprise: Platform ISF

Private Cloud for the Enterprise: Platform ISF Private Cloud for the Enterprise: Platform ISF A Neovise Vendor Perspective Report 2009 Neovise, LLC. All Rights Reserved. Background Cloud computing is a model for enabling convenient, on-demand network

More information

Analysis of Issues with Load Balancing Algorithms in Hosted (Cloud) Environments

Analysis of Issues with Load Balancing Algorithms in Hosted (Cloud) Environments Analysis of Issues with Load Balancing Algorithms in Hosted (Cloud) Environments Branko Radojević *, Mario Žagar ** * Croatian Academic and Research Network (CARNet), Zagreb, Croatia ** Faculty of Electrical

More information

Highly Available Service Environments Introduction

Highly Available Service Environments Introduction Highly Available Service Environments Introduction This paper gives a very brief overview of the common issues that occur at the network, hardware, and application layers, as well as possible solutions,

More information

Working Together to Promote Business Innovations with Grid Computing

Working Together to Promote Business Innovations with Grid Computing IBM and SAS Working Together to Promote Business Innovations with Grid Computing A SAS White Paper Table of Contents Executive Summary... 1 Grid Computing Overview... 1 Benefits of Grid Computing... 1

More information

How To Balance In Cloud Computing

How To Balance In Cloud Computing A Review on Load Balancing Algorithms in Cloud Hareesh M J Dept. of CSE, RSET, Kochi hareeshmjoseph@ gmail.com John P Martin Dept. of CSE, RSET, Kochi johnpm12@gmail.com Yedhu Sastri Dept. of IT, RSET,

More information

WINDOWS AZURE AND WINDOWS HPC SERVER

WINDOWS AZURE AND WINDOWS HPC SERVER David Chappell March 2012 WINDOWS AZURE AND WINDOWS HPC SERVER HIGH-PERFORMANCE COMPUTING IN THE CLOUD Sponsored by Microsoft Corporation Copyright 2012 Chappell & Associates Contents High-Performance

More information

Grid based Integration of Real-Time Value-at-Risk (VaR) Services. Abstract

Grid based Integration of Real-Time Value-at-Risk (VaR) Services. Abstract Grid based Integration of Real-Time Value-at-Risk (VaR) s Paul Donachy Daniel Stødle Terrence J harmer Ron H Perrott Belfast e-science Centre www.qub.ac.uk/escience Brian Conlon Gavan Corr First Derivatives

More information

CHAPTER 5 WLDMA: A NEW LOAD BALANCING STRATEGY FOR WAN ENVIRONMENT

CHAPTER 5 WLDMA: A NEW LOAD BALANCING STRATEGY FOR WAN ENVIRONMENT 81 CHAPTER 5 WLDMA: A NEW LOAD BALANCING STRATEGY FOR WAN ENVIRONMENT 5.1 INTRODUCTION Distributed Web servers on the Internet require high scalability and availability to provide efficient services to

More information

Optimal Service Pricing for a Cloud Cache

Optimal Service Pricing for a Cloud Cache Optimal Service Pricing for a Cloud Cache K.SRAVANTHI Department of Computer Science & Engineering (M.Tech.) Sindura College of Engineering and Technology Ramagundam,Telangana G.LAKSHMI Asst. Professor,

More information

Capacity Estimation for Linux Workloads

Capacity Estimation for Linux Workloads Capacity Estimation for Linux Workloads Session L985 David Boyes Sine Nomine Associates 1 Agenda General Capacity Planning Issues Virtual Machine History and Value Unique Capacity Issues in Virtual Machines

More information

Tamanna Roy Rayat & Bahra Institute of Engineering & Technology, Punjab, India talk2tamanna@gmail.com

Tamanna Roy Rayat & Bahra Institute of Engineering & Technology, Punjab, India talk2tamanna@gmail.com IJCSIT, Volume 1, Issue 5 (October, 2014) e-issn: 1694-2329 p-issn: 1694-2345 A STUDY OF CLOUD COMPUTING MODELS AND ITS FUTURE Tamanna Roy Rayat & Bahra Institute of Engineering & Technology, Punjab, India

More information

8 Modeling network traffic using game theory

8 Modeling network traffic using game theory 8 Modeling network traffic using game theory Network represented as a weighted graph; each edge has a designated travel time that may depend on the amount of traffic it contains (some edges sensitive to

More information

Network Services in the SDN Data Center

Network Services in the SDN Data Center Network Services in the SDN Center SDN as a Network Service Enablement Platform Whitepaper SHARE THIS WHITEPAPER Executive Summary While interest about OpenFlow and SDN has increased throughout the tech

More information

Automated Trading across E-Market Boundaries

Automated Trading across E-Market Boundaries Automated Trading across E-Market Boundaries B. Schnizler, S. Luckner, C. Weinhardt Chair for Information Management and Systems University of Karlsruhe (TH) Englerstraße 14 76131 Karlsruhe {schnizler,

More information

Maximizing Liquidity in Cloud Markets through Standardization of Computational Resources

Maximizing Liquidity in Cloud Markets through Standardization of Computational Resources Maximizing Liquidity in Cloud Markets through Standardization of Computational Resources Ivan Breskovic, Ivona Brandic, and Jörn Altmann Distributed Systems Group, Institute of Information Systems, Vienna

More information

Collaborative & Integrated Network & Systems Management: Management Using Grid Technologies

Collaborative & Integrated Network & Systems Management: Management Using Grid Technologies 2011 International Conference on Computer Communication and Management Proc.of CSIT vol.5 (2011) (2011) IACSIT Press, Singapore Collaborative & Integrated Network & Systems Management: Management Using

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Cloud Computing I (intro) 15 319, spring 2010 2 nd Lecture, Jan 14 th Majd F. Sakr Lecture Motivation General overview on cloud computing What is cloud computing Services

More information

Analysis and Optimization Techniques for Sustainable Use of Electrical Energy in Green Cloud Computing

Analysis and Optimization Techniques for Sustainable Use of Electrical Energy in Green Cloud Computing Analysis and Optimization Techniques for Sustainable Use of Electrical Energy in Green Cloud Computing Dr. Vikash K. Singh, Devendra Singh Kushwaha Assistant Professor, Department of CSE, I.G.N.T.U, Amarkantak,

More information

WHITE PAPER Guide to 50% Faster VMs No Hardware Required

WHITE PAPER Guide to 50% Faster VMs No Hardware Required WHITE PAPER Guide to 50% Faster VMs No Hardware Required WP_v03_20140618 Visit us at Condusiv.com GUIDE TO 50% FASTER VMS NO HARDWARE REQUIRED 2 Executive Summary As much as everyone has bought into the

More information

HyperQ Storage Tiering White Paper

HyperQ Storage Tiering White Paper HyperQ Storage Tiering White Paper An Easy Way to Deal with Data Growth Parsec Labs, LLC. 7101 Northland Circle North, Suite 105 Brooklyn Park, MN 55428 USA 1-763-219-8811 www.parseclabs.com info@parseclabs.com

More information

Business applications:

Business applications: Consorzio COMETA - Progetto PI2S2 UNIONE EUROPEA Business applications: the COMETA approach Prof. Antonio Puliafito University of Messina Open Grid Forum (OGF25) Catania, 2-6.03.2009 www.consorzio-cometa.it

More information

An approach to grid scheduling by using Condor-G Matchmaking mechanism

An approach to grid scheduling by using Condor-G Matchmaking mechanism An approach to grid scheduling by using Condor-G Matchmaking mechanism E. Imamagic, B. Radic, D. Dobrenic University Computing Centre, University of Zagreb, Croatia {emir.imamagic, branimir.radic, dobrisa.dobrenic}@srce.hr

More information

A High Performance Computing Scheduling and Resource Management Primer

A High Performance Computing Scheduling and Resource Management Primer LLNL-TR-652476 A High Performance Computing Scheduling and Resource Management Primer D. H. Ahn, J. E. Garlick, M. A. Grondona, D. A. Lipari, R. R. Springmeyer March 31, 2014 Disclaimer This document was

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

Cray: Enabling Real-Time Discovery in Big Data

Cray: Enabling Real-Time Discovery in Big Data Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects

More information

APPLYING HEURISTIC METHODS FOR JOB SCHEDULING IN STORAGE MARKETS

APPLYING HEURISTIC METHODS FOR JOB SCHEDULING IN STORAGE MARKETS APPLYING HEURISTIC METHODS FOR JOB SCHEDULING IN STORAGE MARKETS Finkbeiner, Josef, University of Freiburg, Kollegiengebäude II, Platz der Alten Synagoge, 79085 Freiburg, Germany, josef.finkbeiner@is.uni-freiburg.de

More information

A Middleware Strategy to Survive Compute Peak Loads in Cloud

A Middleware Strategy to Survive Compute Peak Loads in Cloud A Middleware Strategy to Survive Compute Peak Loads in Cloud Sasko Ristov Ss. Cyril and Methodius University Faculty of Information Sciences and Computer Engineering Skopje, Macedonia Email: sashko.ristov@finki.ukim.mk

More information

Behavioral Segmentation

Behavioral Segmentation Behavioral Segmentation TM Contents 1. The Importance of Segmentation in Contemporary Marketing... 2 2. Traditional Methods of Segmentation and their Limitations... 2 2.1 Lack of Homogeneity... 3 2.2 Determining

More information

VMware and Xen Hypervisor Performance Comparisons in Thick and Thin Provisioned Environments

VMware and Xen Hypervisor Performance Comparisons in Thick and Thin Provisioned Environments VMware and Xen Hypervisor Performance Comparisons in Thick and Thin Provisioned Environments Devanathan Nandhagopal, Nithin Mohan, Saimanojkumaar Ravichandran, Shilp Malpani Devanathan.Nandhagopal@Colorado.edu,

More information

CLOUD PERFORMANCE TESTING - KEY CONSIDERATIONS (COMPLETE ANALYSIS USING RETAIL APPLICATION TEST DATA)

CLOUD PERFORMANCE TESTING - KEY CONSIDERATIONS (COMPLETE ANALYSIS USING RETAIL APPLICATION TEST DATA) CLOUD PERFORMANCE TESTING - KEY CONSIDERATIONS (COMPLETE ANALYSIS USING RETAIL APPLICATION TEST DATA) Abhijeet Padwal Performance engineering group Persistent Systems, Pune email: abhijeet_padwal@persistent.co.in

More information

Parallels Virtuozzo Containers

Parallels Virtuozzo Containers Parallels Virtuozzo Containers White Paper Virtual Desktop Infrastructure www.parallels.com Version 1.0 Table of Contents Table of Contents... 2 Enterprise Desktop Computing Challenges... 3 What is Virtual

More information

Central management of virtual resources

Central management of virtual resources Central management of virtual resources White paper Executive summary Virtual sprawl, a lack of uniform security, and corporations inability to clearly see and manage their entire virtualization environments

More information

Taking Control of Software Licensing Reducing Cost and Mitigating Risk with Accurate, Relevant IT Insight

Taking Control of Software Licensing Reducing Cost and Mitigating Risk with Accurate, Relevant IT Insight BDNA Headquarters 339 North Bernardo Avenue Mountain View, CA 94043 BDNA Europe 121-123 Rue Edouard Vaillant 92300 Levallois-Perret, France www.bdna.com T 650 625 9530 F 650 625 9533 T +33 1 41 27 20 80

More information

Figure 1. The cloud scales: Amazon EC2 growth [2].

Figure 1. The cloud scales: Amazon EC2 growth [2]. - Chung-Cheng Li and Kuochen Wang Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 shinji10343@hotmail.com, kwang@cs.nctu.edu.tw Abstract One of the most important issues

More information

Eighty Twenty Thinking in Traditional and Cloud Environment

Eighty Twenty Thinking in Traditional and Cloud Environment International Journal of Enhanced Research in Management & Computer lications, ISSN: 2319-7471 Eighty Twenty Thinking in Traditional and Cloud Environment 80/20 Thinking Mitesh Soni Research & Innovation

More information

FORECASTING DEMAND FOR CLOUD COMPUTING RESOURCES An agent-based simulation of a two tiered approach

FORECASTING DEMAND FOR CLOUD COMPUTING RESOURCES An agent-based simulation of a two tiered approach FORECASTING DEMAND FOR CLOUD COMPUTING RESOURCES An agent-based simulation of a two tiered approach Owen Rogers, Dave Cliff Department of Computer Science, University of Bristol, Merchant Venturers Building,

More information

Cloud Computing: Computing as a Service. Prof. Daivashala Deshmukh Maharashtra Institute of Technology, Aurangabad

Cloud Computing: Computing as a Service. Prof. Daivashala Deshmukh Maharashtra Institute of Technology, Aurangabad Cloud Computing: Computing as a Service Prof. Daivashala Deshmukh Maharashtra Institute of Technology, Aurangabad Abstract: Computing as a utility. is a dream that dates from the beginning from the computer

More information

Best Practices for Implementing iscsi Storage in a Virtual Server Environment

Best Practices for Implementing iscsi Storage in a Virtual Server Environment white paper Best Practices for Implementing iscsi Storage in a Virtual Server Environment Server virtualization is becoming a no-brainer for any that runs more than one application on servers. Nowadays,

More information

4 Markets/Systems Integration Challenges

4 Markets/Systems Integration Challenges Why Markets Could (But Don t Currently) Solve Resource Allocation Problems in Systems Jeffrey Shneidman, Chaki Ng, David C. Parkes Alvin AuYoung, Alex C. Snoeren, Amin Vahdat, and Brent Chun Harvard University,

More information

Cloud Computing Based on Service- Oriented Platform

Cloud Computing Based on Service- Oriented Platform Cloud Computing Based on Service- Oriented Platform Chiseki Sagawa Hiroshi Yoshida Riichiro Take Junichi Shimada (Manuscript received March 31, 2009) A new concept for using information and communications

More information

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN

PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN 1 PARALLEL & CLUSTER COMPUTING CS 6260 PROFESSOR: ELISE DE DONCKER BY: LINA HUSSEIN Introduction What is cluster computing? Classification of Cluster Computing Technologies: Beowulf cluster Construction

More information

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL This chapter is to introduce the client-server model and its role in the development of distributed network systems. The chapter

More information

Infrastructure as a Service: Accelerating Time to Profitable New Revenue Streams

Infrastructure as a Service: Accelerating Time to Profitable New Revenue Streams Infrastructure as a Service: Accelerating Time to Profitable New Revenue Streams Cisco Infrastructure as a Service Cisco has made a significant investment in understanding customer needs around data center

More information

_experience the commitment TM. Seek service, not just servers

_experience the commitment TM. Seek service, not just servers The complete cloud Creating and preserving cloud savings, security and service quality transition planning and service management ABOUT THIS PAPER Creating and preserving cloud infrastructure savings,

More information

The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud.

The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. White Paper 021313-3 Page 1 : A Software Framework for Parallel Programming* The Fastest Way to Parallel Programming for Multicore, Clusters, Supercomputers and the Cloud. ABSTRACT Programming for Multicore,

More information