Resource Constrained Multi-user Computation Partitioning for Interactive Mobile Cloud Applications


Resource Constrained Multi-user Computation Partitioning for Interactive Mobile Cloud Applications Lei Yang, Jiannong Cao, and Hui Cheng Department of Computing, Hong Kong Polytechnic University, Hong Kong Department of CST, University of Bedfordshire, UK {csleiyang, Abstract Computation offloading is considered a critical technique for improving the responsiveness of compute-intensive interactive applications in mobile cloud computing. When the offloading technique is used, a fundamental problem is how to partition the computations involved in the application between the mobile device and the cloud. This paper considers, for the first time, the computation partitioning problem from the application provider's standpoint, taking into account the constraint on cloud resources. This constraint stems from two practical facts: the resources leased by the application provider are not unlimited, because of the operational cost, and applications on the cloud always face an unpredictable number of users. We study the Resource Constrained Multi-user Computation Partitioning Problem (RCMCPP), which is to schedule the computations of multiple users onto their mobile devices and a given set of cloud resources such that the average application delay over all users is minimized. We formulate the problem using Mixed Integer Linear Programming (MILP), and show that the problem is distinct from and more difficult than existing job scheduling problems. We propose a set of heuristics for the problem, namely Greedy and γ-Greedy. Through extensive evaluations of various algorithms, we show that our proposed Greedy heuristic outperforms existing list scheduling algorithms by %, and that our γ-Greedy heuristic, built on the Greedy heuristic, runs 27% faster while performing as well as Greedy. Index Terms mobile cloud computing; offloading; computation partitioning; job scheduling. I.
INTRODUCTION The proliferation of sensors on smart phones enables a new type of mobile application, named the interactive perception application, such as object/gesture recognition, mobile biometrics, health-care monitoring or diagnosis, mobile augmented reality and so on. These applications often require continuous sampling and processing of high-rate sensors such as accelerometers, GPS, microphones and cameras. The compute-intensive classification algorithms of these applications usually lead to unsatisfactory performance on mobile devices with limited compute capability. To address this problem, computation offloading has been proposed by researchers in recent years. The basic idea of computation offloading is to shift the execution of some tasks from the mobile device to cloud infrastructures; the powerful processors in the cloud can run the compute-intensive tasks faster. In spite of the continuous effort on computation offloading, some researchers question whether computation offloading is able to accelerate the execution of the application [9]. They hold that the benefit of offloading depends on the tradeoff between the saved computation time and the delay arising from the data transmission between the mobile device and the cloud. Therefore, some works [3]-[] have addressed the problem of partitioning computations between the mobile side and the cloud side. Given that the application is divisible and composed of a set of dependent tasks, the problem is to decide which tasks of the application are executed on the mobile device and which on the cloud, such that the cost is minimized. These works on the computation partitioning problem differ in their cost models: they consider one of several factors, such as the energy consumption of the mobile device, the application delay, and the amount of data transmitted over the network, or a combination of these factors.
However, these works mainly focus on the partitioning problem in a user-independent model, in which the computations are partitioned for one single user without regard to the partitioning results of other users. In this model, it is assumed that the cloud always has enough resources to accommodate the offloaded tasks without delay, no matter how many other users offload computations onto the cloud. From the standpoint of the application provider, this assumption is not practical, for two reasons. First, the application services are usually provided to an unpredictable number of users. Second, application providers need to balance the amount of resources leased from the cloud service providers against the quality of service they can provide, in order to lower their operational cost. Therefore, it is necessary to pose the computation partitioning problem on upper-bounded cloud resources. We name this problem the Resource Constrained Multi-user Computation Partitioning Problem (RCMCPP). RCMCPP is much more challenging than the existing computation partitioning problems. In RCMCPP, the users' partitioning results are dependent on each other because of their competition for the cloud resources. For example, one user's decision on whether to offload a task depends not only on the saved computation cost and the communication overhead, but also on how many other users offload their tasks to the cloud. The number of users who offload tasks onto the cloud represents the load on the cloud. If the load is high, the time spent waiting for available cloud resources may negate the benefit of offloading. An optimal solution

for RCMCPP requires a unified schedule of all the users' computations onto their mobile devices and the cloud resources. In this paper, we study the RCMCPP for interactive perception applications. The problem is to schedule the offloaded computations on a constrained number of cloud resources, as well as to partition the computations between the mobile side and the cloud side for all the users, such that the average application delay is minimized. The selected performance metric is the average application delay, since it is the most critical one for interactive applications. Moreover, using our algorithm we study how the overall application performance changes with the provisioned cloud resources and the load on the system, and thus construct a Performance-Resource-Load (PRL) model. The PRL model provides an optimal tradeoff between the application performance and the cost of cloud resources. We believe the model can provide a guideline for application providers to achieve a cost-efficient utilization of the cloud resources, and hence save their operational cost. The main contributions of this work are as follows: To the best of our knowledge, this work is the first to study the computation partitioning problem from the standpoint of application providers. Since the problem is based on a constrained number of cloud resources, its solutions can improve the application performance in the face of an unpredictable number of users, and save the operational cost of the application provider. We show that RCMCPP is different from and more difficult than existing job scheduling problems, such as the Task Scheduling Problem in Heterogeneous Computing (TSPHC) and Hybrid Flow Shop (HFS) scheduling problems. We systematically solve the RCMCPP by proposing a set of efficient heuristics, namely Greedy and γ-Greedy.
Through the evaluation of the heuristics, we demonstrate that our proposed Greedy and γ-Greedy heuristics outperform existing list scheduling algorithms by percent. The remainder of the paper is organized as follows. In Section II, we review the related works. In Section III, we present the system model, including the applications and the system architecture. In Section IV, we define and formulate the RCMCPP, and also discuss its uniqueness by comparing it with existing job scheduling problems. We present a set of heuristics for RCMCPP in Section V, evaluate the performance of these algorithms in Section VI, and conclude the paper in Section VII. II. RELATED WORKS In this section, we briefly present the related works on mobile cloud computing and computation partitioning. Other related works, on job scheduling problems, are discussed in Section IV. The application of cloud services in the mobile ecosystem enables a newly emerging mobile computing paradigm, namely Mobile Cloud Computing (MCC). We classified three MCC approaches in our previous work [3]: a) extending access to cloud services to mobile devices; b) enabling mobile devices to work collaboratively as cloud resource providers [][2]; c) augmenting the execution of mobile applications using cloud resources, e.g., by offloading selected computing tasks required by applications on mobile devices to the cloud. The third approach allows mobile developers to create applications that far exceed a traditional mobile device's processing capabilities. Most research works in mobile cloud follow this third MCC approach [3]-[]; they focus on the computation partitioning problem. In the works of [][], the computations are partitioned with a local view: the offloading decision for each task/module is made without a global view of the other tasks/modules involved in the application.
For each task/module, if the reduced processing cost on the cloud is larger than the increased cost induced by data transmission across the network, then it is offloaded onto the cloud. [] aims to save the energy consumption, while [] aims to minimize the execution time and maximize the throughput. In the works of [3]-[7], the computations are partitioned with a global view: the offloading decisions for the tasks/modules are dependent on each other, and are made jointly using the profiling information about all the tasks/modules included in the application. [4] demonstrates that offloading with a global view outperforms the approach with a local view. The computation partitioning problem in this category is modeled as an optimization problem; the works differ in the application model or the corresponding optimization objective. In [4][5], the application/program is modeled as a method graph/tree, where each vertex represents a method, weighted with its computational and energy cost, and each edge represents a method call, weighted with the size of the data to transfer remotely. The optimization problem is solved using Integer Linear Programming. [6][7] focus on applications that are composed of a set of services, and aim to optimize one of several factors, such as latency, the amount of data transmitted, and energy consumption, or a customized combination of these factors. [3] focuses on the partitioning of data stream applications, uses a dataflow graph to model the application, and applies a genetic algorithm to maximize the throughput of the application. We note that all the works [3]-[] solve the computation partitioning problem from the standpoint of one single end user, under the assumption of unlimited resources on the cloud side.
Few existing works have examined how to partition the computations when the number of users scales up, from the standpoint of the application provider, with the aim of saving the operational cost while maximizing the application performance experienced by end users. In this paper, we study the multi-user computation partitioning problem with the cloud resource constraint taken into account.

[Fig. 1: An example of image matching. Pipeline: Input Image → Grayscale → SIFT (Local Extrema Detection, Keypoint Localization, Orientation Assignment, Descriptor Generation) → Similarity Calculation → Classification → Output Result.]

III. SYSTEM MODEL A. Interactive Mobile Cloud Applications Our system targets interactive mobile cloud applications. These applications take intensive data as input, perform a sequence of operations on the data, and then output the results. The input data are usually sampled from the sensors on the mobile device. Mobile augmented reality is considered one killer application: it uses the camera and/or other sensors to perceive the user's environment/scene, and then augments the original scene with relevant information in the display. The perception is performed frequently, driven by the user's input. The core part of augmented reality applications is image-based object recognition. Fig. 1 shows the operations involved in the whole process of image-based object recognition; note that the SIFT algorithm is used to extract the features [2]. In our work, the applications are modeled as a sequence of processing modules, as in Fig. 1. A module represents a kind of operation on the data, and a directed edge represents the dependence between modules: a module cannot start to run until its preceding module completes. Each module is allowed to run either locally on the mobile device or remotely on the cloud. The input data of the application are assumed to come from the sensors of the mobile device, and the output data should be delivered back to the mobile device. The performance metric that the user is concerned with is the execution time of the application. As the execution time represents the response delay of the interactive application, we simply use the term delay in this paper.
The delay is the sum of the computation time of all the modules and the data transmission time between the modules. B. System Architecture The system contains three roles: mobile users, the application provider and the cloud resource provider. The mobile users run the interactive application on their devices. The execution of the functional modules is triggered repeatedly through the user interface. Some modules can be offloaded onto the cloud in order to shorten the execution time; however, this may incur extra communication time due to the data transmission between the mobile device and the cloud. The application provider leases computing resources (in terms of cloud servers/VMs) from the cloud IaaS provider to accommodate the offloaded modules. Obviously, the application performance experienced by the mobile users depends on how many resources are leased from the IaaS provider. From the perspective of the application provider, the question is how to save the cost of the leased resources while providing performance that is as good as possible for all the mobile users. This is the goal that we want to achieve in the design of our system. [Fig. 2: System Architecture.] Fig. 2 shows the system architecture. It is composed of three critical components: the Partitioning and Scheduling Component (PSC), the Resource Provisioning Component (RPC), and the Analyzer. The PSC is responsible for scheduling the modules onto the mobile devices and the cloud servers. The Analyzer learns the Utility-Resource-Load (URL) model and the historical load pattern. According to this information from the Analyzer, the RPC determines how many resources to lease from the IaaS provider. We describe them in detail as follows. 1) PSC: Each time a user initiates the application on their device, a request with its profiling parameters, such as the device's capability and wireless bandwidth, is sent to the cloud.
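To make the delay model concrete, here is a minimal sketch that evaluates the delay of a given partition. The module costs and transfer times are hypothetical numbers chosen purely for illustration, and the function name is ours, not the paper's.

```python
# Delay of one execution for a given partition vector x, where x[j] = 1
# means module j runs on the cloud. The virtual entry and exit modules
# are pinned to the mobile side, as in the application model above.
def app_delay(x, w, c, trans):
    # w[j]/c[j]: local/cloud execution time of module j (w[j] > c[j]);
    # trans[j]: transfer time on the edge between positions j and j+1,
    # paid only when the two ends run on different sides.
    full = [0] + list(x) + [0]                       # add entry/exit modules
    compute = sum(c[j] if x[j] else w[j] for j in range(len(x)))
    transfer = sum(t for j, t in enumerate(trans) if full[j] != full[j + 1])
    return compute + transfer

w = [4.0, 9.0, 6.0]           # hypothetical local execution times
c = [1.0, 2.0, 1.5]           # hypothetical cloud execution times
trans = [0.5, 1.0, 1.0, 0.5]  # hypothetical transfer times per edge
print(app_delay([0, 1, 0], w, c, trans))   # offload only the middle module
```

Running everything locally costs sum(w); offloading only the middle module pays two edge crossings but saves its heavy local cost.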
Considering that the system serves a large number of mobile users, many user requests may arrive concurrently at the same time. The PSC in the cloud decides for each user which modules are executed at the mobile side and which at the cloud side, and also generates an optimal schedule for the cloud servers/VMs to execute the offloaded modules from the users. The objective of the PSC is to minimize the average delay of the concurrent requests. Since the partitioning decisions are made over a constrained number of cloud servers leased by the application provider, we call this problem the Resource Constrained Multi-user Computation Partitioning Problem (RCMCPP). We formulate this problem, show how it differs from existing scheduling problems, and solve it in Section IV. 2) Analyzer: Once the scheduling mechanism is determined, the overall application delay depends on the number of leased cloud servers and the number of mobile users' concurrent requests loading the system; the latter parameter is named the load in our paper. For a given load, the more resources are provisioned, the better the application delay will be. To achieve an optimal tradeoff between the resource cost and the application delay, we need to acquire the Utility-Resource-Load (URL) model, where the utility denotes the normalized average application delay. The URL model quantifies how the utility varies with the provisioned resources and the load on the system. Once the PSC

outputs an optimal execution schedule, it also stores the current utility, the number of leased resources and the load in a database; we name this a URL recording. The Analyzer does three things: a) it simulates more URL recordings using the same scheduling algorithm as the PSC; b) it fits the URL curve according to both the actual and the simulated URL recordings; c) it extracts the historical load pattern. 3) RPC: Before describing the design of the resource provisioning mechanism, we describe two practical issues that we face. First, we need to be aware of the gap between the load period and the resource provisioning period. The load period, denoted by δ, means that the load on the system changes every δ; generally δ is up to several seconds. The resource provisioning period, denoted by T, is the minimum temporal interval at which the leased resources are allowed to increase/decrease. In other words, the resources leased by the application provider cannot be changed until the next resource provisioning period begins. Generally, to save cost, T is equal to the minimum billing period of the IaaS provider's pricing model; for example, in the Amazon EC2 instance market, T is one hour. Note that T >> δ. During one period T, the load usually changes at random. If we lease resources according to the peak load during that provisioning period, some paid resources will be idle most of the time, which is a waste of resources. The RPC determines how many cloud servers to lease at each period T. The decision is made according to the URL model and the historical load pattern, with the aim of realizing a cost-effective utilization of the cloud resources. In the following, we focus on the RCMCPP involved in the design of the PSC, which is the most critical and challenging component in our system. We do not describe the designs of the Analyzer and the RPC in detail, since their solutions do not differ from existing works [3].
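The waste from leasing for the peak can be illustrated with a toy calculation (the load numbers below are hypothetical): leasing max(load) servers for the whole period T pays for far more server-time than the load actually uses.

```python
# Toy comparison of server-hours paid vs. used over one provisioning
# period T, divided into equal delta-slots. Load values are hypothetical.
loads = [3, 5, 12, 4, 2, 6]        # servers needed in each delta-slot
T = 1.0                            # provisioning period, e.g. one hour
slot = T / len(loads)              # duration of each delta-slot

paid = max(loads) * T              # lease for the peak over all of T
used = sum(n * slot for n in loads)
print(f"paid={paid:.2f} used={used:.2f} idle={paid - used:.2f} server-hours")
```

Here the provider pays for 12 server-hours but the load only uses about 5.3, which is exactly the gap the RPC tries to narrow.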
IV. PROBLEM STATEMENT A. Problem Description We first describe the Single-user Computation Partitioning Problem (SCPP), where one single user runs the application and requests the cloud for computation offloading. Suppose the application consists of a sequence of n modules. Each module can be executed either on the mobile side or on the cloud side. The execution time of module j is c_j if it is offloaded onto the cloud (1 ≤ j ≤ n); otherwise, it is w_j, where w_j > c_j. If two adjacent modules j and j+1 run on different sides, the data transmission time between them is w_{j,j+1}; otherwise, the data transmission time between j and j+1 is zero, since they run on the same side. To model that the input/output data of the application must come from/return to the mobile device, we add two virtual modules 0 and n+1 as the entry and exit modules. Definition 1 (Single-user Computation Partitioning Problem, SCPP): Given the computation costs c_j and w_j (1 ≤ j ≤ n), and the communication costs w_{j,j+1} (0 ≤ j ≤ n), the SCPP is to determine which modules should be offloaded onto the cloud such that the application delay is minimized. It is formulated as

min_{x_j} d = \sum_{j=1}^{n} [(1 - x_j) w_j + x_j c_j] + \sum_{j=0}^{n} |x_j - x_{j+1}| w_{j,j+1},    (1)

where x_j = 1 if module j is offloaded onto the cloud and x_j = 0 otherwise, and x_0 = x_{n+1} = 0. We now consider the more general case where the number of users scales up. Since the resources (servers) at the cloud side are constrained, the users compete for cloud resources if they offload modules concurrently, and the benefit of offloading could be negated by the delayed execution in the cloud. We need to consider two things jointly: 1) how to partition the application for each user, and 2) how to schedule the offloaded computations on the servers, such that the average application delay over all the users is minimized. Assume that λ users launch the same application simultaneously on their own devices and that there are r servers on the cloud side. As mentioned in the SCPP, c_j represents the remote computation cost of module j.
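Because the modules form a chain, SCPP can be solved exactly in O(n) time by dynamic programming over the side of the previous module. The sketch below (with hypothetical costs) is one way to compute the minimum of equation (1); the paper does not specify its SCPP solver, so this is an illustration, not the authors' method.

```python
# Exact SCPP solver by dynamic programming. Sides: 0 = mobile, 1 = cloud.
# w[j]/c[j]: local/cloud time of module j; trans[j]: transfer time on the
# edge entering module j (trans[n] is the edge back to the exit module).
def scpp(w, c, trans):
    n = len(w)
    INF = float("inf")
    best = [0.0, INF]          # virtual entry module sits on the mobile side
    for j in range(n):
        nxt = [INF, INF]
        for prev in (0, 1):
            for side in (0, 1):
                step = (c[j] if side else w[j]) + (trans[j] if side != prev else 0.0)
                nxt[side] = min(nxt[side], best[prev] + step)
        best = nxt
    # the exit module is on the mobile side: pay the last edge if we end on cloud
    return min(best[0], best[1] + trans[n])

w = [4.0, 9.0, 6.0]           # hypothetical local execution times
c = [1.0, 2.0, 1.5]           # hypothetical cloud execution times
trans = [0.5, 1.0, 1.0, 0.5]  # hypothetical edge transfer times
print(scpp(w, c, trans))      # minimum application delay d
```

Depending on the transfer costs, the optimum may offload everything or only a contiguous middle block; the DP considers both cases automatically.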
We assume that the mobile devices' processing capabilities and network bandwidths differ from one another. We use a λ × n matrix W to represent the local computation cost, in which each w_{i,j} gives the execution time to complete module j on the i-th user's mobile device. Π is a λ × (n+1) communication cost matrix in which each π_{i,j} (0 ≤ j ≤ n) represents the data transmission time from module j to module j+1 for user i. The RCMCPP is defined as follows. Definition 2 (Resource Constrained Multi-user Computation Partitioning Problem, RCMCPP): Assuming λ users concurrently request computation offloading, given the remote computation costs c_j, the local computation cost matrix W_{λ×n}, the communication cost matrix Π_{λ×(n+1)} and the number of servers r at the cloud side, the problem is to determine, for all the users, at which side each module is executed, and on which cloud server and at what time each offloaded module is executed at the cloud side, such that the average application delay of the users is minimized. B. MILP Formulation of RCMCPP We formulate RCMCPP as a Mixed Integer Linear Programming (MILP) problem, which helps us better understand the problem and also seek an optimal solution using MILP techniques. For convenience of description, we use (i, j) to denote the j-th module of user i. In this formulation, we define one continuous variable t_{i,j} and three 0-1 discrete variables x_{i,j,k}, y_{i,j} and z_{i,i',j,j'}, as follows. t_{i,j} is the time at which module (i, j) finishes. x_{i,j,k} = 1 if module (i, j) is executed on machine k, and x_{i,j,k} = 0 otherwise; note that k = 1, 2, ..., r represents the servers at the cloud side, and k = 0 represents the mobile device. y_{i,j} = 1 if the two dependent modules (i, j) and (i, j+1) are executed on different sides, and y_{i,j} = 0 if they are on the same side. z_{i,i',j,j'} = 1 if the execution of module (i, j) precedes module (i', j'), and z_{i,i',j,j'} = 0 otherwise.

We also define a positive constant IN that is effectively infinite (a big-M constant). The MILP formulation can be written as:

Min  (1/λ) \sum_{i=1}^{λ} t_{i,n+1}

Subject to:
(a) \sum_{k=0}^{r} x_{i,j,k} = 1,  ∀ i ∈ [1, λ], j ∈ [0, n+1];
(b) t_{i,j+1} ≥ t_{i,j} + π_{i,j} y_{i,j} + w_{i,j+1} x_{i,j+1,0} + c_{j+1} (1 − x_{i,j+1,0}),  ∀ i ∈ [1, λ], j ∈ [0, n];
(c) t_{i',j'} ≥ t_{i,j} + c_{j'} − IN (1 − z_{i,i',j,j'}) − IN (2 − x_{i,j,k} − x_{i',j',k}),  ∀ i, i', j, j' with (i, j) ≠ (i', j'), k ∈ [1, r];
(d) y_{i,j} ≥ x_{i,j,0} − x_{i,j+1,0},  ∀ i ∈ [1, λ], j ∈ [0, n];
(e) y_{i,j} ≥ x_{i,j+1,0} − x_{i,j,0},  ∀ i ∈ [1, λ], j ∈ [0, n];
(f) y_{i,j} ≤ x_{i,j,0} + x_{i,j+1,0},  ∀ i ∈ [1, λ], j ∈ [0, n];
(g) y_{i,j} ≤ 2 − x_{i,j,0} − x_{i,j+1,0},  ∀ i ∈ [1, λ], j ∈ [0, n];
(h) z_{i,i',j,j'} ≥ (t_{i',j'} − t_{i,j}) / IN,  ∀ i, i', j, j' with (i, j) ≠ (i', j');
(i) z_{i,i',j,j'} ≤ 1 + (t_{i',j'} − t_{i,j}) / IN,  ∀ i, i', j, j' with (i, j) ≠ (i', j');
(j) x_{i,j,k}, y_{i,j}, z_{i,i',j,j'} ∈ {0, 1}, t_{i,j} ≥ 0,  ∀ i, j, k, i', j';
(k) x_{i,0,0} = 1, x_{i,n+1,0} = 1, t_{i,0} = 0,  ∀ i ∈ [1, λ].    (2)

Constraint (b) guarantees the temporal order of execution for two dependent modules. Constraint (c) states that each cloud server can process only one module at a time; in other words, if two modules are scheduled on the same machine, one cannot start until the other has finished. Since y_{i,j} and z_{i,i',j,j'} are auxiliary variables, constraints (d)-(g) ensure that y_{i,j} is determined by x_{i,j,0} and x_{i,j+1,0}, and constraints (h)-(i) ensure that the value of z_{i,i',j,j'} depends on the values of t_{i,j} and t_{i',j'}. C. Uniqueness of RCMCPP We note that, intuitively, our RCMCPP seems the same as the job scheduling problem in parallel computing. Hence, in this subsection we compare RCMCPP with classical job scheduling problems and discuss their differences. Comparison with TSPHC. The first classical scheduling problem similar to RCMCPP is the Task Scheduling Problem for Heterogeneous Computing (TSPHC). In this problem, an application is represented by a directed acyclic graph (DAG) in which nodes represent application tasks and edges represent inter-task data dependencies.
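Constraints (d)-(g) are the standard linearization of an exclusive-or: for binary a = x_{i,j,0} and b = x_{i,j+1,0} they leave exactly one feasible value, y = a XOR b. A brute-force check (written by us, independently of the paper) confirms this.

```python
# Enumerate the y values in {0,1} satisfying constraints (d)-(g) for
# given binary side indicators a = x_{i,j,0} and b = x_{i,j+1,0}.
def feasible_y(a, b):
    return [y for y in (0, 1)
            if y >= a - b          # (d)
            and y >= b - a         # (e)
            and y <= a + b         # (f)
            and y <= 2 - a - b]    # (g)

for a in (0, 1):
    for b in (0, 1):
        assert feasible_y(a, b) == [a ^ b]
print("constraints (d)-(g) force y = XOR of the two side indicators")
```

This is why no big-M constant is needed for y, while z does need the IN constant in (h)-(i) to compare continuous completion times.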
Given a heterogeneous machine environment, where the machines have different processing speeds and the data transfer rates between machines differ, the objective of the problem is to map tasks onto the machines and order their executions so that the task-precedence requirements are satisfied and a minimum completion time is obtained. TSPHC is NP-complete in the general case, and various efficient heuristics have been proposed in the literature [4]. Intuitively, we may try to model our problem to resemble TSPHC as closely as possible. In our problem, we have λ × n tasks, where precedence dependencies exist among the tasks of the same user. The machines can be abstracted as a set of r parallel VMs and one mobile device. Note that the single abstracted mobile device in our model, unlike the machines in TSPHC, is able to execute more than one task simultaneously, since each user's local tasks run on that user's own device. The data transfer rate is infinite between the cloud VMs, while it is constrained between any pair consisting of a mobile device and a cloud VM. The problem is then to map the tasks onto the (r + 1) machines such that the precedence constraints are satisfied and the weighted sum of all the tasks' completion times is minimized: the tasks that appear in the last position of the application flow are assigned a weight of one, and the others a weight of zero. It is worth noting that the key difference between RCMCPP and TSPHC is the optimization objective. In TSPHC the objective is the makespan, i.e., the maximum completion time over all tasks, while in RCMCPP the objective is the total weighted completion time. Although various efficient heuristics have been proposed for TSPHC to optimize the makespan, there are few solutions for optimizing the total weighted completion time. We can only find some early efforts to minimize the total weighted completion time on a single machine or on parallel machines without considering communications. Even these simplified versions of RCMCPP have been proved to be NP-hard [5].
Comparison with HFS. The second classical scheduling problem is Hybrid Flow Shop (HFS) scheduling [6]. In this problem, a job is divided into a series of stages, and there are a number of identical parallel machines at each stage. Each job has to be processed first at Stage 1, then Stage 2, and so on. At each stage, the job requires processing on only one machine, and any machine at that stage can do it. Assuming all the jobs are released at the beginning, the problem is to find a schedule that minimizes the makespan. We note that the application and its functional modules in our problem are analogous to a job and its stages in HFS, and the mobile devices and cloud VMs may be modeled as the machines in HFS. However, RCMCPP is far different from HFS in the following aspects: 1) in RCMCPP, there is communication overhead between stages, which makes the problem more complex than HFS; 2) in RCMCPP, since both the cloud VMs and the mobile device are able to execute any module of the application, the set of machines is not partitioned into subsets according to the stages; 3) the objective in RCMCPP is the total completion time rather than the makespan. V. SOLUTIONS TO RCMCPP A. LP-based Solution The most widely used method for solving MILP problems is branch and bound [7]. It transforms the MILP problem into a standard Linear Programming (LP) problem by relaxing the integrality of the variables. Based on the optimal solution obtained using LP, the MILP problem is then divided into subproblems by restricting the range of the integer variables. The

subproblem is solved using LP, and then divided into sub-subproblems; this process is performed recursively until a feasible and satisfactory solution is found. Unfortunately, the LP-based solution is not practical for the RCMCPP, because the formulation contains an exploding number of variables and constraints as the problem scales up. From equation (2), we can see that the number of variables z_{i,i',j,j'} reaches the magnitude of λ²n², and the number of constraints of type (c) is λ²n²r. Although we can compress the model by deleting all the auxiliary variables z_{i,i',j,j'} and y_{i,j}, the number of variables x_{i,j,k} remains of a large magnitude, λnr; moreover, the model is no longer linear in that case. This motivates us to seek heuristic algorithms. B. Greedy Heuristic Algorithm In this subsection, we describe a greedy heuristic algorithm for the RCMCPP. The idea is that we first relax the resource constraints in the RCMCPP: for each user, we can obtain an optimal partitioning by using the solution of SCPP. Under these optimal partitions, we search for the time intervals during which the resource constraints are violated. We then adjust the schedule in a greedy way that releases as long a resource occupation period as possible at these intervals, while increasing the average application delay as little as possible. The searching and adjusting are performed alternately until the resource constraints are satisfied at all times. The algorithm is designed based on the observation that the SCPP-initialized solution is optimal but not feasible in the solution space of RCMCPP; hence, we can adjust the initial solution iteratively to make it feasible while not sacrificing the objective function (average application delay) too much. Data Structures. Before describing the algorithm, we first introduce three important data structures as below.
Execution Schedule S: For each user i, we create an n × 3 table to store its execution schedule, in which each row is a three-tuple (x_{i,j,0}, τ_{i,j}, t_{i,j}), where, as described in Section IV, x_{i,j,0} indicates on which side module (i, j) is scheduled, and τ_{i,j} and t_{i,j} are respectively the start time and completion time of module (i, j). Cloud Resource Occupation List L_cro: The list records the number of occupied servers in each time interval. Each element e_k of the list is denoted by (start, end, num), where start/end are the start/end points of the time interval, and num is the number of servers occupied during the interval. The time intervals in the list do not overlap with each other, and together they constitute a continuous time interval. The elements are stored in the list in ascending order of their time intervals; thus we have e_k.end = e_{k+1}.start and e_k.start < e_{k+1}.start for all k. Note that the intervals need not have the same length. Module Adjustment List L_adj: The list records the modules that could release a cloud resource occupation period, either by waiting to execute later or by moving to run on the mobile side, together with their rewards and the corresponding released cloud resource occupation periods. We denote each item by (i, j, reward, D_rel), where (i, j) identifies the module, reward represents the reward of adjusting this module, and D_rel is the released cloud resource occupation period. The modules are stored in the list in descending order of reward. Algorithm 1 gives the pseudo code of the algorithm. First, we obtain the optimal partitioning and the corresponding execution schedule for each user without considering the cloud resource constraint (line 1). Second, we compute the cloud resource occupation list L_cro (line 2), which records the number of occupied/in-use servers in each time interval. Then, we find the earliest interval in which the number of occupied servers exceeds the upper bound r (line 3).
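A minimal sketch of how L_cro can be built and scanned for the critical point t_cri; the interval endpoints are hypothetical, and the function names are ours, not the paper's.

```python
# Build the cloud resource occupation list from the (start, end) spans of
# all cloud-scheduled modules, then find the earliest critical point where
# the number of occupied servers exceeds r.
def build_lcro(spans):
    points = sorted({t for s, e in spans for t in (s, e)})
    # one (start, end, num) element per elementary interval, in time order
    return [(s, e, sum(1 for a, b in spans if a <= s and e <= b))
            for s, e in zip(points, points[1:])]

def critical_point(lcro, r):
    for start, end, num in lcro:
        if num > r:
            return start            # t_cri: first time the constraint breaks
    return None                     # resource constraint holds everywhere

spans = [(0, 4), (1, 3), (2, 5), (2, 6)]   # hypothetical module spans on cloud
print(critical_point(build_lcro(spans), r=2))
```

With these spans, three or more servers are busy from time 2 onward, so with r = 2 the critical point is 2, while r = 4 yields no violation.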
We name the start of this interval the critical point t_cri on the time axis: before it the resource constraint is satisfied, and after it the constraint is violated. Next, for each user we look for the module that was scheduled onto the cloud side and whose execution time spans the critical point. In order to release cloud resources immediately after the critical point, we adjust the schedule of this module by moving it back to the mobile side or by delaying its execution for some time. The adjustments are scored based on a reward function, which embodies the greedy strategy of our algorithm (line 6). The modules with positive score/reward are added into the module adjustment list L_adj (lines 7-9). After searching over all the users, the α modules with the largest rewards are selected from the list L_adj for adjustment (line 12). In each iteration, the critical point t_cri moves forward along the time axis. The algorithm stops once the resource constraint is satisfied at all times (line 3).

Input : A set of λ users, and a set of r cloud servers
Output: The execution schedule (x_{i,j,0}, τ_{i,j}, t_{i,j})
1  Compute the initial execution schedule using the SCPP solution;
2  Compute the cloud resource occupation list L_cro;
3  while a critical point can be found from L_cro do
4    for each user do
5      if there is a module that is scheduled onto the cloud and whose execution time covers the critical point then
6        Compute the reward of adjusting the module;
7        if Reward > 0 then
8          Insert this module into list L_adj in descending order of its reward;
9        end
10     end
11   end
12   Select the first α modules from list L_adj to adjust;
13   Update the execution schedule of the selected modules;
14   Re-compute the cloud resource occupation list L_cro;
15 end
16 return the execution schedule for all the users;
Algorithm 1: The SCPP-based Greedy Heuristic

In Algorithm 1, the execution schedules indicate whether each module is executed at the mobile side or at the cloud side.
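As a rough illustration of the search-and-adjust cycle of Algorithm 1 (the data layout, names, and the simplified adjustment rule below are mine, not the paper's), the loop might be sketched as:

```python
from dataclasses import dataclass

@dataclass
class Module:
    user: int
    idx: int
    on_cloud: bool   # corresponds to x_{i,j,0} = 0 in the paper's notation
    start: float     # tau_{i,j}
    end: float       # t_{i,j}

def occupation_timeline(mods):
    """Sweep-line count of occupied cloud servers over time (a simplified L_cro)."""
    events = []
    for m in mods:
        if m.on_cloud:
            events += [(m.start, 1), (m.end, -1)]
    events.sort()                      # at equal times, releases sort before starts
    timeline, busy = [], 0
    for t, d in events:
        busy += d
        timeline.append((t, busy))
    return timeline

def critical_point(timeline, r):
    """Start of the earliest interval where more than r servers are busy."""
    for t, busy in timeline:
        if busy > r:
            return t
    return None

def greedy_adjust(mods, r, slowdown=0.5):
    """Toy version of Algorithm 1: while the constraint is violated, move the
    spanning cloud module that releases the longest occupation period back
    to the mobile side."""
    while True:
        t_cri = critical_point(occupation_timeline(mods), r)
        if t_cri is None:
            return mods
        spanning = [m for m in mods if m.on_cloud and m.start <= t_cri < m.end]
        victim = max(spanning, key=lambda m: m.end - t_cri)
        victim.on_cloud = False        # the "movement" adjustment
        victim.end += slowdown         # mobile execution takes longer
```

Here the greedy score is reduced to the released period alone; the paper's full reward of Equation (5) also subtracts the extra delay imposed on the successor modules.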
However, for the modules that are allocated to the cloud side, the results do not specify which cloud server hosts the offloaded module. One may ask: could the completion time t_{i,j} of a module be delayed when allocating the offloaded modules onto the cloud servers?

Theorem 1. For the execution schedule S = {(x_{i,j,0}, τ_{i,j}, t_{i,j})} generated by Algorithm 1, we can always find a feasible schedule S' = {(x_{i,j,k}, τ'_{i,j}, t'_{i,j})} of the RCMCPP by assigning the offloaded modules onto the cloud servers, such that each module (i, j) is completed no later than t_{i,j}, i.e., t'_{i,j} ≤ t_{i,j}.

Proof: Consider a simple case where the offloaded modules are scheduled online onto the cloud servers with a first-come-first-serve policy. τ_{i,j} can be taken as the time at which module (i, j) arrives at the cloud. We can prove by mathematical induction that, under the first-come-first-serve policy, every offloaded module (i, j) can start exactly at time τ_{i,j} and be completed by time t_{i,j} on the cloud servers. For the new schedule S', we then have τ'_{i,j} = τ_{i,j} and t'_{i,j} = t_{i,j}. Order the offloaded modules according to their arrival times τ_{i,j}. The 1st module, which arrives at the cloud earliest, can obviously start at its arrival time. For the s-th module, which arrives at the cloud s-th earliest, we prove that if the 1st, 2nd, ..., (s-1)-th modules start to run at their arrival times, then the s-th module can also start on the cloud servers at its arrival time τ_{i_s,j_s}. The proof is as follows. Up to the time when the s-th module arrives, we can conclude: 1) at least one cloud server is idle at τ_{i_s,j_s}; 2) if a cloud server is idle at time τ_{i_s,j_s}, it remains idle throughout [τ_{i_s,j_s}, +∞). The first conclusion is due to the fact that our algorithm guarantees the number of occupied cloud servers never exceeds the constraint r when each module is executed according to the schedule S. The second conclusion is proved as follows: if 2) did not hold, then some module that arrives at the cloud earlier than the s-th module would have been allocated to that server yet still occupy it after τ_{i_s,j_s}; by the induction hypothesis such a module starts at its arrival time, so the server could not be idle at τ_{i_s,j_s}. This contradicts the first-come-first-serve policy.
Because of these two conclusions, the s-th module can start on a cloud server at its arrival time. If multiple idle servers exist, we randomly allocate the module to one of them.

Algorithm Details. In the following, we present the details of Algorithm 1: 1) how to obtain the initial optimal but infeasible solution; 2) how to compute the cloud resource occupation list L_cro; 3) the reward function; and 4) the number of modules α to adjust in each iteration.

1) Initial Solution: Consider the SCPP shown in Definition 1, and suppose that the completion time of module j is denoted as t_j. As every module can be completed either on the mobile side or on the cloud side, we use t_j^(c) to represent the completion time of module j if it is scheduled on the cloud side, and correspondingly t_j^(m) if it is scheduled on the mobile side. Then we have a recursive formulation of t_j:

t_j^(c) = min{t_{j-1}^(c), t_{j-1}^(m) + π_{j-1,j}} + c_j,   (3)
t_j^(m) = min{t_{j-1}^(m), t_{j-1}^(c) + π_{j-1,j}} + w_j,   (4)

where j = 1, 2, ..., n, n+1, c_j and w_j are the computation times of module j on the cloud side and the mobile side respectively, and π_{j-1,j} is the data transfer time between modules j-1 and j. Module 0 and module n+1 are respectively the entry and exit modules that we have virtually added into the application graph; the computation times of these two modules are zero.

Fig. 3. SCPP example with four modules: {c_1, c_2, c_3, c_4} = {0.1, 0.2, 0.2, 0.1}; {w_1, w_2, w_3, w_4} = {0.4, 0.8, 0.8, 0.4}; {π_{0,1}, π_{1,2}, π_{2,3}, π_{3,4}, π_{4,5}} = {0.1, 0.6, 0.5, 0.5, 0.4}. The colored nodes indicate the places that the modules are scheduled to.

In order to determine the optimal partitioning, we construct a graph that contains 2(n+1) nodes. Each node is denoted as v_j^(p) and labeled with its completion time t_j^(p), where 0 ≤ j ≤ n+1 and p ∈ {c, m}. Since the input data of the application comes from the mobile device, we let t_0^(m) = 0 and t_0^(c) = ∞. Starting from the nodes v_0^(m) and v_0^(c), we can recursively compute the labels of all the nodes by Equations (3)-(4). For each node, for example v_j^(m), there are two possible edges from its precedent nodes, v_{j-1}^(c) → v_j^(m) and v_{j-1}^(m) → v_j^(m). The edge that leads to the smaller value of t_j^(m) according to Equation (4) is added into the graph. The partitioning result is then a path from node v_0^(m) to node v_{n+1}^(m). Fig. 3 shows an example of the method; there are four modules in the application.

2) Computation of the Cloud Resource Occupation List and the Critical Point: The cloud resource occupation list L_cro records the number of occupied servers at each time interval. In each iteration of Algorithm 1, L_cro needs to be re-computed. We design an algorithm to calculate L_cro, shown in Algorithm 2. The input of the algorithm is the execution schedule of each user, {(x_{i,j,0}, τ_{i,j}, t_{i,j}) | 1 ≤ j ≤ n, 1 ≤ i ≤ λ}. In Algorithm 2, L_cro is first initialized with the time interval (0, ∞), with the number of occupied servers in this interval being zero. For each module (i, j) that is allocated onto the cloud side (line 4), we first search for the intervals of L_cro in which the start and the completion time of the module's execution period are located (lines 5-6). An interval covering the start or the completion point is split into new sub-intervals, which are then inserted into L_cro (lines 12, 15, 21). For each interval that is entirely covered by module (i, j)'s execution period, we increase its number of occupied servers by one (line 9). The algorithm stops when all the modules on the cloud have been processed. Processing one module increases the length of L_cro by at most 2, so the length of L_cro returned by Algorithm 2 is at most 2λn. The time complexity of the algorithm is on the order of O(λ).

The critical point t_cri is defined as the start time of the first time interval in L_cro during which the number of occupied servers exceeds the upper bound r. Based on L_cro, it is easy to find the critical point t_cri. The whole time axis is divided into two periods by t_cri.
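The recursion of Equations (3)-(4) is a short dynamic program over the linear module graph. Below is a sketch using the example values of Fig. 3 (the function name is mine, and the figure's numbers are reconstructed from the damaged text, so treat them as illustrative):

```python
INF = float("inf")

def scpp_delay(c, w, pi):
    """Optimal single-user application delay via Equations (3)-(4).
    c[j-1], w[j-1]: cloud/mobile execution times of module j = 1..n;
    pi[j]: transfer time on edge (j, j+1), for j = 0..n.
    The virtual entry and exit modules run on the mobile side with zero cost."""
    t_c, t_m = INF, 0.0                         # entry module: t_0^(c), t_0^(m)
    for j in range(len(c)):
        t_c, t_m = (min(t_c, t_m + pi[j]) + c[j],   # Equation (3)
                    min(t_m, t_c + pi[j]) + w[j])   # Equation (4)
    return min(t_m, t_c + pi[-1])               # exit module must be on mobile

# Reconstructed Fig. 3 values
c = [0.1, 0.2, 0.2, 0.1]
w = [0.4, 0.8, 0.8, 0.4]
pi = [0.1, 0.6, 0.5, 0.5, 0.4]
print(scpp_delay(c, w, pi))   # ~1.1: offloading all four modules is optimal here
```

Recovering the actual partition amounts to remembering which argument of each min was taken and backtracking from the exit node, exactly the shortest-path construction described above.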
In the period before t_cri, the number of occupied servers is lower than the upper bound r, while in the period after it the number of occupied servers exceeds the upper bound r. In Algorithm 1, t_cri moves forward along the time axis in each iteration, until no critical point can be found from L_cro.

Input : {(x_{i,j,0}, τ_{i,j}, t_{i,j}) | 1 ≤ j ≤ n, 1 ≤ i ≤ λ}
Output: L_cro
1  L_cro ← {(0, ∞, 0)};
2  for each user i do
3    for each module j do
4      if x_{i,j,0} == 0 then
5        Search for the element e_s in L_cro such that e_s.start < τ_{i,j} ≤ e_s.end;
6        Search for the element e_v such that e_v.start ≤ t_{i,j} < e_v.end;
7        if s < v then
8          for the k-th element in L_cro, s < k < v, do
9            e_k.num ← e_k.num + 1;
10         end
11         if τ_{i,j} ≠ e_s.end then
12           Insert element (τ_{i,j}, e_s.end, e_s.num + 1) into L_cro after e_s;
13         end
14         if t_{i,j} ≠ e_v.start then
15           Insert element (e_v.start, t_{i,j}, e_v.num + 1) into L_cro before e_v;
16         end
17         e_s.end ← τ_{i,j};
18         e_v.start ← t_{i,j};
19       end
20       else
21         Insert elements (τ_{i,j}, t_{i,j}, e_s.num + 1) and (t_{i,j}, e_s.end, e_s.num) into L_cro after e_s;
22         e_s.end ← τ_{i,j};
23       end
24     end
25   end
26 end
27 return L_cro;
Algorithm 2: The Algorithm for Computing the Cloud Resource Occupation List

3) Reward Function/Greedy Strategy: The reward function evaluates the benefit of adjusting the schedule of one module. It is defined as the released cloud resource occupation period, denoted D_rel, minus the extra delay caused by the adjustment, denoted D_delay:

Reward = D_rel − D_delay.   (5)

The reward function reflects the motivation that we always prefer to adjust the module that can release as long a cloud resource occupation period as possible, while causing as short an extra delay as possible. Next we define D_rel and D_delay.

Released Cloud Resource Occupation Period. Note that only modules satisfying the following two conditions can be adjusted: (a) the module is executed on the cloud side, x_{i,j,0} = 0; and (b) its execution duration covers the critical point, τ_{i,j} < t_cri < t_{i,j}. For one user i, assume module j' is the candidate module that could be moved to the mobile side.
To distinguish from the original execution schedule of user i, we use τ'_{i,j} and t'_{i,j} to respectively denote the start and completion times after the movement of module j'. The released cloud resource occupation period due to the adjustment of module (i, j') is defined by

D_rel(i, j') = min{τ'_{i,j_c}, t'_{i,j_m}} − t_cri,   (6)

where j_c, j_m ∈ [j'+1, n+1]. Here j_c denotes the first successor of module j' that is scheduled on the cloud side; if no successor of module j' is on the cloud side, then τ'_{i,j_c} = ∞. j_m is the first successor of module j' that is scheduled on the mobile side; if no successor of module j' is on the mobile side, then j_m = n+1.

Extra Delay. The extra delay caused by the adjustment of module (i, j') is defined by

D_delay(i, j') = τ'_{i,j'+1} − τ_{i,j'+1}.   (7)

The movement of module j' usually delays the execution of the successors of module j'. In most cases, D_delay is greater than zero. For example, in the first iteration of our algorithm, the movement of any module leads to a positive D_delay, because any adjustment of the initial solution increases the execution time. However, in the iterations after the first one, it is possible that the adjustment of some module shortens the application execution time. In this case the D_delay of the module is less than zero, and the module is highly likely to be selected for adjustment. The reward of adjusting a module can thus be positive or negative. Fig. 4(a) shows a case where D_rel > D_delay, so the reward of the adjustment is positive, while in Fig. 4(b) the reward of the adjustment is negative. In each iteration of the algorithm, we only select modules with positive, and as large as possible, rewards to adjust.

Recall that we actually have two adjustment options to release the resources occupied at the critical point: waiting and movement. Waiting means purely delaying the execution of the module on the cloud side, while movement means changing the execution place of the module.
However, as shown in Fig. 4(c), a waiting adjustment always has a non-positive reward. Hence, in our algorithm we do not consider the waiting adjustment.

Other Reward Functions. One may ask whether other reward functions are possible. There are two typical alternatives: (a) Reward = −D_delay, and (b) Reward = D_rel. The former selects the modules with as small an extra delay as possible, regardless of the released cloud resource occupation duration. The latter prefers the module that releases the longest cloud resource occupation duration. We evaluate the two functions in Section VI. We find that function (a) obtains a good average application delay but requires a long time to converge, while function (b) leads to a poor average application delay. The reward function in Equation (5) achieves both a good average delay and a fast convergence speed.
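Equations (5)-(7) and the two alternative strategies can be compared directly; the helper functions and the toy numbers below are mine, for illustration only:

```python
def d_rel(tau_next_cloud, t_next_mobile, t_cri):
    """Equation (6): released occupation period after moving module j'."""
    return min(tau_next_cloud, t_next_mobile) - t_cri

def d_delay(tau_succ_after, tau_succ_before):
    """Equation (7): extra delay imposed on the successor of module j'."""
    return tau_succ_after - tau_succ_before

def rewards(rel, delay):
    """Scores under the three greedy strategies evaluated in Section VI."""
    return {"G-MaxRME": rel - delay,   # Equation (5)
            "G-MinED": -delay,
            "G-MaxREL": rel}

# Candidate A releases a long period at a moderate extra delay.
a = rewards(d_rel(5.0, 6.0, 2.0), d_delay(4.0, 3.0))   # rel = 3.0, delay = 1.0
# Candidate B releases little, but actually shortens the schedule.
b = rewards(d_rel(3.0, 4.0, 2.0), d_delay(2.5, 3.0))   # rel = 1.0, delay = -0.5
```

Under G-MaxRME, candidate A scores 2.0 against B's 1.5 and is adjusted first, whereas G-MinED would prefer B; the evaluation finds the combined reward the best trade-off between delay and convergence.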

Fig. 4. The reward function: (a) movement with D_rel > D_delay; (b) movement with D_rel < D_delay; (c) waiting. (Legend: execution on cloud, execution on mobile, data transfer.)

4) Number of Modules to Adjust, α: We now answer the question of how many modules to adjust in each iteration. Note that in Algorithm 1, only the modules with positive reward are put into L_adj. First, we take the average D_rel of all the modules in L_adj, denoted D_rel^(avg). Then, from the cloud resource occupation list L_cro, we compute the average number of occupied servers in the period from t_cri to t_cri + D_rel^(avg), denoted num^(avg). The number of modules to adjust, α, is given by Equation (8):

α = min{lengthof(L_adj), num^(avg) − r},   (8)

num^(avg) = [ Σ_{e ∈ L_cro^(sub)} (e.end − e.start) · e.num ] / [ Σ_{e ∈ L_cro^(sub)} (e.end − e.start) ],   (9)

where L_cro^(sub) is the subset of L_cro that includes all the time intervals located in [t_cri, t_cri + D_rel^(avg)]:

L_cro^(sub) = {e ∈ L_cro | e.start, e.end ∈ [t_cri, t_cri + D_rel^(avg)]}.   (10)

In our algorithm, an adjustment of the schedule usually degrades application performance. We want to avoid the case in which excessive modules are moved back to the mobile side, leaving the cloud servers under-utilized. Therefore α is bounded above by num^(avg) − r, as shown in Equation (8).

Theorem 2. The complexity of Algorithm 1 is O(λ²).

Proof: In Algorithm 1, the evaluation of the reward for adjusting each offloaded module is the most costly operation. In each iteration, at most λn modules need to be evaluated. The question is how many iterations Algorithm 1 needs before stopping. The worst case is when the resource constraint is r = 1 and only one module is moved to the mobile side in each iteration. In this case, Algorithm 1 needs at most λn iterations, after which the offloaded modules have all been moved to the mobile side.
Treating n as a constant, the worst-case time complexity of Algorithm 1 is O(λ²).

C. List Scheduling (LS) based Algorithms

Although the RCMCPP differs from traditional job scheduling problems in parallel computing, we are still interested in knowing how the heuristic algorithms for those problems perform when used to solve the RCMCPP.

Input : The execution schedule (x_{i,j,0}, τ_{i,j}, t_{i,j})
Output: The execution schedule (x_{i,j,k}, τ_{i,j}, t_{i,j})
1  for each user i do
2    Insert the first offloaded module of the user into a scheduling list L_s;
3  end
4  Sort the modules in L_s in non-decreasing order of their ready times τ_{i,j};
5  while there are unscheduled tasks in the list L_s do
6    Select the first module (i, j') from the list for scheduling;
7    for each machine k, including the cloud servers and the user's mobile device do
8      Compute the extra delay of module (i, j') on machine k;
       // If k = 0 (the mobile device), the extra delay is obtained by Equation (7); otherwise it is the earliest time at which the module can actually start on cloud server k minus the module's ready time τ_{i,j'}
9    end
10   Assign task (i, j') to the machine that minimizes the extra delay;
11   Update the execution schedule for user i, (x_{i,j,k}, τ_{i,j}, t_{i,j});
12   Remove the module (i, j') from L_s, and add the first successive offloaded module of (i, j') into L_s;
13 end
14 return the execution schedule (x_{i,j,k}, τ_{i,j}, t_{i,j});
Algorithm 3: The MEDLS algorithm

The most widely used method for the job scheduling problem is list scheduling. In list scheduling algorithms, the jobs/tasks are assigned priorities and inserted into an ordered scheduling list. The tasks are selected from the list in order of their priorities, and each selected task is scheduled onto the processor that minimizes a predefined cost function. We have two ways to solve the RCMCPP using the list scheduling method.
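In generic form (all names below are mine, not from the paper), list scheduling keeps a priority-ordered task list and assigns each popped task to the machine minimizing a predefined cost function:

```python
import heapq

def list_schedule(tasks, machines, cost):
    """tasks: iterable of (priority, task_id), lower value = higher priority;
    cost(task_id, machine_id, partial_assignment) -> float."""
    heap = list(tasks)
    heapq.heapify(heap)
    assignment = {}
    while heap:
        _, task = heapq.heappop(heap)
        best = min(machines, key=lambda m: cost(task, m, assignment))
        assignment[task] = best
    return assignment

# toy cost function: current load of the machine, i.e. simple load balancing
def load_cost(task, machine, assignment):
    return sum(1 for m in assignment.values() if m == machine)

plan = list_schedule([(0, "t1"), (1, "t2"), (2, "t3")], ["m1", "m2"], load_cost)
```

MEDLS instantiates this template with the ready time as the priority and the extra delay (Equation (7) on the mobile side, earliest cloud start otherwise) as the cost; HEFT instead prioritizes by upward path length and minimizes the finish time.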
One way, as described in Section IV(C), is to fit the RCMCPP into the TSPHC model by abstracting both the mobile devices and the cloud servers as processors. The Heterogeneous-Earliest-Finish-Time (HEFT) algorithm is considered an efficient solution [14]. In HEFT, the edges between connected modules are labeled with the average communication time, and the modules are labeled with the average computation time. The tasks are ordered according to the length of the path to the exit module. Each selected task is scheduled onto the processor that can finish the task earliest. The time complexity of this algorithm is on the order of O(λr). We will evaluate the performance of the HEFT algorithm when it is used to solve the RCMCPP.

The other way is to divide the RCMCPP into two phases. In the first phase, named partitioning, we simply decide for each user which modules are executed on the mobile side and which are scheduled on the cloud side. The partitioning phase is done using the SCPP method introduced in the last subsection. In the second phase, the list scheduling method is applied to allocate the offloaded modules onto the cloud servers or to move the offloaded modules to the

mobile side. Based on the execution schedule generated in the first phase, the task that is ready to start earliest is assigned the highest priority. Each selected task is scheduled onto the machine (among the cloud servers and the mobile device) that leads to the minimum extra delay. We name this method Minimum Extra Delay List Scheduling (MEDLS). Algorithm 3 gives the pseudo code of MEDLS. The time complexity of the algorithm is also O(λr).

D. γ-Greedy Algorithm

In the greedy heuristic (Algorithm 1), through iterative adjustment of the SCPP-initialized execution schedule, the number of occupied resources at each time is kept from exceeding the resource constraint r. In the MEDLS algorithm (Algorithm 3), the SCPP-initialized execution schedule is also adjusted, by the list scheduling method. Both algorithms generate feasible schedules by adjusting the SCPP-initialized solution; however, the greedy heuristic performs user-oriented adjustment, while MEDLS performs machine-oriented adjustment. Intuitively, the two types of adjustment can be combined as follows: first, the user-oriented adjustment of Algorithm 1 is performed with the resource constraint r relaxed to (1+γ)r; then, the machine-oriented adjustment of Algorithm 3 is performed. We call the combined method the γ-Greedy algorithm. We evaluate the performance of this algorithm under different values of the parameter γ in the next section.

VI. EVALUATION

In this section, we evaluate the performance of various algorithms for the RCMCPP.

A. Evaluation Setup and Metrics

The application is image-based object recognition, as shown in Fig. 1. It contains seven modules, so we have n = 7. We manually profile the execution time of each module on our laboratory server, and the data size that needs to be transferred between two connected modules. We assume that the processing time of each module on a mobile device is F times greater than that on the server.
Since the users' mobile devices have different processing capabilities, the users have various factors F. In our experiments, the local computation cost is generated by W_{λ×7} = [F_1, F_2, ..., F_λ]^T · C, where C is a 1×7 vector of the profiled execution times of the modules on the server, and F_i follows a uniform distribution over the interval [1, 6]. The communication cost is generated by Π_{λ×8} = [1/B_1, 1/B_2, ..., 1/B_λ]^T · D, where D is a 1×8 vector of the profiled data sizes, and B_i is the communication bandwidth, which also follows a uniform distribution. We generate a large number of test cases by changing the number of cloud servers r or the number of users (load) λ. Whenever λ is changed, W_{λ×7} and Π_{λ×8} are re-generated.

The comparison of the various algorithms is based on the following two metrics:

Application Delay Ratio (ADR). The main performance measure of the algorithms is the average application delay experienced by the mobile users. Since a large set of tests is performed under different loads λ and resources r, it is necessary to normalize the application delay to a lower bound; the normalized value is called the Application Delay Ratio (ADR). The ADR value of an algorithm is defined by

ADR = (application delay) / d_scpp.   (11)

The denominator d_scpp is the application delay under the SCPP solution. In the SCPP, the cloud resources are assumed to be unconstrained, and each user's execution schedule is generated independently by the SCPP method. The ADR of an RCMCPP algorithm cannot be less than one, since the denominator is a lower bound. The RCMCPP algorithm that gives the lowest ADR is the best algorithm with respect to performance.

Running Time of the Algorithms. The running time of an algorithm is its execution time for outputting the schedule. This metric gives the average cost of the algorithm. Among algorithms that have very close ADR values, the one with the minimum running time is considered the best.

B.
Evaluation of Greedy Heuristics

1) Performance Comparison between the Greedy Heuristics and LS Algorithms: We compare the ADR performance of the greedy heuristic under three different greedy strategies with the two list scheduling algorithms, HEFT and MEDLS (Algorithm 3). A concise description of the algorithms follows.

G-MaxREL. G-MaxREL is the greedy heuristic with the strategy of maximizing the released cloud resource occupation period. The reward function in the algorithm is Reward = D_rel.

G-MinED. G-MinED is the greedy heuristic with the strategy of minimizing the extra delay. The reward function is Reward = −D_delay.

G-MaxRME. G-MaxRME is the greedy heuristic with the strategy of maximizing the released cloud resource occupation period minus the extra delay. The reward function is Reward = D_rel − D_delay.

HEFT. HEFT is a well-known list scheduling algorithm designed for TSPHC problems. We can also use the algorithm to solve our RCMCPP problem.

MEDLS. MEDLS is the list scheduling algorithm that we have proposed to solve the RCMCPP problem. In this algorithm, each module is scheduled onto the machine that causes the minimum extra delay.

In the first experiment, the number of users is fixed at λ = 2. The ADR performance of the algorithms is compared with respect to various numbers of cloud servers (see Fig. 5a). Among the three greedy heuristics, G-MaxRME and G-MinED achieve better performance than G-MaxREL. Compared with HEFT, our proposed greedy heuristics (G-MaxRME and G-MinED) perform better for any number of cloud servers. Compared with MEDLS,

Fig. 5. Evaluation results: (a) ADR vs. r, λ = 2; (b) ADR vs. λ, r = 2; (c) algorithm running time vs. r; (d) ADR-r-λ; (e) contour lines of the ADR-r-λ figure; (f) the effect of γ on the ADR performance; (g) the effect of γ on the algorithm running time; (h) ADR comparison of γ-Greedy with other heuristics; (i) running time comparison of γ-Greedy with other heuristics.

the greedy heuristics (G-MaxRME and G-MinED) have better performance than MEDLS when the cloud resources are relatively tight (r < 8). When the cloud resources increase to r > 8, the greedy heuristics have the same performance as MEDLS, because in this case the SCPP solution used as the initial solution in both the greedy heuristics and MEDLS becomes feasible for the RCMCPP. The average ADR value of the greedy heuristic (G-MaxRME or G-MinED) over all numbers of cloud servers is better than that of the HEFT algorithm by percent, and that of the MEDLS algorithm by percent. For all five algorithms, the performance improves as the number of cloud servers increases.

Next, we fix the number of cloud servers at r = 2 and compare the ADR performance of the five algorithms as the number of users λ varies (see Fig. 5b). The two proposed greedy heuristics (G-MaxRME and G-MinED) outperform the other algorithms in terms of the overall ADR performance under various numbers of requesting users.
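The ADR of Equation (11) is a plain normalization against the unconstrained SCPP lower bound; the numbers below are made up for illustration, not measured values:

```python
def adr(avg_delay, d_scpp):
    """Equation (11): average application delay normalized by the SCPP
    lower bound; it cannot fall below 1 for a feasible RCMCPP schedule."""
    return avg_delay / d_scpp

d_scpp = 1.1                                 # unconstrained per-user optimum
measured = {"G-MaxRME": 1.32, "MEDLS": 1.43, "HEFT": 1.54}   # hypothetical delays
ratios = {name: adr(d, d_scpp) for name, d in measured.items()}
```

The algorithm with the lowest ratio wins; among algorithms with nearly equal ratios, the one with the shortest running time is preferred.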
For the greedy heuristics (G-MaxRME and G-MinED) and MEDLS, there exists a threshold, λ = 4, below which the performance is not affected as λ increases. Above this threshold, however, the performance degrades quickly as λ increases.

We compare the cost of the algorithms using the running-time metric (see Fig. 5c). MEDLS is the most costly of the five algorithms. Of the two greedy heuristics with the best ADR performance, G-MaxRME is less costly than G-MinED. We therefore conclude that G-MaxRME is the best of the five algorithms in terms of overall performance under the ADR and running-time metrics. Another interesting observation from Fig. 5c is that our proposed greedy heuristics have shorter running times as the cloud resources increase, while HEFT has longer running times. This is because the greedy heuristics need more iterations to adjust the SCPP initial solution when the cloud resource constraint is tighter, so that the occupied cloud resources do

not violate the constraint. When the cloud resources increase, however, the time complexity of HEFT increases, because more time is spent in machine selection with an insertion-based policy.

2) Performance-Resource-Load (PRL) Model of the Greedy Heuristic: We evaluate how the ADR performance changes with the provisioned cloud resources r and the load λ on the system, using the proposed greedy heuristic (G-MaxRME). We measured the ADR value of the G-MaxRME algorithm in a large number of test cases, where the number of cloud servers r varies in [, 2] and the number of users λ varies in [2, 2]. Fig. 5d and Fig. 5e show a 3-dimensional visual PRL model and its corresponding contour lines. Note that each contour line contains the (r, λ) points that have the same ADR value. The straight contour lines, approximately originating from the point (0, 0), show that the ADR performance depends on the ratio of λ to r: the smaller the value of λ/r, the better the performance. When λ/r is less than about 2, the performance is not affected by changes in λ/r.

C. Evaluation of the γ-Greedy Heuristic

1) Impact of the Parameter γ: In this experiment, we evaluate the effect of the parameter γ on the performance of the γ-Greedy algorithm. Fig. 5f and Fig. 5g respectively show the effect of γ on the algorithm's ADR performance and running time. The optimal value of γ for the ADR performance is typically in [0.1, 0.5], depending on the number of provisioned cloud servers (see Fig. 5f). Considering the average ADR performance over various numbers of cloud servers, the optimal γ is 0.2. In terms of running time (see Fig. 5g), the larger γ is, the less costly the algorithm. In both figures, it is observed that the more cloud resources are provisioned, the less impact the parameter γ has on the algorithm.
To explain the rationale behind this observation, note that γ-Greedy is the combination of our proposed greedy heuristic (Algorithm 1) and MEDLS (Algorithm 3). There are two special cases for the value of γ. First, when γ = 0, γ-Greedy is identical to the greedy heuristic. Second, when γ becomes large enough, γ-Greedy is identical to MEDLS. It has been demonstrated in Fig. 5a and Fig. 5c that as the cloud resources increase, the difference between the greedy heuristic and MEDLS diminishes in terms of both ADR performance and running time.

2) Performance Comparison: We compare the γ-Greedy algorithm with the Greedy heuristic (Algorithm 1), HEFT and MEDLS. In this evaluation, the MaxRME strategy is applied in both the γ-Greedy heuristic and the Greedy heuristic. For the γ-Greedy algorithm, the parameter γ is set to 0.2, since this value was demonstrated to be optimal for the ADR performance. Fig. 5h and Fig. 5i show the comparison results in terms of ADR performance and running time. The figures show that γ-Greedy offers little improvement in ADR performance over the Greedy heuristic, but it is less costly than the Greedy heuristic by 27 percent.

VII. CONCLUSION

In this paper, we have focused on the Resource Constrained Multi-user Computation Partitioning Problem (RCMCPP). We show that the RCMCPP is distinct from, and more difficult than, existing job scheduling problems such as the TSPHC and HFS problems. We have formulated the problem as a Mixed Integer Linear Programming (MILP) problem, which, however, is not suitable for solution by linear-programming-based methods because of the exploding number of variables. Thus, we have proposed a set of heuristic algorithms, namely Greedy and γ-Greedy, and conducted comprehensive performance evaluations. The proposed algorithms were demonstrated to outperform the list scheduling algorithms, HEFT and MEDLS, by percent.
The proposed algorithms are beneficial for the application provider in delivering optimal application performance when faced with unpredictable numbers of users. The Performance-Resource-Load model obtained with our algorithms can also be applied by application providers in the design of resource provisioning mechanisms, with the aim of minimizing their operational cost.

REFERENCES

[1] G. Canepa and D. Lee. A Virtual Cloud Computing Provider for Mobile Devices. In Proc. ACM MCS. ACM Press, 2010.
[2] E. E. Marinelli. Hyrax: Cloud Computing on Mobile Devices using MapReduce. Master's Thesis, Carnegie Mellon University, 2009.
[3] L. Yang, J. Cao, S. Tang, T. Li, and A. Chan. A Framework for Partitioning and Execution of Data Stream Applications in Mobile Cloud Computing. In Proc. of the IEEE Fifth International Conference on Cloud Computing, 2012.
[4] E. Cuervo, A. Balasubramanian, D. Cho, et al. MAUI: Making Smartphones Last Longer with Code Offload. In Proc. of MobiSys. ACM Press, 2010.
[5] B. Chun, S. Ihm, P. Maniatis, M. Naik, and A. Patti. CloneCloud: Elastic Execution between Mobile Device and Cloud. In Proc. of EuroSys, 2011.
[6] X. Zhang, A. Kunjithapatham, S. Jeong, and S. Gibbs. Towards an Elastic Application Model for Augmenting the Computing Capabilities of Mobile Devices with Cloud Computing. Mobile Networks and Applications, 2011.
[7] I. Giurgiu, O. Riva, D. Juric, I. Krivulev, and G. Alonso. Calling the Cloud: Enabling Mobile Phones as Interfaces to Cloud Applications. In Proc. of Middleware. Springer, 2009.
[8] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies. The Case for VM-Based Cloudlets in Mobile Computing. IEEE Pervasive Computing, vol. 8, no. 4, pp. 14-23, Oct. 2009.
[9] K. Kumar and Y. Lu. Cloud Computing for Mobile Users: Can Offloading Computation Save Energy? IEEE Computer, vol. 43, no. 4, pp. 51-56, Apr. 2010.
[10] Z. Li, C. Wang, and R. Xu. Computation Offloading to Save Energy on Handheld Devices: A Partition Scheme.
In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pages -6. IEEE Press, 2. [] M. Ra, A. Sheth, L. Mummert, P. Pillai, D. Wetherall, and R. Govindan. Odessa: enabling interactive perception applications on mobile devices. In Proc. of MobiSys. ACM press. 2 [2] D.G. Lowe. Distinctive image features from scale-invariant keypoints. In International Journal of Computer Vision, vol.6, no.2, pp.9-, 24 [3] U. Sharma, P. shenoy, S. Sahu, and A. Shaikh. A cost-aware elasticity provisioning system for the cloud. In Proc. of ICDCS, pp , 2 [4] H. Topcuoglu, S. Hariri, and M. Wu. Performance-effective and lowcomplexity task scheduling for heterogeneous computing. In IEEE Transactions on Parallel and Distributed Systems, vol.3, no.3, March, 22 [5] L. Hall, D. Shmoys, and J. Wein. Scheduling to minimize average completion time: off-line and on-line algorithms. In Proc. of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pp.42-5, January, 996

13 [6] M. Pinedo. Scheduling Theory, Algorithms, and Systems, 2nd ed. Prentice Hall, Upper Saddle River, New Jersey 7458, 22 [7] E.L.Lawler, and D.E.Wood. Branch-and-Bound Methods: A Survey. In Operations Research,July, 966, 4:699-79


More information

Compact Representations and Approximations for Compuation in Games

Compact Representations and Approximations for Compuation in Games Compact Representations and Approximations for Compuation in Games Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions

More information

Power Management in Cloud Computing using Green Algorithm. -Kushal Mehta COP 6087 University of Central Florida

Power Management in Cloud Computing using Green Algorithm. -Kushal Mehta COP 6087 University of Central Florida Power Management in Cloud Computing using Green Algorithm -Kushal Mehta COP 6087 University of Central Florida Motivation Global warming is the greatest environmental challenge today which is caused by

More information

The Probabilistic Model of Cloud Computing

The Probabilistic Model of Cloud Computing A probabilistic multi-tenant model for virtual machine mapping in cloud systems Zhuoyao Wang, Majeed M. Hayat, Nasir Ghani, and Khaled B. Shaban Department of Electrical and Computer Engineering, University

More information

SLA-based Admission Control for a Software-as-a-Service Provider in Cloud Computing Environments

SLA-based Admission Control for a Software-as-a-Service Provider in Cloud Computing Environments SLA-based Admission Control for a Software-as-a-Service Provider in Cloud Computing Environments Linlin Wu, Saurabh Kumar Garg, and Rajkumar Buyya Cloud Computing and Distributed Systems (CLOUDS) Laboratory

More information

The Problem of Scheduling Technicians and Interventions in a Telecommunications Company

The Problem of Scheduling Technicians and Interventions in a Telecommunications Company The Problem of Scheduling Technicians and Interventions in a Telecommunications Company Sérgio Garcia Panzo Dongala November 2008 Abstract In 2007 the challenge organized by the French Society of Operational

More information

A Novel Approach for Efficient Load Balancing in Cloud Computing Environment by Using Partitioning

A Novel Approach for Efficient Load Balancing in Cloud Computing Environment by Using Partitioning A Novel Approach for Efficient Load Balancing in Cloud Computing Environment by Using Partitioning 1 P. Vijay Kumar, 2 R. Suresh 1 M.Tech 2 nd Year, Department of CSE, CREC Tirupati, AP, India 2 Professor

More information

OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION

OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION Sérgio Pequito, Stephen Kruzick, Soummya Kar, José M. F. Moura, A. Pedro Aguiar Department of Electrical and Computer Engineering

More information

CLOUD computing has become an important computing

CLOUD computing has become an important computing IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. XX, NO. XX, XX XX 1 Transformation-based Optimizations for Workflows in the Cloud Amelie Chi Zhou and Bingsheng He Abstract Recently, performance and monetary

More information

Smart Queue Scheduling for QoS Spring 2001 Final Report

Smart Queue Scheduling for QoS Spring 2001 Final Report ENSC 833-3: NETWORK PROTOCOLS AND PERFORMANCE CMPT 885-3: SPECIAL TOPICS: HIGH-PERFORMANCE NETWORKS Smart Queue Scheduling for QoS Spring 2001 Final Report By Haijing Fang(hfanga@sfu.ca) & Liu Tang(llt@sfu.ca)

More information