Optimization for QoS on Web-Service-Based Systems with Task Deadlines

Luís Fernando Orleans
Departamento de Engenharia Informática, Universidade de Coimbra, Coimbra, Portugal
lorleans@dei.uc.pt

Pedro Nuno Furtado
Departamento de Engenharia Informática, Universidade de Coimbra, Coimbra, Portugal
pnf@dei.uc.pt

Abstract — Web-Service-based parallel systems (WSB-P) should exhibit predictable behavior by guaranteeing Quality-of-Service (QoS) parameters. One of the important requirements of predictability is that requests (tasks) should have maximum acceptable response time thresholds, which we denote as deadlines in this paper. In order to provide QoS, a WSB-P should try to guarantee these deadlines. Such a policy affects the way the system should be designed, because all task durations must be under the deadline, and also how tasks are distributed among the servers. It is therefore desirable to use a load-balancing algorithm that tries both to maximize throughput and to minimize missed deadlines. This paper shows, with the help of a simulator, how the number of concurrent executions (CE) in the WSB-P plays a crucial role in pursuing this objective and how the optimal CE can be determined.

Keywords: load-balancing; deadlines; QoS; simulation; web-services

I. INTRODUCTION

Web-Service-based parallel systems (WSB-P) should exhibit predictable behavior by guaranteeing Quality-of-Service parameters. One of the important requirements of predictability is that requests (tasks) should have maximum acceptable response time thresholds, which we denote as deadlines in this paper. In order to provide QoS, a WSB-P should try to guarantee these deadlines. Such a policy affects the way the system should be designed, because all task durations must be under the deadline, and also how tasks are distributed among the servers. It is therefore desirable to use a load-balancing algorithm that tries both to maximize throughput and to minimize missed deadlines.
This paper shows, with the help of a simulator, how the number of concurrent executions (CE) in the WSB-P plays a crucial role in pursuing this objective and how the optimal CE can be determined.

Both traditional Least-Work-Remaining (LWR) and Round-Robin (RR) provide no control over the number of concurrent executions (CE) in the system, which means that every request that arrives is immediately put into execution, no matter how many requests are already running on the servers. This policy is known as best-effort: for every request that arrives, the system will do its best to handle it, though providing no response time guarantees. Using a simple linear analysis, if a request would take 1 second to run alone, then on a server that is already running n other requests it will take n + 1 seconds to complete. In a system where tasks have deadlines, this behaviour is unacceptable, as the number of missed deadlines depends heavily on the CE number.

To reduce the number of missed deadlines in a WSB-P, one can add new servers to the cluster and upgrade all servers, improving the computing capacity of the system as a whole. These are modifications of the installed capacity. They do not happen instantaneously, may lead to expensive over-capacity options to deal with peaks that occur only a small percentage of the time, and on their own do not provide the answer to meeting deadlines under heavy traffic. No matter what the installed capacity is, the problem occurs whenever the rate of request arrivals is sufficiently large (traffic peaks) that each node has to process several requests simultaneously.

A logical solution to reduce the number of missed deadlines is to adapt the load-balancing algorithms, making them limit the CE number of the system. This way, there will be only an adaptable number of concurrent requests interfering with each other's execution.

1. This work was supported by the Adapt-DB project FCT POSC/EIA/5515/24.
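As an illustrative sketch (not from the paper), the linear concurrency model above directly bounds how many concurrent executions a server can sustain before a task of a given size misses its deadline:

```python
# Linear concurrency model: a task that needs `standalone` seconds alone
# takes `standalone * n` seconds when n tasks share the server.
def completion_time(standalone: float, n_concurrent: int) -> float:
    return standalone * n_concurrent

# Largest concurrency level at which a task of size `standalone`
# still finishes within `deadline` seconds.
def max_ce(standalone: float, deadline: float) -> int:
    return int(deadline // standalone)
```

For instance, with these illustrative numbers, a 2-second task under a 10-second deadline tolerates at most max_ce(2, 10) = 5 concurrent executions on its server before missing its deadline.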
The requests that arrive while the system is saturated, i.e., while the maximum CE value has been reached, give rise to a system-busy notification. The appropriate CE value depends on how tight the deadline is.

This paper describes a model where requests may be accepted or rejected by the system at the moment of their arrival, depending on the current load. This mechanism of acceptance or rejection is called an admission control mechanism. Our objective is to design approaches that are completely external to the controlled system. Once a request is accepted, it has to be fully executed before its deadline; otherwise, its execution will be cancelled (killed). We assume that requests are not preemptible: once a task is assigned to a server, it runs on that server until completion. Finally, all tasks are independent, as their executions do not block each other. This work is about load-balancing in QoS systems, so the best algorithm will be the one that combines good throughput with a low number of killed tasks.

II. RELATED WORK

There is a large body of work on load-balancing and on QoS, most of it leading to already accepted and consolidated
conclusions. Although these are almost exhausted themes on their own, their combination seems to be an area with very few research results.

A. Load-Balancing

Many load-balancing algorithms have been studied; most of them try to maximize throughput or minimize the mean response time of requests. Reference [1] proposes an algorithm called TAGS, which is supposed to be the best choice when task sizes are unknown and follow a heavy-tailed distribution. This is not the case for the scenario analyzed in this paper, because task sizes must be below a deadline threshold. It is also shown in [1] that, when task sizes are not heavy-tailed, Least-Work-Remaining has a higher throughput than TAGS. In fact, [2] and [3] claim that Least-Work-Remaining is optimal when task sizes are exponential and unknown. The algorithm SITA-E [4] has the best performance when task sizes are known and heavy-tailed. In cases where task sizes are not heavy-tailed, Least-Work-Remaining presents a better throughput.

B. Quality of Service (QoS)

In real distributed systems, task sizes are heavy-tailed. This means that a very small portion of all tasks is responsible for half of the load [5]. In models with deadlines, like the one analyzed in this paper, a similar distribution occurs: most tasks are very small, and there is a small number of big tasks, but all of these are under the deadline limit, i.e., their maximum duration equals the deadline. Reference [6] presents a model where the number of concurrent requests within the system is restricted. When this number is reached, subsequent requests are enqueued. But this model has no concern for deadlines or rejection of requests. It also does not show a way to load-balance the arriving tasks, since it is a single-server architecture. Quality-of-Service has also been studied for Web servers.
In [15] the authors propose Session-Based Admission Control (SBAC), noting that longer sessions may result in purchases and therefore should not be discriminated against in overloaded conditions. They propose self-tunable admission control based on hybrid or predictive strategies. [14] uses a rather complex analytical model to perform admission control. There are also approaches proposing some kind of service differentiation: [12] proposes an architecture for Web servers with differentiated services; [13] defines two priority classes for requests, with corresponding queues and admission control over those queues. [17] proposes an approach for Web servers to adapt automatically to changing workload characteristics, and [16] proposes a strategy that improves the service given to requests using a statistical characterization of those requests and services. Compared to our own work, all the works referenced above except [16] are either intrusive, require extensive modifications to systems, or use analytical models that may not provide guarantees on a real system. The work in [16] does not consider deadlines and does not adapt constantly as ours does; instead, it fixes the maximum throughput capacity when it runs its test runs.

III. LOAD BALANCING ALTERNATIVES

Load balancing is a fundamental building block for the construction of scalable systems with multiple processing elements. There are several proposed load-balancing algorithms, but one of the most common in practice is also one of the simplest: Round-Robin (RR). This algorithm produces a kind of static load-balancing functionality, as tasks are distributed in turn with no further considerations. We consider not only RR but also Least-Work-Remaining (LWR), as the latter is supposed to be the best choice when task sizes are not heavy-tailed. Our proposal is for adapted versions of both RR and LWR, including the CE value as a parameter.
We are concerned with guaranteeing specified acceptable response time limits, which we denote as deadlines. The number of concurrent executions is a crucial variable when deadlines are involved, because as we increase the number of concurrent executions we have a larger probability of missing the deadlines. For this reason, the algorithms control the maximum number of concurrent executions (CE) within the system, which can vary upwards from 1. In the following we describe each load-balancing algorithm we compare, considering deadlines and rejection:

Round-Robin (RR)

    for each task that arrives:
        if COUNT(running_tasks) < CE:
            next_server := server_list->first
            send(task, next_server)
            server_list->move_to_end(next_server)
        else:
            NOT_ADMITTED(task)

Least-Work-Remaining (LWR)

    for each task that arrives:
        if COUNT(running_tasks) < CE:
            next_server := server_list->least_utilized
            send(task, next_server)
        else:
            NOT_ADMITTED(task)

IV. EXPERIMENTAL ANALYSIS

Due to the large number of parameters involved, it becomes necessary to formally describe the simulation model used in this work. The simulator, built at the University of Coimbra, has the following parameters:

- Number of servers.
- Processing capacity of each server.
- Task arrival rate.
- Mean task size.
- Maximum number of tasks being processed concurrently (CE).
- Maximum amount of time a task can execute (deadline).
- Load-balancing algorithm.

In this paper, a system with 3 identical servers was simulated. The deadline was set to 10 seconds and the duration of each request follows an exponential distribution with mean 2 seconds (we generated tasks with durations between 0.1 and 10 seconds). In addition, all tasks have the same priority. The concurrency model is linear, which means that a task will take twice as long if it shares the server with one other task, three times as long in the case of two other tasks, and so forth. The simulator implements the two algorithms: Least-Work-Remaining (LWR) and Round-Robin (RR). We compare their performance, and all results are presented as an arithmetic mean of 3 rounds of simulation. To eliminate the transient phase, the data obtained in the first 10 minutes of each simulation was discarded; only the results obtained in the next 30 minutes were considered. Figure 1 shows the simulated architecture.

Figure 2. RR performance when submitted to a best-effort load-balancing mechanism.

A. Simulation results

This section contains the results obtained with the simulator. Figures 2 and 3 show how the system performs when the chosen algorithms are LWR and RR with no CE control, i.e., in a best-effort way. The first thing to notice is that the performances are very similar for both alternatives, due to the low variability of task sizes provided by the exponential distribution. Since the system is under high load, with 10 tasks arriving per second, the number of killed tasks increases as the CE number increases. In both algorithms the throughput is low: not even 20% of the submitted tasks get fully executed. On the other hand, Figures 4 and 5 show how the system performs when the admission control mechanism is enabled.
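As a minimal sketch, the two admission-controlled algorithms of Section III can be written as follows. The `Server` class, the function names, and the representation of the system-wide CE cap are illustrative assumptions, not the paper's actual simulator code:

```python
from collections import deque

class Server:
    """Illustrative server state: running count and outstanding work."""
    def __init__(self, name: str):
        self.name = name
        self.running = 0           # tasks currently executing here
        self.remaining_work = 0.0  # sum of outstanding task sizes

def dispatch_rr(ring: deque, task_size: float, ce: int):
    """Round-Robin with a system-wide cap of `ce` concurrent executions.
    Returns the chosen server, or None if the task is not admitted."""
    if sum(s.running for s in ring) >= ce:
        return None                # NOT_ADMITTED
    target = ring[0]
    ring.rotate(-1)                # move the head server to the end
    target.running += 1
    target.remaining_work += task_size
    return target

def dispatch_lwr(servers, task_size: float, ce: int):
    """Least-Work-Remaining with the same CE cap: pick the server
    with the least outstanding work."""
    if sum(s.running for s in servers) >= ce:
        return None
    target = min(servers, key=lambda s: s.remaining_work)
    target.running += 1
    target.remaining_work += task_size
    return target
```

With three servers and `ce = 2`, a third simultaneous arrival is rejected rather than queued, matching the system-busy notification described earlier.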
Although the throughput did not increase, the number of killed tasks became significantly lower, since there is a large number of rejected tasks. In other words, the system tries to admit only the requests it can execute.

Figure 3. LWR performance when submitted to a best-effort load-balancing mechanism.

Another interesting point to notice in the graphs of Figures 4 and 5 is that the maximum throughput is reached with very low CE values. In fact, for LWR it is reached with 4 CE, while RR needs 6 CE to obtain its best performance. At these values, the percentage of killed tasks is very low (1.94% for RR and 0.42% for LWR), which means that, once a task is admitted by the system, it has a very high probability of being completely executed. But how do these algorithms behave if task durations are not exponential? In fact, [6] claims that task durations in a distributed/parallel system are not exponential, but follow a heavy-tailed distribution, such as a Pareto distribution.

Figure 1. Simulated architecture.

Figure 4. Round-Robin with admission control mechanism.
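The bounded-Pareto workload referred to above can be drawn by inverse-transform sampling; this sketch is an assumption (the paper does not describe its generator) and uses the parameters reported for the Pareto experiments (α = 0.6, k = 0.1, p = 8), together with the ended-minus-killed rule the paper uses to pick the best CE for Pareto-like workloads:

```python
import random

def bounded_pareto(alpha=0.6, k=0.1, p=8.0, rng=random):
    """One sample from a bounded-Pareto(alpha, k, p) distribution via
    the inverse of F(x) = (1 - (k/x)**alpha) / (1 - (k/p)**alpha)."""
    u = rng.random()
    return k * (1.0 - u * (1.0 - (k / p) ** alpha)) ** (-1.0 / alpha)

def best_ce(ended_pct, killed_pct):
    """Pick the CE value maximizing ended% minus killed%.
    Inputs are dicts mapping a CE value to a percentage."""
    return max(ended_pct, key=lambda ce: ended_pct[ce] - killed_pct[ce])
```

Every sample falls in [k, p], so task durations stay below the deadline, as the bounded model requires.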
Figure 5. Least-Work-Remaining with admission control mechanism.

This means that a minuscule fraction of the very largest jobs comprises half of the total system load. This distribution does not apply to the scenario studied in this paper, because we limit task durations to values below 10 seconds (variability is limited). But what would happen if the number of small tasks were much greater than the number of big tasks? Figures 6 and 7 show how the algorithms perform in such a situation, where task durations follow a bounded-Pareto [4] distribution with power-law exponent α = 0.6, smallest possible observation k = 0.1, and largest possible observation p = 8. As can be noticed, the performances of LWR and RR are still close. Since the number of small tasks is much greater than the number of big tasks, the throughput is better than in the exponential case; it can be clearly identified that the main reason for the poor performance of the exponential model is the existence of too many big tasks, which are not present in a Pareto distribution. In Pareto-like models, a good approach to determining the best CE value is to measure the difference between ended tasks and killed tasks. Using this metric, the best CE value for RR is 24, where the difference is 31.5 (43.13% ended tasks minus 11.63% killed tasks), and the best CE value for LWR is 25, where the difference is 32.33 (44.42% ended tasks minus 12.09% killed tasks).

Figure 6. Round-Robin performance in a system with an admission control mechanism where task durations follow a Pareto distribution.

Figure 7. Least-Work-Remaining performance in a system with an admission control mechanism where task durations follow a Pareto distribution.

V. CONCLUSIONS AND FUTURE WORK

Distributing the load on parallel systems following QoS adaptability principles is a new, challenging task.
Most of the existing algorithms try to load-balance the requests equally, but offer no guarantees about the response time, handling all requests in a best-effort way. Since tasks may have maximum acceptable response times (deadlines), response time turns out to be a crucial concern. In this paper we showed how the well-known load-balancing algorithms Least-Work-Remaining and Round-Robin perform poorly when tasks have deadlines: most of the submitted tasks are killed, and very few of them get completely executed. This happens because of the interference caused by concurrent executions (CE) within the same server, and we showed that the CE number does not need to be high to reach the maximum throughput. Thus, the best CE value is the smallest value at which the maximum throughput is reached; these are also the CE values at which the number of killed tasks is very low. This paper also showed how these well-known algorithms behave in a more realistic scenario, where the number of short-duration tasks is much greater than that of long-duration tasks. In such models, the number of killed tasks is always below the number of ended tasks, and the optimal CE is the point where the difference between these two values is maximal. As future work, we expect to study an efficient way to estimate the duration of requests. This could be added to an implementation of the algorithm in order to identify and classify tasks according to their sizes. This way, the load-balancing algorithm could try to prevent short tasks from being blocked by long tasks.

REFERENCES

[1] Harchol-Balter, M.: Task assignment with unknown duration. Journal of the ACM, 2002, vol. 49.
[2] Nelson, R., Philips, T.: An approximation to the response time for shortest queue routing. Performance Evaluation Review, 1989, pp. 181-189.
[3] Nelson, R., Philips, T.: An approximation for the mean response time for shortest queue routing with general interarrival and service times.
Performance Evaluation, 1993, pp. 123-139.
[4] Harchol-Balter, M., Crovella, M., Murta, C.: On choosing a task assignment policy for a distributed server system. Journal of Parallel and Distributed Computing, 1999, vol. 59, no. 2, pp. 204-228.
[5] Harchol-Balter, M., Downey, A.: Exploiting process lifetime distributions for dynamic load balancing. ACM Transactions on Computer Systems, 1997.
[6] Schroeder, B., Harchol-Balter, M.: Achieving class-based QoS for transactional workloads. ICDE, 2006.
[7] Crovella, M., Bestavros, A.: Self-similarity in World Wide Web traffic: evidence and possible causes. IEEE/ACM Transactions on Networking, 1997, pp. 835-846.
[8] Knightly, E., Shroff, N.: Admission control for statistical QoS: theory and practice. IEEE Network, 1999, vol. 13, no. 2, pp. 20-29.
[9] Barker, K., Chernikov, A., Chrisochoides, N., Pingali, K.: A load balancing framework for adaptive and asynchronous applications. IEEE Transactions on Parallel and Distributed Systems, 2004, vol. 15, no. 2.
[10] Serra, A., Gaïti, D., Barroso, G., Boudy, J.: Assuring QoS differentiation and load-balancing on Web server clusters. IEEE Conference on Control Applications, 2005.
[11] Cardellini, V., Casalicchio, E., Colajanni, M., Yu, P.S.: The state of the art in locally distributed Web-server systems. ACM Computing Surveys, 2002, vol. 34, pp. 263-311.
[12] Bhatti, N., Friedrich, R.: Web server support for tiered services. IEEE Network, September 1999, vol. 13, no. 5, pp. 64-71.
[13] Bhoj, P., Ramanathan, S., Singhal, S.: Web2K: bringing QoS to Web servers. Tech. Rep. HPL-2000-61, HP Labs, May 2000.
[14] Chen, X., Mohapatra, P., Chen, H.: An admission control scheme for predictable server response time for Web accesses. In Proceedings of the 10th World Wide Web Conference, Hong Kong, May 2001.
[15] Cherkasova, L., Phaal, P.: Session-based admission control: a mechanism for peak load management of commercial Web sites. IEEE Transactions on Computers, June 2002, vol. 51, no. 6.
[16] Elnikety, S., Nahum, E., Tracey, J., Zwaenepoel, W.: A method for transparent admission control and request scheduling in e-commerce Web sites. WWW2004: The Thirteenth International World Wide Web Conference, New York City, NY, USA, May 2004.
[17] Pradhan, P., Tewari, R., Sahu, S., Chandra, A., Shenoy, P.: An observation-based approach towards self-managing Web servers. In International Workshop on Quality of Service, Miami Beach, FL, May 2002.