Static and Dynamic Scheduling Algorithms for Scalable Web Server Farm


Emiliano Casalicchio, University of Roma Tor Vergata, 00133 Roma, Italy, ecasalicchio@ing.uniroma2.it
Salvatore Tucci, University of Roma Tor Vergata, 00133 Roma, Italy, tucci@uniroma2.it

Abstract

Multiprocessor-based servers are often used for building popular Web sites that have to guarantee an acceptable Quality of Web Service. In common multi-node systems, namely Web server farms, a Web switch (say, Dispatcher) routes client requests among the server nodes. This architecture resembles a traditional cluster in which a global scheduler dispatches parallel applications among the server nodes. The main difference is that the load reaching Web server farms tends to occur in waves with intervals of heavy peaks. These heavy-tailed characteristics have motivated the use of policies based on dynamic state information for global scheduling in Web server farms. This paper presents an accurate comparison between static and dynamic policies for different classes of Web sites. The goal is to identify the main features of architectures and load management algorithms that guarantee scalable Web services. We verify that a Web farm with a Dispatcher having full control on client connections is a very robust architecture. Indeed, we demonstrate that if the Web site provides only HTML pages or simple database searches, the Dispatcher does not need to use sophisticated scheduling algorithms even if the load occurs in heavy bursts. Dynamic scheduling policies appear to be necessary for scalability only when most requests are for Web services whose service times are three or more orders of magnitude higher than that of providing HTML pages with some embedded objects.

1. Introduction

Most popular Web sites are suffering from severe congestion, since they are getting millions of requests per day, whether or not these coincide with special events. These sites are overwhelmed by the offered load, and the Web service providers have to deal with peak demands that are much higher than the average load supported by their site. Any single machine can easily become a bottleneck and, even worse, that server would represent an intolerable single point of failure. The most obvious way to cope with growing service demand is to add hardware resources, because replacing an existing machine with a faster model provides only temporary relief from server overload. Furthermore, the number of requests per second that a single server machine can handle is limited and cannot scale up with the demand. The need to optimize the performance of Internet services is producing a variety of novel Web architectures. In this paper we consider Web server farms that use a tightly coupled distributed architecture or a multiprocessor machine. From the user's point of view, any HTTP request for a document is presented to a logical (front-end) server that acts as a representative for the Web site. This server, namely the Dispatcher, retains transparency of the parallel architecture for the user, guarantees backward compatibility with present Web standards, and distributes all client requests to the back-end servers. Multiprocessor or cluster architectures with a Dispatcher have been adopted with different solutions in various academic and commercial Web farms, e.g. [1, 5, 8, 10, 12]. One of the main goals for scalability of a parallel/distributed system is the availability of a mechanism that optimally balances the load over the server nodes.
Numerous scheduling algorithms have been proposed for multi-node architectures executing parallel or distributed applications. We want to investigate traditional and new algorithms that allow scalability of Web server farms receiving peak demands. Traditional scheduling policies have been analyzed mainly under the hypothesis of Poissonian task arrivals and exponential service times, while the independence of request arrivals to a Web site has been clearly demonstrated to be not valid [9, 11]. Internet traffic tends to occur in waves with intervals of heavy peaks; moreover, the service time of HTTP requests may have a large variance. As a consequence, the Web workload is typically represented through heavy-tailed distribution functions for both inter-arrival and service times. These load characteristics and similarities

with the traditional problem of scheduling applications on multi-node architectures represent the main motivations for using, even in Web server farms, global scheduling strategies based on dynamic state information instead of information-less static algorithms. However, dynamic policies require mechanisms for monitoring and evaluating the current load on each server, gathering the results, combining them, and taking real-time decisions. This paper investigates when the Dispatcher needs dynamic policies for achieving scalability of high performance Web server farms, because no accurate comparative study exists between static and dynamic policies. Under workload characteristics that resemble those experienced by real Web servers, we observed that bursts of arrivals and skewed service times alone do not motivate the use of sophisticated global scheduling algorithms. Instead, the most important feature to be considered for the choice of the dispatching algorithm is the kind of services provided by the Web site. If the Dispatcher mechanism has full control on client requests and most clients require HTML pages or submit light queries to a database, system scalability is achieved even without sophisticated scheduling algorithms. Indeed, in these instances, simple static policies are as effective as their more complex dynamic counterparts. Scheduling based on dynamic state information appears to be necessary only in sites where the majority of client requests require service times three or more orders of magnitude higher than that of providing a static HTML page with some embedded objects. The Web farm multiprocessor architecture is so robust that the global scheduling algorithm has an impact much less significant than it has in other multi-node Web sites, for example geographically distributed Web sites or other distributed Web systems where the Dispatcher role is taken by system components (e.g., DNS, single Web servers) that control only a limited percentage of the requests reaching the Web site [7]. The remaining part of the paper is organized as follows. In section 2, we describe the architecture of a Web server farm and select some feasible policies for the Dispatcher. In section 3, we present an accurate model of Web server farms and the parameters of the workload model. In section 4, we discuss the experimental results for various classes of Web sites.

2. Web server farms and scheduling algorithms

A Web server farm refers to a Web site that uses two or more servers housed together in a single location to handle user requests. Although large Web farms may consist of dozens of servers, they use one site hostname to provide a single interface for users. Moreover, to have a mechanism that controls the totality of the requests reaching the site and to mask the service distribution among multiple back-end servers, Web server farms provide a single virtual IP address that corresponds to the address of the front-end server(s). Independently of the actual system mechanism that existing Web farms use to assign the load, in this paper this entity is called the Dispatcher. The Domain Name Server(s) for the Web site translates the URL name into the IP address of the Dispatcher. In this way, the Dispatcher acts as a centralized global scheduler that receives the totality of the requests and routes them among the back-end servers of the Web farm. To allocate the load among the Web servers, the Dispatcher is able to uniquely identify each server machine in the Web farm through a private address.
We consider a Web farm consisting of homogeneous distributed servers that provide the same set of documents. Indeed, most Web server farms proposed so far assume that each server is able to respond to any request for any part of the provided service. The details about the operations of the Web server farm are described in section 3. Various academic and commercial products confirm the increasing interest in these multi-node architectures. A valuable recent survey can be found in [14]. In this paper, we consider a Dispatcher working at layer-4 switching with layer-2 packet forwarding. Such a Dispatcher cannot use highly sophisticated algorithms because it has to take fast decisions for hundreds of requests per second. Static algorithms are the fastest solution because they do not rely on the current state of the system at the time of decision making. For the same reason, being information-less, static algorithms can potentially make poor assignment decisions, such as routing a request to a server node having a long queue of waiting load while other nodes are almost idle. Dynamic algorithms have the potential to outperform static algorithms by using some state information to help dispatching decisions. On the other hand, they require mechanisms that collect, transmit and analyze state information, thereby incurring overheads. We consider three scheduling policies that can be carried out by the Dispatcher: Random, Round-Robin (RR) and Weighted Round-Robin (WRR), the last of which is actually a class of dynamic algorithms. We do not consider more sophisticated algorithms, to prevent the Dispatcher from becoming the primary bottleneck of the Web server farm. Actually, in all experiments the Dispatcher, which forwards packet requests to servers without header rewriting, turned out to be remarkably fast and scalable. Random and RR are truly static policies.
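The two static policies require essentially no state at the Dispatcher. A minimal Python sketch of this selection step is given below; it is illustrative only, not the system described in the paper (the class name, method names and example private addresses are assumptions).

import itertools
import random

class StaticDispatcher:
    """Illustrative layer-4 dispatcher applying the static policies.

    Each back-end server is identified only by its private address; neither
    policy inspects any server state before choosing a target.
    """

    def __init__(self, server_addresses):
        self.servers = list(server_addresses)
        self._rr_cycle = itertools.cycle(self.servers)  # fixed circular order for RR

    def pick_random(self):
        # Random: choose a back-end server uniformly at random.
        return random.choice(self.servers)

    def pick_round_robin(self):
        # Round-Robin (RR): assign client connections circularly among the servers.
        return next(self._rr_cycle)

# Example: route an incoming client connection (the whole page request,
# i.e. HTML file plus embedded objects, goes to the chosen server).
dispatcher = StaticDispatcher(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
target = dispatcher.pick_round_robin()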

Weighted Round-Robin is a class of dynamic policies [12] that uses more or less precise information about the system state. For each server S_i, WRR associates a dynamically evaluated weight w_i >= 0 that depends on the current load state of S_i. The server weight is computed as w_i = 1 - (l_i / l_max), where l_i is the current load state of the server S_i, and l_max is the maximum current load among all servers. The load states l_i are computed periodically by each server through some load indexes. Typical server load measures are the number of active processes on the server, the mean disk response time, and the hit response time, that is, the mean time spent by each request at the server. Every T_get seconds, the Dispatcher gathers load information from the servers and recomputes the weights w_i. The weight of server S_i is equal to zero (w_i = 0) when l_i = l_max, that is, when the server has a very high load and should not receive new requests. When the server has no requests to serve, w_i reaches the maximum value w_i = 1. In all other instances, it holds 0 < w_i < 1. In this paper, we focus on dynamic policies that use as load indexes the mean number of active processes at each server (WRR_num policy) and the mean hit response time (WRR_time policy).
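The weight rule above can be sketched as follows. This is not the paper's implementation: the class name, the gather_loads() hook and the default T_get value are assumptions, and the final selection uses a weight-proportional random choice as a simple stand-in for a true weighted round-robin schedule, which the paper does not detail.

import random
import time

class WeightedRoundRobinDispatcher:
    """Sketch of the WRR policy: weights derived from periodically gathered load states."""

    def __init__(self, servers, t_get=10.0):
        self.servers = list(servers)
        self.t_get = t_get                       # state-gathering interval, in seconds
        self.weights = {s: 1.0 for s in self.servers}
        self._last_refresh = float("-inf")

    def refresh_weights(self, load_states):
        # load_states[s] is the load index reported by server s, e.g. the mean number
        # of active processes (WRR_num) or the mean hit response time (WRR_time).
        l_max = max(load_states.values())
        for s in self.servers:
            # w_i = 1 - l_i / l_max: 0 for the most loaded server, 1 for an idle one.
            self.weights[s] = 1.0 - load_states[s] / l_max if l_max > 0 else 1.0
        self._last_refresh = time.monotonic()

    def pick(self, gather_loads):
        # gather_loads() stands in for the Dispatcher polling the back-end servers.
        if time.monotonic() - self._last_refresh >= self.t_get:
            self.refresh_weights(gather_loads())
        total = sum(self.weights.values())
        if total == 0:                           # every server is at the maximum load
            return random.choice(self.servers)
        # Assign the connection with probability proportional to the weight.
        return random.choices(self.servers, weights=[self.weights[s] for s in self.servers])[0]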
3. System and workload model

To analyze and compare the dispatching policies presented in the above section, we designed and implemented a detailed simulation model of the Web server farm. The architecture of the system is shown in Figure 1: the Web farm consists of N back-end servers and a dedicated machine that acts as the Dispatcher. The primary DNS translates the site hostname into the IP address of the Dispatcher. The addresses of the back-end servers are private and not visible externally. Back-end servers and Dispatcher are connected through a local fast Ethernet with 100 Mbps bandwidth. As the focus is on Web server farm performance, we did not model the details of the external network. To prevent the bridge(s) to the external network from becoming a potential bottleneck for the Web farm throughput, we assume that the system is connected to the Internet through one or more large-bandwidth links that do not use the same Dispatcher connection to the Internet [12]. Each back-end server in the farm is modeled as a separate process. Each server has its own CPU, central memory, hard disk and network interface. We use real parameters to set up the system. For example, we parameterize the disk with the values of a real fast disk (IBM Deskstar 34GXP) having a transfer rate of 20 MBps, a controller delay of 0.5 msec, a seek time of 9 msec, and a rotation speed of 7200 RPM. The main memory transfer rate is set to 100 MBps. The network interface is a 100 Mbps Ethernet card. In some experiments, a portion of the main memory space of each server is used for Web caching. The cache may contain up to 20% of the total size of the documents of the Web site. The Web server software is modeled as an Apache-like server, where an HTTP daemon waits for requests of client connections. As required by the HTTP/1.1 protocol, each Web page request forks a new HTTP process that serves the HTML file and all embedded objects.

Figure 1. Web farm architecture: clients reach the Dispatcher through a Wide Area Network (the primary DNS maps the site URL to the Dispatcher IP address); the Dispatcher and the back-end servers (Server 1 through Server 6) are connected by a Local Area Network.

The client-server interactions are modeled at the level of TCP connections, including packets and ACK signals. Each client is realized as a process that, after activation, enters the system and generates the first request of connection to the Dispatcher of the Web farm. The entire period of connection to the site, namely the Web session, consists of one or more page requests to the site issued through the HTTP/1.1 protocol. At each request, the Dispatcher applies some routing algorithm and assigns the connection to a server. The page request is for an HTML page that contains some embedded objects and may include some computation or database search. The client will submit a new page request only after it has received the complete answer, that is, the HTML page and all embedded objects. Moreover, we include a user think time that models the time required to analyze the requested page and decide (if necessary) on a new request. The granularity of the Dispatcher operating at layer-4 switching is the page request level. This means that the selected server has to provide all files and services (i.e., computation, DB query) contained in a request. A Web server farm may host different types of Web services. Most sites are characterized by having many short services and a few long ones. However, salient characteristics of the Web request load may span even three or four orders of magnitude. The basic service is to serve a static HTML page with some embedded objects. In our model, the time required to process such a request is on the order of milliseconds. Let BST (Basic Service Time order) denote the order of magnitude of this basic service. Starting from this basic measure of service, in section 4 we consider the following time scales for more intensive Web services:
10 BST: typically for requests of long data streams, such as multimedia or software files.
100 BST: typically for requests that involve light-to-medium queries to a database.
1000 BST: typically for requests that involve some intensive computation and/or complex searches in one or multiple databases.
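A back-of-the-envelope estimate shows why a static hit falls in this millisecond range. The sketch below uses the simulated server parameters listed above (20 MBps disk with 9 ms seek and 0.5 ms controller delay, 100 MBps memory, 100 Mbps network interface); the 15 KB object size and the simple additive cost model are illustrative assumptions, not the paper's simulator.

def static_hit_time_ms(object_kb, cached):
    """Rough per-object service time in milliseconds for the modeled server."""
    size_mb = object_kb / 1024.0
    if cached:
        read_ms = size_mb / 100.0 * 1000.0                 # read from the memory cache (100 MBps)
    else:
        read_ms = 9.0 + 0.5 + size_mb / 20.0 * 1000.0      # seek + controller delay + disk transfer (20 MBps)
    send_ms = (object_kb * 8.0) / 100_000.0 * 1000.0       # 100 Mbps network interface, protocol overhead ignored
    return read_ms + send_ms

# A 15 KB object costs roughly 11 ms from disk and just over 1 ms when cached,
# i.e. the millisecond scale taken above as the Basic Service Time (BST) order.
print(static_hit_time_ms(15, cached=False), static_hit_time_ms(15, cached=True))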

This classification is done for the purposes of our experiments only. Although realistic, it is not intended to be a precise taxonomy of all Web services as in [13]. Indeed, some requests for multimedia files could be classified as 100 BST, and some database queries as 10 BST. In the experiments we distinguish two classes of scenarios: Web publishing sites with static content and Web sites with dynamic content. The workload model incorporates the most recent results on the characteristics of real Web load. The high variability and self-similar nature of Web access load is modeled through heavy-tailed distributions such as the Pareto and Lognormal distributions [2, 3, 4, 9]. The number of requests per client session, that is, the number of consecutive Web requests a user submits to the Web site, is modeled according to an inverse Gaussian distribution. The time between the retrieval of two successive Web pages from the same client, namely the user think time, is modeled through a Pareto distribution [4]. The number of embedded objects per page request, including the base HTML page, is also obtained from a Pareto distribution [4]. The distribution of the file sizes requested to a Web server is a hybrid function, where the body is modeled according to a lognormal distribution and the tail according to a heavy-tailed Pareto distribution [3]. A summary of the distributions and parameters used in our simulation experiments is in Table 1.

Table 1. Parameters of the system and workload model.
Number of servers: 2-32 (default 10)
Disk transfer rate: 20 MBps
Memory transfer rate: 100 MBps
HTTP protocol: 1.1
Intra-server bandwidth: 100 Mbps
Arrival rate: 100-900 clients per second
Requests per session: Inverse Gaussian
User think time: Pareto
Embedded objects: Pareto
Hit size (body): Lognormal
Hit size (tail): Pareto

4. Performance results

To analyze the performance of the global scheduling algorithms, we use a metric derived from the Load Balance Metric (LBM) proposed in [6]. The LBM is the weighted average of the peak-to-mean ratio of each sampling period j, defined as peak_load_j / ((sum_{i=1..n} load_{i,j}) / n), where peak_load_j is the peak load at the j-th sampling period and load_{i,j} is the load of server i at the same time. The definition of the LBM for a system with n servers and m sampling periods is:

LBM = [ sum_{j=1..m} (sum_{i=1..n} load_{i,j}) * peak_load_j / ((sum_{i=1..n} load_{i,j}) / n) ] / [ sum_{j=1..m} sum_{i=1..n} load_{i,j} ]   (1)

This metric shows the ability of the scheduling policies to share the load across the servers of the Web farm. The load index may be the number of requests being served concurrently (LBM-Hit) or the time to serve a 1 KByte request (LBM-Byte). In our experimental results, these load measures are sampled at each server every 10 seconds. Instead of the pure LBM-Byte, we prefer to plot the unbalance factor (UF), that is, the percentage difference between the LBM value and the best LBM value, which is achieved when the load is perfectly balanced among the servers. By definition, the LBM ranges from 1 to the number of servers n, so we have

UF = (LBM - 1) / n   (2)

This percentage index gives a more immediate measure to compare the performance results of different scheduling policies. In some experiments, to measure the impact of the global scheduling policies on system performance, we use the peak throughput, that is, the maximum Web system throughput measured in MBytes per second (MBps).
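A small sketch of how the LBM of equation (1) and the UF of equation (2) can be computed from sampled loads is given below; the sample values are hypothetical, and idle sampling periods are simply skipped, a choice not specified in the paper.

def lbm(load):
    """Load Balance Metric: load[j][i] is the load of server i at sampling period j."""
    n = len(load[0])
    weighted_sum = 0.0
    total_load = 0.0
    for period in load:
        period_total = sum(period)
        if period_total == 0:
            continue                          # skip idle periods (nothing to weight)
        peak_to_mean = max(period) / (period_total / n)
        weighted_sum += period_total * peak_to_mean
        total_load += period_total
    return weighted_sum / total_load          # ranges from 1 (perfect balance) to n

def unbalance_factor(load):
    n = len(load[0])
    return (lbm(load) - 1.0) / n              # equation (2): 0 when perfectly balanced

# Hypothetical samples: 3 sampling periods, 4 servers.
samples = [[5, 5, 5, 5], [8, 4, 2, 6], [10, 1, 1, 8]]
print(round(unbalance_factor(samples), 3))    # about 0.13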
In the following sections we analyze the behavior of static and dynamic routing strategies by considering two main classes of services for a Web site: (1) Web farms providing HTML pages with embedded objects and large files such as multimedia and software files; (2) Web farms also providing dynamic pages that may require intensive CPU and/or database operations.

4.1. Web farms with static contents

In the first set of experiments we evaluate the impact of the dispatching policies when the Web site provides only static (even if potentially large) objects. We analyze the system sensitivity as a function of various parameters, that is, the client arrival rate, the number of servers and the disk cache size. To compare performance results we keep the ratio between offered load and Web farm capacity constant, that is, for higher numbers of servers we proportionally increase the arrival rate. Figure 2 shows the unbalance factor when the client arrival rate varies from 100 to 900 clients per second; since each client session consists of several page requests, these client rates correspond to considerably higher rates of page requests per second.

(It must be considered that clients do not send requests during user think time and Web service time.) From this figure we can observe that all global scheduling policies achieve analogous results. With a low arrival rate, the unbalance factor is negligible. With an arrival rate of 700-900 clients per second, the system is highly loaded. Hence, the queue lengths grow rapidly and the unbalance factor increases as well. However, this happens for all policies without significant differences. In all these instances, system scalability can be achieved only through an increase in the number of Web server nodes.

Figure 2. Sensitivity to the client arrivals per second.

Varying the number of Web servers does not change the performance ordering of the scheduling policies. Figure 3 shows that the unbalance factor increases with the number of servers, with a slope smaller than 1. Even for higher numbers of servers the performance of a static, information-less policy such as RR is quite comparable to both WRR state-aware algorithms. Other experiments, not reported here, for different system parameters confirm these conclusions. Our explanation for this unexpected result is that heavy-tailed service time distributions and bursts of client arrivals are well counterbalanced by the Dispatcher's full control on request arrivals.

Figure 3. Sensitivity to the number of Web server nodes.

We carried out other sensitivity experiments as a function of the disk cache size. In Figure 4 we show the peak throughput of the Web farm as a function of the document cache dimension, measured as a percentage of the total size of the site's document set. We see that when the cache hit rate passes a low threshold (e.g., 10-15%), sophisticated mechanisms for controlling dynamic system states are really unnecessary. So a simple round-robin global scheduler that distributes the load circularly among the server nodes is the best compromise between efficacy and cost for achieving scalable Web publishing sites with static content.

Figure 4. Sensitivity to the disk cache dimension.

A global increase of the service time scale per request, and a mixed workload, should result in a major stress for the scheduling policy. Hence, adopting a dynamic dispatching policy could improve Web farm performance. On the other hand, augmenting the dynamism of the system environment makes it much harder to tune the parameters of the state-aware policies. To investigate how this trade-off affects the scalability of Web farms is the subject of the following section.

4.2. Web farms with dynamic contents

Let us now consider Web farms that provide mainly dynamic information. Now, the server response must be personalized for each user, and we suppose that each answer is obtained through more or less intensive computation and/or database operations.

Focusing on this class of Web systems has two main consequences: we remove the effects of Web caching, because each piece of information is useful to one client only; and we increase the service time scale by two or more orders of magnitude with respect to the Basic Service Time order (BST) defined in section 3. In our experiments we consider that the Web farm receives a mixed workload where requests can be static (BST), lightly dynamic (10-100 BST) and heavily dynamic (100-1000 BST). In particular, we consider two representative scenarios:
Scenario A: 50% static requests, 25% lightly dynamic requests and 25% heavily dynamic requests.
Scenario B: 30% static requests, 35% lightly dynamic requests and 35% heavily dynamic requests.
The first set of results aims to demonstrate the convenience of using a dynamic WRR policy over a static one, if the parameters of the dynamic policies are tuned at their best. The main factors that affect WRR performance are the load indexes (the mean response time for the WRR_time policy, and the mean number of active processes at each server for the WRR_num policy), and the interval of state information gathering by the Dispatcher, that is, the T_get value defined in section 2. Figures 5 and 6 show the sensitivity of WRR_num with respect to the T_get period for scenarios A and B. WRR_time is not reported because it achieves similar results.

Figure 5. Sensitivity of WRR_num to the T_get parameter for scenario A.

Figure 6. Sensitivity of WRR_num to the T_get parameter for scenario B.

The performance of the WRR algorithms is very sensitive to T_get. The best values depend on the time scale of the service times and on the workload composition. Figures 5 and 6 show that a bad choice of the T_get value can result in a 50% decrease in performance (even worse than RR) for some workload combinations. In the remaining part of the paper, we suppose that we are always able to choose the best T_get value for the WRR policies. We refer to these algorithms as WRR_time-Best and WRR_num-Best. In particular, in the last set of experiments we compare the WRR-Best policies with Random and RR under scenarios A and B with three combinations of dynamic workload:
1-10-100: 50(30)%, 25(35)% and 25(35)% of BST, 10 BST and 100 BST requests for scenario A (B);
1-10-1000: 50(30)%, 25(35)% and 25(35)% of BST, 10 BST and 1000 BST requests for scenario A (B);
1-100-1000: 50(30)%, 25(35)% and 25(35)% of BST, 100 BST and 1000 BST requests for scenario A (B).
Figures 7 and 8 compare the performance of the Random, RR and WRR-Best strategies when the system is stressed by the workload combinations previously mentioned. The first observation is that the performance of each policy depends much on the workload. In scenario A (Figure 7), when we pass from the lightest workload combination (1-10-100) to the other two, the unbalance factor decreases. This result is motivated by the intensive use of Web server resources: both queue length and response time increase, so that the mutual unbalance is less evident in percentage. Moreover, we can see that the mean number of active processes is always a good index for the server load. In scenario B (Figure 8), where 70% of the requests are dynamic, it is impossible to identify a stable trend in the behavior of the dispatching strategies: WRR_time is the best policy for two of the three workload combinations, while for the remaining one the best performance is achieved by the WRR_num policy, as in scenario A.
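The tuning step assumed here, choosing the best T_get for each workload, can be sketched as a simple sweep. The run_farm_simulation hook and the candidate T_get values below are assumptions standing in for the authors' simulator, not part of the paper.

def pick_best_t_get(run_farm_simulation, policy, t_get_values=(1, 10, 100, 1000)):
    """Return the T_get value (seconds) with the lowest measured unbalance factor.

    run_farm_simulation(policy, t_get) is a hypothetical hook that runs one
    simulation of the Web farm under the chosen policy and returns its UF.
    """
    uf_by_t = {t: run_farm_simulation(policy, t) for t in t_get_values}
    best = min(uf_by_t, key=uf_by_t.get)
    return best, uf_by_t

# Usage sketch: the tuned settings play the role of WRR_num-Best and WRR_time-Best.
# best_t, uf_by_t = pick_best_t_get(run_farm_simulation, "WRR_num")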

Figure 7. Static vs. dynamic policies in scenario A for different combinations of workload.

Figure 8. Static vs. dynamic policies in scenario B for different combinations of workload.

5. Conclusions

Bursts of arrivals and heavy-tailed service times are the main motivations that have led many Web farms to use dynamic scheduling policies. We wanted to investigate which workload characteristics really motivate the overhead of dynamic algorithms for achieving scalable Web farms. We observed that for most Web publishing sites characterized by a large percentage of static information, a static dispatching policy such as RR guarantees satisfactory scalability and load balancing. Our interpretation of this result is that a light-to-medium load (even with some high peaks and bursts of arrivals) is implicitly balanced by the fully controlled circular assignment among the server nodes that is guaranteed by the Dispatcher of the Web farm. When the workload characteristics change significantly, so that very long services dominate, dynamic routing algorithms such as WRR should be applied to achieve a more uniform distribution of the workload and a scalable Web site. However, in highly variable Web sites, dynamic algorithms have serious tuning problems which are unknown to static policies. Hence, although necessary in most systems, dynamic policies guarantee scalability only if it is possible to tune them as a function of the kind of Web services provided by the site.

References

[1] E. Anderson, D. Patterson, E. Brewer, The Magicrouter, an application of fast packet interposing, unpublished Tech. Rep., Computer Science Department, University of Berkeley, May 1996.
[2] M.F. Arlitt, C.L. Williamson, Internet Web servers: Workload characterization and performance implications, IEEE/ACM Trans. on Networking, vol. 5, no. 5, Oct. 1997, pp. 631-645.
[3] P. Barford, A. Bestavros, A. Bradley, M.E. Crovella, Changes in Web client access patterns: Characteristics and caching implications, World Wide Web, Special Issue on Characterization and Performance Evaluation, Jan.-Feb. 1999.
[4] P. Barford, M.E. Crovella, A performance evaluation of Hyper Text Transfer Protocols, Proc. of ACM Sigmetrics '99, Atlanta, Georgia, May 1999, pp. 188-197.
[5] A. Bestavros, M.E. Crovella, J. Liu, D. Martin, Distributed Packet Rewriting and its application to scalable server architectures, Tech. Rep. BUCS-TR-98-3, Computer Science Department, Boston University, Dec. 1997.
[6] R.B. Bunt, D.L. Eager, G.M. Oster, C.L. Williamson, Achieving load balance and effective caching in clustered Web servers, Proc. of 4th Int. Web Caching Workshop, San Diego, CA, April 1999.
[7] V. Cardellini, M. Colajanni, P.S. Yu, Redirection algorithms for load sharing in distributed Web server systems, Proc. of IEEE 19th International Conference on Distributed Computing Systems, Austin, Texas, June 1999.
[8] Cisco's LocalDirector. Available online at http://cio.cisco.com/warp/public/751/lodir/index.shtml
[9] M.E. Crovella, A. Bestavros, Self-similarity in World Wide Web traffic: Evidence and possible causes, IEEE/ACM Trans. on Networking, vol. 5, no. 6, Dec. 1997, pp. 835-846.

[10] D.M. Dias, W. Kish, R. Mukherjee, R. Tewari, A scalable and highly available Web server, Proc. of 41st IEEE Computer Society Intl. Conf. (COMPCON 1996), Feb. 1996, pp. 85-92.
[11] A. Feldmann, A. Gilbert, W. Willinger, T.G. Kurtz, The changing nature of network traffic: Scaling phenomena, Computer Communication Review, vol. 28, no. 2, 1998.
[12] G.D.H. Hunt, G.S. Goldszmidt, R.P. King, R. Mukherjee, Network Dispatcher: A connection router for scalable Internet services, Proc. of 7th Int. World Wide Web Conf., Brisbane, Australia, April 1998.
[13] D. Krishnamurthy, M. Litoiu, J. Rolia, Performance Stress Conditions and Capacity Planning for E-Business Applications, Proc. of International Symposium on Electronic Commerce, Beijing, People's Republic of China, May 1999.
[14] T. Schroeder, S. Goddard, B. Ramamurthy, Scalable Web server clustering technologies, IEEE Network, May-June 2000, pp. 38-45.