Dynamic Load Balancing for Web Clusters

M. Adamou 1, D. Anthomelidis 1, K. Antonis 2, J. Garofalakis 2, P. Spirakis 2

1. Systems Design Research Lab (SDRL), Dept. of Computer & Information Science, Univ. of Pennsylvania, 200 South 33rd St., Philadelphia, PA 19104-6389
Fax: (215) 898-0587, (215) 573-3573, Phone: (215) 898-8090
Email: {adamou, anthomel}@gradient.cis.upenn.edu

2. Computer Technology Institute (CTI), Kolokotroni 3, 26110, Patras, Greece, P.O. Box 1122
Fax: (+30)61-222086, Phone: (+30)61-273496
Email: {antonis, garofala, spirakis}@cti.gr

Abstract

The wide growth of the Internet user population has led the developers of popular Web sites to substitute single Web servers with clusters of Web servers, in order to cope efficiently with the high rate of incoming requests, and to apply load balancing techniques to distribute the workload among them. In this paper we present a general design for load balancing strategies in a cluster of Web servers, which uses a proxy per server to collect the incoming requests. To accomplish the best possible workload distribution, we use an update process, which can be either centralized or distributed. We apply two sender-initiated load balancing algorithms, based on the distinction between the centralized and the distributed update process, and analyze their performance results. We conclude that the distributed case behaves better under highly loaded conditions.

Keywords: dynamic load balancing, web cluster, proxy server, central & distributed update process

Introduction

Motivation - The Problem

The Internet has grown at an impetuous pace during the last years. Many services, such as email, ftp, and mainly the World Wide Web (WWW), are accessible via the Internet. The development of search engines with graphical user interfaces has led to an exponential growth of the information accessible to Internet users.
The result of this development of the WWW is that many popular Web sites receive thousands of requests per second. In these cases the response time per request is very high, and sometimes these sites cannot be accessed at all, because of the servers' high workload and the heavy network traffic. For this reason, the developers of these popular Web sites adopted the solution of substituting single Web servers with groups (clusters) of servers, and of applying load balancing techniques to distribute the load uniformly among them. Load balancing is a policy which exploits the communication facility between the servers of a system, exchanging status information and jobs between any two servers, in order to improve the performance of the whole system. The distribution of load is accomplished by transferring jobs from heavily loaded servers to lightly loaded ones. Load balancing techniques consist of two policies: the transfer policy and the location policy [Dan95]. The decision for a job transfer is taken when a corresponding condition is satisfied, according to the transfer policy (e.g. the number of waiting jobs of a server exceeds an upper threshold). The choice of the server to receive a job for remote execution depends on the location policy used (e.g. the server with the lowest number of waiting jobs). The most efficient load balancing algorithms are those which use current or recent information about the system behaviour; they are called dynamic or adaptive. There are also algorithms, called static, which use a priori known average system information. The decision for a job transfer can be taken by congested (highly loaded) servers (sender-initiated techniques) or by lightly loaded servers (receiver-initiated techniques) [Dan95]. [ELZ86a, ELZ86b] proved that receiver-initiated policies perform better under high load conditions, while sender-initiated policies perform better under low and moderate load conditions.
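The interplay of the two policies can be sketched as follows. This is a minimal illustration; the threshold value, the server names and the queue-length load metric are assumptions made for the example, not details taken from [Dan95]:

```python
# Sketch of a sender-initiated decision combining the two policies:
# a transfer policy (local queue exceeds an upper threshold) and a
# location policy (pick the server with the fewest waiting jobs).
# The threshold and the queue-length load metric are illustrative.

THRESHOLD = 3  # transfer policy: offload once more jobs than this are waiting

def choose_server(local, queue_len):
    """Return the server that should execute the next job."""
    # Transfer policy: keep the job if the local queue is short enough.
    if queue_len[local] <= THRESHOLD:
        return local
    # Location policy: otherwise pick the server with the fewest waiting jobs.
    return min(queue_len, key=queue_len.get)

queues = {"s1": 5, "s2": 1, "s3": 2}
print(choose_server("s1", queues))  # s1 is congested: job goes to s2
print(choose_server("s3", queues))  # s3 is below the threshold: job stays local
```

In a receiver-initiated variant the roles would be reversed: a lightly loaded server would poll congested ones for work instead.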
Some techniques proposed in the past combine the good characteristics of the sender- and receiver-initiated techniques, and are called symmetrical [AGS98, SK90]. The first approaches to balancing the load of Web servers did not use the above techniques. Actually, the first two approaches were caching and mirroring. The idea of caching was applied on the Web to reduce delay times, especially at peak moments. The first form of caching was the installation of a local disk cache on clients and/or an in-memory cache in Web browsers. Soon the concept of hierarchical memory was extended to consider Web servers as an additional external memory layer. As the effectiveness of caching depends on the number of times a document is requested, it was obvious that it would be more beneficial to share cache memory between different users. The second type of caching currently used on the Web is the caching proxy. The proxy operates as an intermediary between the user and the outside world. From the user's view, a proxy operates as a server: every request is sent to and answered by the proxy. From the Web server's view, the proxy operates as a client, since it forwards the requests to the Web server. Every proxy has a cache memory, and this raises a number of problems that have to be solved (e.g. which is the best file distribution strategy for the cache, which files should be placed in the cache, etc.). Another solution for balancing the load is mirroring. In this case, copies (replicas) of a Web server are created and placed in geographically different regions. Every copy is called a mirror, and the mirrors cooperate to balance the load. The main aim of this scheme is to transfer the responsibility of choosing the appropriate mirror site to the users, who send their requests to the geographically closest mirror site. A cluster of Web servers is a group of Web servers, like mirrors, except that they are all placed in the same geographical region.
Such solutions have been adopted in the recent past by sites which receive millions of requests per day ([KBG94], [Gar96]). The Web servers of such a cluster cooperate to serve the clients' requests as if the cluster were a single server. The cluster has a unique name which is known to the outside world, but its constituent servers have different IP addresses. Obviously, every request is serviced by only one Web server. This is accomplished, for example, with the DNS round robin technique, in which the DNS server responds to every client's request with one of the IP addresses of the servers of the cluster, in a circular manner. This technique does not take into account the load of each server or its availability. It is effective when HTTP requests access HTML information of relatively uniform size and the loads and computing powers of the workstations are relatively comparable. But it cannot handle dynamic changes of system load and configuration (e.g. when the computing powers of the servers are heterogeneous) [AYI97]. Another weakness of this technique is the degree of name caching which occurs. DNS caching enables a local DNS system to cache the name-to-IP address mapping, so that the most recently accessed hosts can be mapped quickly. The downside is that all requests from a DNS server's domain will go to a particular IP address for a period of time [MFM95]. Apart from DNS round robin, several other techniques have been developed in the past for distributing the load over a cluster of Web servers. HTTP redirection is the simplest one: a heavily loaded Web server responds to a client with an HTTP redirection code and provides the address of another Web server, to which the client can send its requests ([AYHI96]). According to the Magic Router technique ([APB96]), every request is collected by a central router, whose IP address is known to the clients. The router then transmits the requests, sequentially, to one of the servers in the cluster.
The request transmission is performed by changing the destination IP address of all IP packets reaching the router to the selected server's address. The packets transmitted from the server to the clients also pass via the router, which changes the server's IP address in order to prevent the user from noticing the redirection of the answer. The TCP router technique is similar to the Magic Router, except for two basic differences. First, the server responds directly to the client, without the central router's involvement; but this requires the changing of the source IP address to be done at the TCP/IP level of the server, and hence the appropriate programming of the kernel of every server in the cluster. Secondly, the TCP router selects the server to execute a request by considering the connections of all servers. [BCLM98, BA99] present the Distributed Packet Rewriting (DPR) technique to balance the load in a cluster of Web servers. DPR uses an idea similar to the TCP router; the two only differ in how the IP address of the end server is found. In DPR this IP address is found in a distributed manner: every server additionally has the responsibility to redirect requests if the load has to be balanced. The requests arrive at any host in the cluster via the DNS round robin technique. The host is responsible for either serving the request or redirecting it to another host for remote execution. Every host serves an arriving request if the number of its TCP connections does not exceed a particular threshold; otherwise, the request is transmitted to the host with the minimum number of TCP connections. The selected host serves the client's request using the initial host's IP address as the source address, and the client continues to send its packets to the initial server without knowing that they are redirected. The update information (the number of TCP connections) is sent to all servers in the cluster by every server, with a broadcast.
This broadcast happens at fixed periods of time. In this work we present a general design for load balancing strategies in a cluster of Web servers. This general design defines a proxy per server in the cluster to collect the requests reaching the corresponding server, and an update process (per server or central) for the propagation of each server's status information. We analyze the special roles of the proxy server and the update process in this scheme, apply two different dynamic, sender-initiated load balancing techniques using the above general design, and evaluate their performance. The first is the fully distributed strategy and the second the centralized scheduler strategy. The fully distributed technique appears to operate better under highly loaded conditions.

Background

[DKMT96] proposes a hybrid scheme which combines the DNS round robin technique with the TCP routing technique. The work of [CC96] is an example of a mirroring technique. To make dynamic server selection practical, the authors demonstrated the use of three tools: the round-trip latency (which considers the number of hops), bprobe (which estimates the maximum possible bandwidth along a given path) and cprobe (which estimates the current congestion along a path). They showed that dynamic server selection consistently outperforms static policies by as much as 50%. Furthermore, they demonstrated the importance of each of their tools in performing dynamic server selection. [BCLM98, BA99] present the Distributed Packet Rewriting (DPR) technique. Specifically, they describe the implementation of four variants of DPR and compare their performance. They showed that DPR provides performance comparable to centralized alternatives, measured in terms of throughput and delay. They also showed that DPR eliminates the performance bottleneck exhibited when centralized connection routing techniques are utilized. The authors used the SURGE generator tool [BC98] to produce the appropriate number of requests.
The SURGE tool generates references matching empirical measurements of 1) server file distribution, 2) request file distribution, 3) relative file popularity, 4) embedded file references, 5) temporal locality of reference, and 6) idle periods of individual users. [AYHI96, AYI97] investigate the issues involved in developing a scalable World Wide Web (WWW) server, called SWEB, on a cluster of workstations. The scheduling component of the system actively monitors the usage of the CPUs, the disk I/O channels and the interconnection network to effectively distribute HTTP requests across processing units and exploit task and I/O parallelism. The work analyzes the maximum number of requests that can be handled by the system and presents several experiments examining its performance. [ZSY99a, ZSY99b] propose a scheduling optimization for a Web server cluster with a master/slave architecture which separates static and dynamic content processing. The experimental results show that the proposed optimization, using reservation-based scheduling, can produce up to a 68% performance improvement. [AYIE98] studies runtime partitioning, scheduling and load balancing techniques for improving the performance of online WWW-based information systems such as digital libraries. The main performance bottlenecks of such a system are caused by the server's computing capability and the Internet bandwidth. The authors observed that a proper partitioning and scheduling of computation and communication in processing a user request on a multiprocessor server, together with transferring some computation to client-side machines, can reduce network traffic and substantially improve system response time. They therefore presented a partitioning and scheduling mechanism that adapts to resource changes and optimizes resource utilization, and demonstrated the application of this mechanism to online information browsing.

Our Work

In this work we propose a general design to balance the load in a cluster of Web servers. According to the proposed architecture, every server in the cluster has a proxy server to collect the requests arriving at the Web cluster from clients. Incoming requests select the initial proxy server with the use of the DNS round robin technique. The decision about the target Web server to serve an incoming request - the service can be local (on the initial server) or remote (on another server in the cluster) - is taken by another process, the update process. We present here two different dynamic, sender-initiated techniques for load balancing. The difference lies in whether the update process is distributed or centralized. In the first case, every server has an update process to consult for the most suitable server to serve an incoming request (the fully distributed approach), while in the other case the update process resides on a single server (the centralized update process approach). Whatever the technique used, every update process should be informed about the most recent information in the whole cluster, and keeps information about the TCP connections established for every server in the cluster. Each time, the target node selected to serve an incoming request is the one with the lowest number of established TCP connections. The fully distributed case (where the update process is distributed) is a variation of the Distributed Packet Rewriting (DPR) technique ([BCLM98, BA99]), but it has the advantage over DPR that the programming task is done in the application layer, while in DPR the authors modified the kernel of the corresponding operating system to enable the broadcasting facility.
Furthermore, we examined the performance of the two different approaches (the distributed and the centralized update process cases) and concluded that the fully distributed approach is more beneficial under high workload conditions, while both are beneficial compared with the no-load-balancing case.

The Proposed General Design

Design Principles

The basic goal of load balancing techniques in a cluster of Web servers is that every request should be served by the most lightly loaded host in the cluster. We define the "load" of each server as the current number of TCP connections the server has established. The general design strategy proposes the creation, in the application layer, of two external processes that operate transparently to the Web servers and the clients, and are responsible for the dissemination of requests in the cluster. The DNS round robin technique allocates the requests to servers, and every server has the ability either to serve a request or to migrate it for remote execution on another server in the cluster. When a request reaches a host, it is collected by a proxy server which is installed on the same machine as the Web server. The proxy server is responsible for directing the request to the local Web server or redirecting it to another server in the cluster. All communication between the host and the client is handled by the proxy server. The proxy server informs the second proposed external process, the update process, about the arrival, departure or redirection of a request. The update process knows the workload of every server in the cluster, and it is installed either on every server or on a single server in the cluster, depending on the load balancing technique applied (the fully distributed and the centralized update process technique, correspondingly). The update process is responsible for informing the proxy server about the most lightly loaded Web server in the cluster. In the following, we describe the algorithm executed when a request arrives at a proxy server via the DNS round robin technique:

1. The request arrives at a proxy server.
2. The proxy server informs the update process (local or central) about the arrival and requests from it the IP address of the Web server that will serve the request (the most lightly loaded server).
3. The update process responds to the proxy with a message containing the requested IP address and port.
4. The proxy directs the request either a) to the local Web server or b) to another Web server in the cluster, depending on the update process' response.
5. The Web server and the client continue to exchange information via the initial proxy server for the request's service.
6. When the service of a request is completed, the proxy server sends an appropriate message to the update process.

Figure 1: The transactions for an incoming request.

Figure 1 illustrates the above algorithmic steps schematically. In this figure we refer to the Web cluster node containing the update process, which, as will be analyzed in the following, can be centralized or distributed. The great advantage of this general design, in addition to its transparency to the users, is that the programming procedure is done in the application layer, and not in the kernel of the operating system (as in DPR [BCLM98, BA99]); this guarantees simplicity, independence from the operating system, and greater portability of applications. DPR uses programming in the kernel of the operating system to
implement the broadcasting procedure. Moreover, the Web server is occupied only with the task of serving the incoming requests, while the proxy and the update processes only have to perform simple computations and send/receive messages. Every server selection for remote execution is made according to the most recent workload information, which is as valid as possible under the technique used. Thus, two requests arriving simultaneously will not be served by the same Web server.

The special role of the proxy server

It is obvious from the above algorithm that the proxy server behaves as a Web server towards the clients, since it collects their requests, and as a client towards every server in the cluster, since it forwards the clients' requests. For this reason, it has to "listen" on the port where the clients send their HTTP requests and direct them to the appropriate IP address of a server in the cluster, on the corresponding port where that server "listens". Furthermore, the proxy server cooperates with the update process in order to find the most suitable (most lightly loaded) Web server to serve an arriving request. As described above, the proxy server informs the update process about every arrival with a "new arrival" message, and waits for the update process to respond with the address and the port of the Web server selected to serve the request. Then the data transmission follows. The proxy server is always the intermediate level between the client and the Web server: it transmits the data from the client to the Web server and vice versa. When there are no data to transmit, or a particular idle period of time passes, the two established connections (with the client and with the Web server) close. When the Web server completes the service of a request, the proxy informs the update process about this completion by sending a message containing the address of the above Web server, in order to enable the update process to update its information tables.
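The proxy's part of the six algorithmic steps can be sketched as follows. Real socket handling is abstracted away here, and all class and method names are illustrative, not taken from the authors' implementation:

```python
# Sketch of the proxy's role: report the arrival, forward the request to
# the server chosen by the update process, relay the response back to the
# client, and report the completion. Names are illustrative.

class Proxy:
    def __init__(self, local_addr, update_process, servers):
        self.local_addr = local_addr  # address of the co-located Web server
        self.update = update_process  # the local or central update process
        self.servers = servers        # address -> callable serving a request

    def handle(self, request):
        # Steps 2-3: announce the arrival, learn the most lightly loaded server.
        target = self.update.on_arrival(self.local_addr)
        # Steps 4-5: forward locally or remotely and relay the response back.
        response = self.servers[target](request)
        # Step 6: report the completion so the connection table is decremented.
        self.update.on_completion(target)
        return response
```

Note how the proxy never inspects the load itself; it only exchanges messages with the update process, which keeps the two concerns cleanly separated.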
The special role of the update process

We consider two different approaches to keeping information about every server's workload in a cluster of Web servers. The first is the central update process approach, where a unique update process is installed on a particular server in the cluster; the second is the fully distributed approach, where an update process is installed on every server in the cluster. In either case, the update process maintains a table keeping each Web server's number of TCP connections. The continuous update of this table is the main task of this process. Both approaches belong to the category of sender-initiated techniques.

The central update process approach

The update process is installed on a machine (inside or outside the cluster) to hold the information about each server's current number of established TCP connections, and to take the appropriate load balancing decisions. The update process thus operates as a server for the proxies' requests; its IP address and port are, obviously, known to every proxy in the cluster. There are two kinds of messages sent by a proxy to the update process: a "request arrival" message and a "request service completion" message. When the update process receives a "request arrival" message, it examines whether the Web server local to the proxy sending the message can serve the request. Specifically, if that server's current number of TCP connections is below a specific threshold, the request is served locally. Otherwise, the update process looks in its information table and decides which is the most suitable server (the most lightly loaded one) to serve the request. It then sends a message containing the IP address and the port of the selected Web server, and updates the corresponding record in the information table, increasing the selected Web server's number of currently established connections by one. When the update process receives a "request service completion" message, it decreases the number of currently established connections of the Web server whose IP address is contained in the message.

The fully distributed update process approach

In the fully distributed case, an update process is installed on every host in the cluster. Every update process cooperates with the local proxy server to decide which is the most suitable host to serve an arriving request. The update processes communicate with each other, exchanging their information tables, so that all are informed about the most recent events in the system. Specifically, when an HTTP request arrives at a proxy server, the proxy informs the update process about the new arrival. The update process looks in its information table and finds the most suitable Web server in the cluster; the server selection is done as in the centralized case, using the same threshold. The update process records the new arrival in its table and forwards the information about the new arrival at the selected Web server to all the other update processes in the cluster, by multicasting. Finally, the proxy server sends the request to the selected Web server.
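The bookkeeping described above can be sketched as follows. The class name and method names are illustrative; note that with the threshold of 0 used in the experiments later in the paper, the local-service test always fails and every arrival goes through the least-loaded selection:

```python
# Sketch of the update process: a table of established TCP connections
# per server, a threshold test for local service, and least-loaded
# selection otherwise. Names are illustrative, not the authors' code.

class UpdateProcess:
    def __init__(self, servers, threshold):
        self.conns = {s: 0 for s in servers}  # TCP connections per server
        self.threshold = threshold

    def on_arrival(self, local):
        """Handle a "request arrival" message from the proxy at `local`."""
        if self.conns[local] < self.threshold:
            target = local                    # below threshold: serve locally
        else:                                 # pick the most lightly loaded
            target = min(self.conns, key=self.conns.get)
        self.conns[target] += 1
        return target

    def on_completion(self, server):
        """Handle a "request service completion" message."""
        self.conns[server] -= 1

up = UpdateProcess(["s1", "s2", "s3"], threshold=1)
up.on_arrival("s1")  # below threshold: served locally on s1
up.on_arrival("s1")  # s1 at threshold: redirected to a least-loaded server
```

In the fully distributed variant each host would run its own copy of this table and multicast every `on_arrival`/`on_completion` event to its peers, which is where the inconsistencies discussed in the results section can arise.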
The update processes should also be informed in the case of a request service completion. The steps followed to perform this task are the same as in the new arrival case.

Performance Analysis

The Experiment

We applied the proposed general design of load balancing strategies in a cluster of 5 Web servers connected on a local area network, and evaluated the performance of the two different, dynamic, sender-initiated approaches described above on this testbed. To compare the two approaches, we considered the following criteria:

1. the percentage of rejected requests,
2. the total time for the experiment execution (for the same number of generated requests),
3. the throughput of the system in serviced requests per second,
4. the transfer rate in kb/s, and
5. the mean service time per request in the Web cluster.

The Web cluster platform used to analyze the performance of the two techniques consists of 5 different UNIX machines running the SOLARIS operating system. Specifically, it includes two Ultra Enterprise 3000 machines with two CPUs running at 166 MHz, 256 MB of RAM and a 100 Mbps full duplex link; one Ultra 450 with four CPUs running at 300 MHz, 1 GB of memory and a 100 Mbps network card; and two Sparc 4 machines with 32 MB of memory, one CPU running at 70 MHz and a 10 Mbps network card. The requests to be serviced by the Web cluster are generated with the ApacheBench (version 1.2) tool. This tool is very simple to use, and it was selected because it lets the user control the number of produced requests. Specifically, ApacheBench produces a constant number of requests, defined by the user as a parameter. The parameters given to this tool to produce the flow of requests into the system are:

1. the number of requests,
2. the number of clients, and
3. the requested file size.

The produced requests are delivered to the proxies of the cluster with the use of the DNS round robin technique. This means that the requests are delivered in a not equally balanced way, owing to the disadvantages of the DNS round robin policy discussed earlier in this paper. Subsequently, the load balancing techniques are responsible for distributing the requests, in order to avoid the coexistence of heavily and lightly loaded servers. The threshold value used for each of the two policies is 0. This value was selected in order to cause a high number of redirections and so balance the load as equally as possible, since every server having e.g. 1 established TCP connection tries to send any further request for remote execution within the cluster. In this case, the update process decides the identity of the target server (according to the load balancing approach used), which could of course be the server that initially received the request. In each case we examine the performance results when:

1. the load varies (the numbers of users and requests vary and all other input parameters are constant),
2. the requested file size varies and all other input parameters are constant, and
3. the number of servers in the cluster varies and all other input parameters are constant.
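The initial, unbalanced placement produced by DNS round robin can be sketched as a plain circular rotation over the proxies. The machine names are invented for the illustration:

```python
from itertools import cycle

# Sketch of DNS round-robin delivery of the generated requests to the
# cluster's proxies; the five machine names are made up for the example.

proxies = ["enterprise1", "enterprise2", "ultra450", "sparc1", "sparc2"]

def round_robin(requests, proxies):
    """Assign each generated request to the next proxy in circular order."""
    rotation = cycle(proxies)
    return [(req, next(rotation)) for req in requests]

placement = round_robin(range(7), proxies)
# The sixth request wraps around to the first proxy again. Since the
# rotation ignores machine speed and request size, this initial placement
# is unbalanced; the threshold-0 redirection then corrects it.
```

This is exactly the behaviour the update processes have to compensate for: the rotation ignores the fact that the Sparc 4 machines are far slower than the Ultra 450.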
Comparative Results

Tables 1 and 2 contain, in two parts, the results for a set of experiments where the load of requests in the system is variable. Specifically, the combination of the number of requests and the corresponding number of clients is different in each experiment. Throughout this section we identify the two approaches, the fully distributed and the centralized update process approach, by the acronyms FDA and CUPA, correspondingly. We consider a Web cluster of 5 servers and a requested file of small size.

Experiment number             1            2            3            4            5
Servers                       5            5            5            5            5
File size (KB)                1.6          1.6          1.6          1.6          1.6
Requests                      10           10           25           25           50
Clients                       5            10           5            25           5
Technique                     FDA   CUPA   FDA   CUPA   FDA   CUPA   FDA   CUPA   FDA   CUPA
Redirections                  8     8      9     8      12    11     22    23     25    25
Rejected requests (%)         0     0      0     0      1     0      2     1      2     0
Total execution time (sec)    2.56  1.3    3.65  0.618  3.7   4.23   3.8   4.87   5.7   8.97
Requests/sec                  3.91  7.64   2.74  16.18  6.65  5.9    6.57  5.13   8.7   5.57
Transfer rate (kb/sec)        6.57  13.7   4.5   26.26  10.7  10.02  10.6  8.59   13.7  8.9
Mean service time (sec)       0.72  0.12   0.91  0.26   0.4   0.43   0.46  0.79   0.31  0.42

Table 1: Comparative performance results where the input load is variable (part 1).

Experiment number             6            7            8            9            10           11
Servers                       5            5            5            5            5            5
Requested file size (KB)      1.6          1.6          1.6          1.6          1.6          1.6
Total number of requests      50           50           100          100          100          100
Clients                       10           25           1            5            10           25
Technique                     FDA   CUPA   FDA   CUPA   FDA   CUPA   FDA   CUPA   FDA   CUPA   FDA   CUPA
Redirections                  43    35     45    36     14    13     52    50     92    93     95    96
Rejected requests (%)         2     0      3     0      0     0      1     1      2     2      3     2
Total execution time (sec)    6.57  8.37   7.75  12.2   23.7  28.6   11.4  17.68  8.9   17.71  13.9  16.2
Requests/sec                  7.6   5.97   6.45  4.08   4.2   3.5    8.7   5.65   11.1  5.64   7.1   6.16
Transfer rate (kb/s)          12.3  9.69   10.46 6.68   7.86  5.6    14.2  9.17   18.2  9.15   11.5  10.01
Mean response time (sec)      0.5   0.6    0.5   1.05   0.13  0.28   0.56  0.67   0.58  0.92   0.69  0.88

Table 2: Comparative performance results where the input load is variable (part 2).

As can be seen from the results presented in Tables 1 and 2, the centralized update process approach performs better under lightly loaded conditions (e.g. when there are 10 requests to serve). As the number of requests increases, the fully distributed approach achieves better performance results. Specifically, all performance measurements (total execution time, serviced requests per second, transfer rate and mean service time) turn in its favour as the number of requests increases, whatever the number of clients making these requests. The reason for this behaviour is that the centralized update process approach spends extra time on two network communications within the Web cluster for every incoming request, in order to decide which server is the best candidate to serve the specific request, while in the fully distributed approach this decision is made locally, since the consulted update process is local to the proxy where the incoming request arrives. When a Web server reaches its upper capacity threshold, all incoming requests chosen by the particular update process to be processed locally or remotely by that server are rejected, until the number of requests currently serviced by this server decreases. The percentage of rejected requests is always equal or greater for the fully distributed approach. The reason is that the centralized update process approach, by its nature, makes the best possible choice; the probability of inconsistencies is greater in the fully distributed approach.
Experiment number             1             2             3             4
Servers                       5             5             5             5
File size (KB)                1500          1500          1500          1500
Clients                       1             5             10            10
Requests                      10            10            10            50
Technique                     FDA    CUPA   FDA    CUPA   FDA    CUPA   FDA    CUPA
Redirections                  0      1      5      6      10     9      10     45
Rejected requests (%)         0      0      0      0      0      0      2      0
Total execution time (sec)    4.06   5.515  3.6    3.647  3.4    3.8    14.3   17.543
Requests/sec                  2.46   1.81   2.75   2.74   2.86   2.6    3.48   2.85
Transfer rate (kb/s)          3874.5 2854.4 4897.9 4869.2 4599.9 4261.7 5233.5 4394.5
Mean service time (sec)       0.4    0.55   0.95   1.109  1.013  1.08   1.76   2.2

Table 3: Comparative performance results for a large requested file size.

When the requested file size increases, the fully distributed approach achieves the best performance results under all workload conditions. Table 3 illustrates the performance results in this case, and shows that they are much better for the fully distributed technique.

Experiment number             1            2            3            4            5
Servers                       5            4            3            2            1
Requested file size (KB)      1.6          1.6          1.6          1.6          1.6
Clients                       5            5            5            5            5
Total number of requests      25           25           25           25           25
Technique                     FDA   CUPA   FDA   CUPA   FDA   CUPA   FDA   CUPA   FDA   CUPA
Redirections                  12    11     17    18     15    15     13    12     0     0
Rejected requests (%)         1     0      0     0      0     0      0     0      0     0
Total execution time (sec)    3.7   4.23   3.2   4.442  2.9   6.168  4.76  3.7    5.6   4.8
Requests/sec                  6.65  5.9    7.8   5.63   8.6   4.05   5.25  6.7    4.45  5.11
Transfer rate (kb/sec)        10.7  10.02  12.6  10.51  13.9  7.57   7.85  12.52  7.22  9.55
Mean service time (sec)       0.4   0.43   0.5   0.6    0.5   0.587  0.54  0.5    0.9   0.51

Table 4: Comparative performance results where the number of servers varies.

Table 4 presents the performance results when the number of Web servers within the cluster varies. These
results show that balancing the load among more than two servers is beneficial in all cases, whatever technique is used. The results improve markedly with 5 servers in the cluster. The case of only one web server in the cluster obviously corresponds to the no-load-balancing case. The fully distributed approach achieves better performance than the centralized update process approach when the number of servers in the cluster exceeds two; in the case of a two-server web cluster, the centralized approach achieves better results.

Conclusions

This work presents a general design for load balancing strategies in a cluster of Web servers. This approach can be used by popular Web sites to cope efficiently with the rapid growth in the rate of requests received from Internet users. The general design defines a proxy per server in the cluster, which is responsible for collecting the requests arriving at the corresponding server, and an update process (per server or central) for the dissemination of each server's status information. We analyze the role of the proxy server and the update process in this scheme and, using the above general design, apply and evaluate two different dynamic, sender-initiated load balancing techniques, based on the distinction between a distributed and a centralized update process: the fully distributed strategy and the centralized scheduler strategy. The fully distributed technique appears to operate better under heavily loaded conditions.

References

[AGS98] K. Antonis, J. Garofalakis, P. Spirakis, "A Competitive Symmetrical Transfer Policy for Load Sharing", in Proc. 1998 International Euro-Par Conference, pp. 352-355, 1998.
[APB96] E. Anderson, D. Patterson, and E. Brewer, "The Magicrouter, an Application of Fast Packet Interposing", OSDI, 1996.
[AYHI96] D. Andresen, T. Yang, V. Holmedahl and O. Ibarra, "SWEB: Towards a Scalable WWW Server on MultiComputers", in Proceedings of the 10th International Parallel Processing Symposium (IPPS'96), Hawaii, April 1996.
[AYI97] D. Andresen, T. Yang, V. Holmedahl and O. Ibarra, "Towards a Scalable WWW Server on Workstation Clusters", Journal of Parallel and Distributed Computing (JPDC), Vol. 42, pp. 91-100, 1997.
[AYIE98] D. Andresen, T. Yang, O. Ibarra, O. Egecioglu, "Adaptive Partitioning and Scheduling for Enhancing WWW Application Performance", Journal of Parallel and Distributed Computing, Vol. 49, No. 1, pp. 57-85, February 1998.
[BC98] P. Barford and M. Crovella, "Generating Representative Web Workloads for Network and Server Performance Evaluation", ACM SIGMETRICS, pp. 1-17, 1998.
[BCLM98] A. Bestavros, M. Crovella, J. Liu, and D. Martin, "Distributed Packet Rewriting and its Application to Scalable Server Architectures", Tech. Rep. BUCS-TR-98-003, Boston University, Computer Science Department, February 1998.
[BA99] A. Bestavros, L. Aversa, "Load Balancing a Cluster of Web Servers Using Distributed Packet Rewriting", Technical Project Report, Boston University, Computer Science Department, January 1999.
[CC96] R. L. Carter and M. E. Crovella, "Dynamic Server Selection Using Bandwidth Probing in Wide-Area Networks", Tech. Rep. BU-CS-96-007, Computer Science Dept., Boston University, Boston, MA, 1996.
[Dan95] S. Dandamudi, "The Effect of Scheduling Discipline on Sender-Initiated and Receiver-Initiated Adaptive Load Sharing in Homogeneous Distributed Systems", Technical Report TR-95-25, School of Computer Science, Carleton University, 1995.
[ELZ86a] D.L. Eager, E.D. Lazowska, J. Zahorjan, "Adaptive Load Sharing in Homogeneous Distributed Systems", IEEE Transactions on Software Engineering, Vol. SE-12, No. 5, pp. 662-675, May 1986.
[ELZ86b] D.L. Eager, E.D. Lazowska, J. Zahorjan, "A Comparison of Receiver-Initiated and Sender-Initiated Adaptive Load Sharing", Performance Evaluation, Vol. 6, pp. 53-68, March 1986.
[Gar96] S. L. Garfinkel, "The Wizard of Netscape", WebServer Magazine, 1(2):59-63, 1996.
[KBG94] E.D. Katz, M. Butler, R. McGrath, "A Scalable HTTP Server: The NCSA Prototype", Computer Networks and ISDN Systems, Vol. 27, pp. 155-164, 1994.
[MFM95] D. Mosedale, W. Foss, R. McCool, "Administering Very High Volume Internet Services", Proc. of 1995 LISA IX, Monterey, CA, September 1995.
[SK90] N.G. Shivaratri, P. Krueger, "Two Adaptive Location Policies for Global Scheduling Algorithms", IEEE Int. Conf. on Distributed Computing Systems, pp. 328-335, 1990.
[ZSY99a] H. Zhu, B. Smith, and T. Yang, "Hierarchical Resource Management for Web Server Clusters with Dynamic Content", ACM SIGMETRICS, pp. 198-199, 1999.
[ZSY99b] H. Zhu, B. Smith, and T. Yang, "Scheduling Optimization for Resource-Intensive Web Requests on Server Clusters", ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 13-22, 1999.

Vitae

Maria Adamou received her Diploma from the Department of Computer Engineering and Informatics, University of Patras, in 1999, and is currently a PhD student at the Computer and Information Science Department of the University of Pennsylvania.
Her research interests include distributed real-time systems and mobile and wireless computing.

Dimosthenis Anthomelidis obtained his Diploma from the Computer Engineering and Informatics Department of the University of Patras (Greece). He is currently a graduate student in Computer and Information Science at the University of Pennsylvania, USA. His research interests include computer networks, distributed and mobile computing, and software engineering.

Konstantinos Antonis received his Diploma from the Department of Computer Engineering and Informatics, University of Patras, Greece, in 1994. He is currently a graduate student in the same department and works as an engineer for the Computer Technology Institute (CTI), Patras, Greece. His research interests include distributed computing, especially load balancing in distributed systems and web servers.
John Garofalakis received his Ph.D. from the Department of Computer Engineering and Informatics, University of Patras, Greece, in 1990, and his Diploma in Electrical Engineering from the National Technical University of Athens in 1983. He is currently Assistant Professor at the Department of Computer Engineering and Informatics, and Head of a Research Unit at the Computer Technology Institute, Patras, Greece. His research interests include performance evaluation of computer systems, distributed systems and algorithms, and Internet technologies and applications. He has published in various journals and refereed conferences, including the Theoretical Computer Science journal, the Performance Evaluation journal, IEEE Internet Computing, the ACM SIGMETRICS conference, WDAG, Euro-Par, etc. He has been a referee for Performance Evaluation and the ACM SIGMETRICS Conference.

Paul Spirakis obtained his Ph.D. from Harvard University, USA, in 1982, in Applied Mathematics and Computer Science. He was promoted to Full Professor in the Department of Computer Engineering and Informatics, University of Patras, Greece, in 1990. He is currently the Director and a senior scientist of the Computer Technology Institute (CTI), Patras, Greece. His research interests include probabilistic algorithms, parallel and distributed algorithms and protocols, telematics, exact analysis of algorithms, algorithms and complexity, performance analysis, and databases. He has published extensively in most of the important Computer Science journals and most of the significant refereed conferences. He is currently a Member of the Board of EATCS (European Association for Theoretical Computer Science), a consultant of the EU in Informatics, and a senior consultant of the Greek State in Informatics for Education, Health, Telematics and the public domain. He is a member of EATCS, ACM, MAA, and AMS.