Performance Analysis of Session-Level Load Balancing Algorithms

Dennis Roubos, Sandjai Bhulai, and Rob van der Mei
Vrije Universiteit Amsterdam, Faculty of Sciences
De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
E-mail: {droubos, sbhulai, mei}@few.vu.nl

Abstract

Load balancing (LB) is crucial for the efficient operation of large server clusters. In the past, many different LB strategies on the request level have been developed with great effectiveness in parallel applications. However, the LB problem is not yet solved completely; new applications and architectures require new features. In particular, secure environments require that LB is done at session level instead of request level; that is, once a session has been assigned to a server, all subsequent service requests are directed to the assigned server. Although many commercial products have been brought to the market to implement LB at the session level, little insight has been obtained into the efficiency of such session-level LB algorithms, leaving ample room for performance improvement and optimization. Motivated by this, we study session-level LB with a focus on algorithms that are simple and easy to implement in real systems. The performance of the load balancer is highly dependent on the request profiles of the different sessions and the information that is available for decision making. We make this trade-off between the information that is available to the load balancer and the efficiency of the algorithm explicit by developing new algorithms, and compare their efficiency with existing algorithms. The algorithms are mainly based on the load of each server and the number of active sessions running on them. Extensive validation in an experimental setting shows that our algorithms outperform the existing ones, and as such, provide a simple, easy-to-implement yet effective means to improve the efficiency of large server clusters.

Keywords: Load balancing, performance evaluation, session-level load balancing.

1 Introduction

Many content-intensive applications have scaled beyond the point where a single server can provide adequate processing power. This raises the need for flexibility to deploy additional servers quickly and transparently to end-users. Load balancing (LB) is the process of distributing service requests across a group of servers. It has emerged as a powerful solution that addresses several requirements that are becoming increasingly important in computer networks, such as increased scalability, high performance, and high availability and disaster recovery.

Load balancing makes multiple servers appear as a single server by transparently distributing user requests among the servers, and thus creates scalability. High performance is achieved by directing service requests to the servers that are least busy and therefore capable of providing the fastest response times. Application availability improves because LB automatically redistributes end-user service requests to other servers within a server farm when a server fails. Moreover, it improves security by protecting the server farm against multiple forms of DoS (Denial of Service) attacks.

The literature on the performance and effectiveness of LB algorithms is extensive. We refer to [11] for an excellent overview of the available LB techniques applied in different application areas. However, the vast majority of the performance-related papers on LB are focused on request-level LB (also often referred to as server LB), i.e., where each individual request of a client may be directed to a different server or service (cf. [8] for an overview of request-level LB algorithms). A main disadvantage of request-level LB algorithms is that they are highly vulnerable to unsecured transactions.

In secure environments where confidentiality of data and integrity of the network are of high importance, it is crucial that LB is done at the session level instead of the request level. Unauthorized data queries must be kept from accessing the data of clients directly. Therefore, a farm of terminal servers can serve as an intermediate layer to the outside world, so that clients must request a session for all activities that involve outgoing data traffic. In these cases, the LB algorithm is carried out only when a user requests a new session. Once the session has been assigned to a server, all subsequent service requests generated in this session are directed to this server.
Note that it is not desirable to switch this session to a different terminal server due to the large overhead and switching times. This creates additional complexity for LB algorithms, since, e.g., the future requests during an active session are not known.

Motivated by this, a wide variety of commercial session-level LB products have been brought to the market (see, e.g., [3, 4, 5]). Rather surprisingly, however, despite the large number of available products, relatively little is known about the efficiency of these session-level LB algorithms. These observations have raised the need for studying and optimizing the effectiveness of session-level LB algorithms.

In this paper, we study load balancing of sessions on a farm of terminal servers. Whenever a new session is requested by a client, the load balancer needs to assign it to one of the available terminal servers. The sessions remain active for as long as the clients do not terminate their sessions. Hence, all activities (e.g., browsing the web, opening files) by a client induce load on the terminal server to which the session is assigned. We focus on load-balancing algorithms that are easy to implement in real systems. Since the load balancer has little information on future requests, the algorithms are mainly based on the load of each terminal server and the number of active sessions running on them. We make this trade-off between the information that is available to the load balancer and the efficiency of the algorithm explicit by developing new algorithms, and compare their efficiency with existing algorithms. We show that our algorithm outperforms the existing ones through extensive validation in an experimental setting. Therefore, we provide a simple, easy-to-implement yet effective means to improve the efficiency of large server clusters.

[Figure 1: Configuration of the system under study. Clients $CL_1, \ldots, CL_i, \ldots, CL_I$, each alternating between states 0 and 1 with sojourn times $T_0$ and $T_1$, connect through the load balancer LB to the terminal servers $TS_1, \ldots, TS_j, \ldots, TS_J$.]

The paper is organized as follows. In Section 2 we present our model for the behaviour of the clients and the terminal servers. The LB algorithms used by the load balancer are treated in Section 3. These algorithms are validated in Section 4 in an experimental environment. Next, we use the model to evaluate other LB algorithms for several client and session profiles in Section 5. Finally, we conclude the paper in Section 6.

2 Model

The configuration of the system under study consists of a set of clients $\{CL_i \mid i = 1, \ldots, I\}$, a load balancer (LB), and a set of terminal servers $\{TS_j \mid j = 1, \ldots, J\}$, as depicted in Figure 1. Each client $CL_i$ can be in a state $s \in \{0, 1\}$, where $s = 1$ means that the client has an active session and $s = 0$ that it has not. The time a client stays in state $s$ is modeled by the random variable $T_s$, which has a general distribution. This process models clients that start a session after a period of $T_0$ time units has elapsed, and terminate the session after an active period of $T_1$ time units. This process resembles the on-off process, which is a well-known model for capturing the long-range dependence in traffic models. As soon as a client requests a new session, the load balancer decides which terminal server to assign the session to.

The clients send requests during an active session, which are directed to the terminal server on which the session is active. The requests that are generated during a session are jobs that generate workload for the servers. We distinguish $K$ different types of requests that are generated by the clients. The time between two successive requests has a general distribution. The job size of a type-$k$ request is represented by the random variable $X_k$, which has a general distribution. The processor time that it takes to complete a job of type $k$ is a function $f_k$ of $X_k$.
The terminal servers serve each job according to a discipline in which each active job receives a fraction of the processor capacity. To calculate the fraction of processor capacity that each job receives at a particular moment, consider a terminal server with $N$ processors and $n$ jobs running on it simultaneously at that time. Then we determine the fraction of processor capacity according to the following steps:

Step 0: Let assign := 100 · N and n_assigned := 0;
Step 1: Each job j receives r := assign / (n − n_assigned) % of the capacity;
Step 2a: For each job j that needs a fraction of b% < r%, and which does not have a fraction assigned yet, assign the fraction b and let n_assigned := n_assigned + 1 and assign := assign − b;
Step 2b: If n_assigned = n, done;
Step 2c: Else, if at least one assignment has been made in Step 2a, go to Step 1, else go to Step 3;
Step 3: Assign to all remaining jobs a fraction assign / (n − n_assigned) % of the processor capacity.

Note that the fraction b in Step 2a above depends on the job type. Therefore, the previous algorithm can lead to two possible outcomes. Either all jobs receive fractions of the processor capacity that sum to 100 · N, or the jobs do not consume all processor capacity, and thus, the processor remains idle for some fraction of time.

[Figure 2: Example of two clients, $CL_1$ and $CL_2$, over time $t$, with jobs of type $k$ and type $k+1$. The big dashed boxes are sessions with session length $T_1$ and an inter-session time of $T_0$. The little solid and dashed boxes are jobs that require $f_k(X_k)$ processor time.]

For illustration, Figure 2 shows an example with two clients $CL_1$ and $CL_2$. Both clients have several sessions over time with several jobs running within each session. Notice that $f_k(X_k)$ is the amount of processor time it takes to complete a job of size $X_k$. However, this is not necessarily the time that a client perceives, due to processor sharing: the presence of other jobs in the system can slow down the processing rate for a job of type $k$. In reality, the processor handles only one task at a time. However, switching between tasks is very fast and, therefore, at time scales in the order of seconds we can model all tasks as being handled simultaneously, with each task having a longer service duration.
Hence, the use of the processor sharing discipline is well justified: clients perceive processing times that differ from the actual processor time that the task needs. The load balancer uses information about the load of each terminal server to assign a session to one of the terminal servers. We define the load as the number of active jobs, i.e., jobs that are waiting for processor capacity as well as jobs that use processor capacity.
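
The capacity-sharing steps (Step 0 to Step 3) of this section can be sketched in a few lines of Python. This is only an illustrative reading of the steps, not the authors' simulator: the function name `share_capacity` and the representation of a job by its maximum usable percentage `b` are assumptions made for the example.

```python
def share_capacity(demands, n_processors):
    """Split processor capacity (in percent) among the active jobs.

    demands[i] is the largest percentage b that job i can use.
    Returns the percentage granted to each job, in the same order.
    """
    n = len(demands)
    shares = [None] * n              # None: no fraction assigned yet
    assign = 100.0 * n_processors    # Step 0: capacity still to hand out
    n_assigned = 0
    while n_assigned < n:            # Step 2b: stop when every job is served
        r = assign / (n - n_assigned)            # Step 1: equal share
        progress = False
        for i, b in enumerate(demands):          # Step 2a: satisfy small jobs
            if shares[i] is None and b < r:
                shares[i] = float(b)
                assign -= b
                n_assigned += 1
                progress = True
        if not progress:                         # Step 2c -> Step 3
            for i in range(n):
                if shares[i] is None:
                    shares[i] = r
            break
    return shares


# Two processors (200% capacity), illustrative demands of 80%, 80% and 50%.
print(share_capacity([80, 80, 50], n_processors=2))
```

With these illustrative demands the job needing 50% is satisfied first and the two remaining jobs split the other 150% equally. With demands of 80% and 50% only, both jobs are fully satisfied and 70% of the capacity stays idle, which is the second outcome mentioned in the text.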

The load is calculated every 5 seconds by the system. We denote the $r$-th update of the load by $\mathrm{load}(r)$, which is calculated as follows [10]:

$$ \mathrm{load}(r) = \mathrm{load}(r-1)\, e^{-5/(60m)} + n(r)\,\bigl(1 - e^{-5/(60m)}\bigr), $$

with $m$ equal to 1, 5, or 15 for the calculation of the $m$-minute load average (denoted by $m$-LA), and $n(r)$ the number of active jobs at the $r$-th update moment.

Note that the model has the flexibility to handle multiple client classes as well by changing the probability distribution of $T_0$ and $T_1$, and thus effectively creating clients with different session profiles. Similarly, the request profiles can be changed by making the request parameters client dependent.

Our objective is to minimize a performance measure that takes into account the differences in load between any of the available terminal servers. Minimizing this difference means better load balancing, while better load balancing implies a better overall response time, as is generally known. Therefore, we consider the average of the absolute differences in the 1-LA between any of the terminal servers as our performance measure throughout the paper.

3 Algorithms

In this section, we present the load balancing algorithms that are evaluated in our experiments. The first part of this section is devoted to seven main LB algorithms, while the last part considers four methods to periodically update the so-called weights that are used by some of the main LB algorithms. In the sequel, we will use $TS_j$ for the $j$-th terminal server, $c_j$ for the number of active sessions on terminal server $TS_j$, $w_j$ for the weight of terminal server $TS_j$, and $l_j$ for the load of terminal server $TS_j$ at the moment a session needs to be assigned to a terminal server.

3.1 Main LB algorithms

RR (Round Robin) is a simple, well-known algorithm that does not use the state of the system and is often used when no state information is available. It assigns a new session to server $TS_{(j+1) \bmod J}$ if the previous session was assigned to server $TS_j$.
WRR (Weighted Round Robin) is a variant of RR that uses weights to create a scheme for assigning sessions to terminal servers. We distinguish between two schemes for assigning new sessions. The first scheme is described below and is implemented in LVS (Linux Virtual Server) [2]:

LVS Scheme
Step 0: Set $v_j = w_j$, and determine the GCD (greatest common divisor) of the weights $v_j$;
Step 1: Divide all $v_j$ by the GCD;
Step 2: Add the server with index equal to $\arg\max_j v_j$ to the scheme and decrease $v_j$ by 1. Repeat Step 2 until all $v_j = 0$.
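
The LVS scheme steps above can be sketched as follows. This follows the three steps as stated in the text rather than the LVS source code; the function name `lvs_scheme`, the zero-based server indices, and the tie-breaking rule (lowest index wins when several $v_j$ are maximal) are assumptions, since the text does not specify them.

```python
from functools import reduce
from math import gcd


def lvs_scheme(weights):
    """Build one period of the assignment sequence from positive integer weights.

    Returns a list of zero-based server indices.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be positive integers")
    g = reduce(gcd, weights)             # Step 0: GCD of the weights
    v = [w // g for w in weights]        # Step 1: divide all v_j by the GCD
    scheme = []
    while any(x > 0 for x in v):         # Step 2: repeat until all v_j = 0
        j = max(range(len(v)), key=lambda i: v[i])  # arg max, ties -> lowest index
        scheme.append(j)
        v[j] -= 1
    return scheme


print(lvs_scheme([4, 2]))   # weights 4 and 2 reduce to 2 and 1
```

For weights 4 and 2 the period has length three and the first server appears twice in a row, which illustrates how this scheme can send consecutive sessions to the same server.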

The second scheme is the Golden Ratio method [9], which can be described as follows:

Golden Ratio Scheme
Step 0: Let $M = \sum_j w_j$ and $\phi^{-1} = \frac{1}{2}(\sqrt{5} - 1)$;
Step 1: Order the $M$ numbers $1\phi^{-1} \bmod 1,\ 2\phi^{-1} \bmod 1,\ \ldots,\ M\phi^{-1} \bmod 1$ from smallest to largest;
Step 2: Let the $k$-th smallest number correspond with the $k$-th position in the scheme;
Step 3: Assign $1\phi^{-1} \bmod 1,\ 2\phi^{-1} \bmod 1,\ \ldots,\ w_1\phi^{-1} \bmod 1$ to the first terminal server, $(w_1 + 1)\phi^{-1} \bmod 1,\ \ldots,\ (w_1 + w_2)\phi^{-1} \bmod 1$ to the second terminal server, and so on.

Both the LVS and the Golden Ratio scheme yield as output a periodic sequence, say $(\alpha_1, \alpha_2, \ldots, \alpha_P)$, based on the weights as input. This sequence prescribes that the $i$-th session is assigned to terminal server $TS_{f(i)}$ with $f(i) = \alpha_{i \bmod P}$.

LC (Least Connection) is a well-known, greedy algorithm in the sense that it assigns a new session to the terminal server with the smallest number of connections at that time instant. Thus, it assigns a new session to server $TS_j$ for which $c_j = \min\{c_1, c_2, \ldots, c_J\}$.

WLC (Weighted Least Connection) is similar to LC, but uses weights for the assignment of new sessions. It assigns a new session to server $TS_j$ for which $\frac{c_j}{w_j} = \min\left\{\frac{c_1}{w_1}, \frac{c_2}{w_2}, \ldots, \frac{c_J}{w_J}\right\}$.

LBA (Load Based Assignment) is similar to LC, except that it assigns to the terminal server with the least load instead of the server with the smallest number of connections. Thus, it assigns a new session to terminal server $TS_j$ for which $l_j = \min\{l_1, l_2, \ldots, l_J\}$.

WRST (Weighted Remaining Session Time) balancing tries to infer the future load by looking at the total expected remaining session time and compares this to the actual load. The algorithm takes into account not only the current load, but also an inference of the load in the future. Therefore, it could be better to assign a new client to a more heavily loaded terminal server in case this terminal server handles active sessions that will be closed shortly. To our knowledge, an algorithm like WRST has not been mentioned in the literature yet. The precise description of the algorithm is as follows. Let $E(RST_j)$ be the total expected remaining session time of sessions on terminal server $TS_j$. This number can be determined as follows:

$$ E(RST_j) = \sum_{s \in S_j} E\bigl(RST_s \mid \text{current session duration of } s = t - \text{start time of } s\bigr), $$

with $S_j$ the set of all sessions active on $TS_j$ and $t$ the time an assignment has to be made. The conditional expectation can be calculated explicitly, given the (empirical) probability distribution of the session length, by computing the residual lifetime. The algorithm assigns a new session to terminal server $TS_j$ for which $l_j \cdot E(RST_j) = \min\{l_1 \cdot E(RST_1), \ldots, l_J \cdot E(RST_J)\}$.

PR (Probability Assignment) is similar to RR; however, it uses random assignments instead of deterministic assignments. Moreover, the assignment probabilities are determined by weights that can be updated dynamically. Thus, a new session is assigned to terminal server $TS_j$ with probability $p_j$, where $p_j$ is given by

$$ p_j = \frac{w_j}{\sum_{k=1}^{J} w_k}. $$

3.2 Update algorithms

In this subsection we explain how the weights of some of the main LB algorithms can be updated. We present four update algorithms: LP1, LP2, LOB, and AL. The first algorithm updates the weights proportional to the load on the terminal servers. However, under the LVS scheme, this could create unbalanced assignments, which could potentially lead to a decrease in the LB performance. LP2 alleviates this problem by normalizing the smallest weight to one. The LOB algorithm adopts the approach where the least loaded terminal server receives the highest weight. Finally, the AL algorithm [1] takes into account not only the load, but also information on the number of sessions assigned to a terminal server.

In the sequel, we assume deterministic update moments with fixed update intervals. Moreover, by $w_j(\tau)$ and $c_j(\tau)$ we denote the $\tau$-th update of the weight of terminal server $TS_j$ and the number of sessions on terminal server $TS_j$ at the moment of the $\tau$-th update, respectively. Note that the update intervals for the weights can differ from the update intervals for the load.

LP1 (Load Proportional 1) updates the weights according to

$$ w_j(\tau) = \begin{cases} \dfrac{\sum_{k=1}^{J} l_k}{l_j} & \text{if } l_j \neq 0, \\[2ex] \max\Bigl\{1, \sum_{k=1}^{J} l_k\Bigr\} & \text{if } l_j = 0. \end{cases} $$
LP2 (Load Proportional 2) is the same as LP1 with the exception that LP2 divides each $w_j$ by the minimum over all weights. Hence, there is at least one weight equal to 1 while the proportions are maintained.

LOB (Load Order Based) assigns the weights $1, 2, \ldots, J$ to the terminal servers sorted by load in descending order, so that the most heavily loaded server receives weight 1 and the least loaded server receives weight $J$.
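
The three load-based update rules described so far (LP1, LP2, and LOB) can be sketched as follows. The function names are hypothetical, servers are indexed from zero, and the zero-load case of LP1 is read here as using the total load over all servers, which is an assumption. Ties in LOB are broken by server index, which the text does not specify.

```python
def lp1(loads):
    """LP1: weight is total load divided by own load (assumed total in the zero case)."""
    total = sum(loads)
    return [total / l if l != 0 else max(1.0, total) for l in loads]


def lp2(loads):
    """LP2: LP1 weights divided by the smallest weight, so the minimum equals 1."""
    weights = lp1(loads)
    smallest = min(weights)
    return [w / smallest for w in weights]


def lob(loads):
    """LOB: weights 1..J in order of descending load (least loaded gets J)."""
    order = sorted(range(len(loads)), key=lambda j: loads[j], reverse=True)
    weights = [0] * len(loads)
    for rank, j in enumerate(order, start=1):
        weights[j] = rank
    return weights


loads = [1.0, 2.0, 1.0]          # illustrative 1-LA values for three servers
print(lp1(loads), lp2(loads), lob([0.5, 2.0, 1.0]))
```

The example shows why LP2 can matter for a sequence-based scheme: LP1 yields weights 4, 2, 4 while LP2 yields 2, 1, 2, the same proportions with smaller absolute values.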

AL (Aggregated Load) considers an aggregated load number $AL_j$, and the update of the weights is given by

$$ w_j(\tau) = w_j(\tau-1) + 5\,\sqrt[3]{\overline{AL} - AL_j}, \qquad \text{with } \overline{AL} = \max\Bigl\{1, \frac{1}{J}\sum_{k=1}^{J} AL_k\Bigr\}, $$

if $0 \le w_j(\tau) \le 100$ and $|w_j(\tau) - w_j(\tau-1)| \ge 2.5$; otherwise $w_j(\tau) = w_j(\tau-1)$. The number $AL_j$ represents information about the load of terminal server $TS_j$ and the number of new sessions assigned to server $TS_j$ during the last update interval. Let $AL_j$ be defined by

$$ AL_j = 0.7\, l_j + 0.3\, \mathrm{INPUT}_j, $$

where $\mathrm{INPUT}_j$ is given by

$$ \mathrm{INPUT}_j = \begin{cases} \dfrac{n_j}{\frac{1}{J}\sum_{k=1}^{J} n_k} & \text{if } \sum_{k=1}^{J} n_k \neq 0, \\[2ex] 0 & \text{if } \sum_{k=1}^{J} n_k = 0, \end{cases} $$

with $n_j = \bigl(c_j(\tau) - c_j(\tau-1)\bigr)^{+}$.

For every update algorithm, we round all the weights to the nearest integer so that they can be used for determining a scheme. Furthermore, the combination of a main LB algorithm with an update algorithm is denoted by their names with a dash in between, e.g., WRR-LP1 means that the main LB algorithm WRR is used and the weights are updated according to the LP1 method.

4 Model validation

The outcomes of the mathematical model are compared to experimental results. In an experimental environment with two terminal servers, we configured an LVS load balancer to assign clients to one of the two available terminal servers. The clients were simulated by a couple of PCs. We ensured that the processors of the terminal servers were the bottleneck. We were able to control a variety of parameters, such as the probability distributions for the length of a session and for the time in between sessions. Within an active session, the clients retrieved HTML pages from a predefined set of possible websites. The time in between such requests was specified through a probability distribution. We performed extensive experiments and compared the outcomes of the LB algorithms with a simulation of the model of Section 2. In these experiments, the performance measure that we focused on was the average of the absolute differences of the 1-LA between both terminal servers.
For a couple of representative experiments, these results can be found in the first two columns of Table 1. The value between brackets is the standard deviation of the same performance measure. The last column depicts the relative difference with respect to the experimental environment. In the considered cases, it turned out that the performance was within the confidence interval obtained via simulation. Furthermore, the same relative performance was obtained using the mathematical model and the experimental environment. We conclude that our model describes the performance of a real system very well. Consequently, we can use the model for further experiments for evaluating different LB strategies (see Section 5).
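
The performance measure used in this validation combines the load recursion of Section 2 with an average of absolute differences. A minimal sketch for two servers is given below. The function names are hypothetical, and averaging over equally spaced sample moments is an assumption about how the measure is evaluated.

```python
import math


def update_load(prev_load, n_active, m=1, interval=5.0):
    """One update of the m-minute load average, given n_active jobs right now."""
    decay = math.exp(-interval / (60.0 * m))
    return prev_load * decay + n_active * (1.0 - decay)


def mean_abs_difference(loads_a, loads_b):
    """Average absolute difference between two equally long load series."""
    if len(loads_a) != len(loads_b) or not loads_a:
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(a - b) for a, b in zip(loads_a, loads_b)) / len(loads_a)


# Illustrative job counts at successive 5-second update moments on two servers.
jobs_a = [3, 3, 4, 2, 0, 0]
jobs_b = [1, 1, 1, 5, 5, 5]
la_a, la_b, series_a, series_b = 0.0, 0.0, [], []
for n_a, n_b in zip(jobs_a, jobs_b):
    la_a, la_b = update_load(la_a, n_a), update_load(la_b, n_b)
    series_a.append(la_a)
    series_b.append(la_b)
print(mean_abs_difference(series_a, series_b))
```

Because the decay factor for the 1-LA is $e^{-5/60}$, a sustained change in the number of active jobs shows up in the 1-LA much sooner than in the 5-LA or 15-LA, for which `m` is 5 or 15.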

Algorithm   Simulation        Experimental env.   Difference (in %)
RR          1.9825 (1.5540)   2.0404 (1.3842)     2.8% (12.3%)
WLC-LP1     0.6156 (0.6865)   0.5565 (0.4158)     10.3% (65.1%)
WRR-AL      2.6987 (2.1346)   2.8300 (2.6669)     6.2% (20.0%)

Table 1: Performance obtained via simulation and the observed values in an experimental environment.

5 Numerical results

In this section we consider different scenarios and focus on the performance measure, i.e., the average of the absolute differences in the 1-LA between the terminal servers. Hence, all experiments use the 1-LA as load information, unless mentioned otherwise. We show that there is a significant difference between the studied LB algorithms. Furthermore, we show the impact of the size of the update interval on the performance measure and we analyze both the LVS and the Golden Ratio scheme.

Suppose the session length $T_1$ has a lognormal(7.7697, 1.5169) distribution and that $T_0$ has an exponential distribution with parameter 0.000172. The time unit is in the order of seconds, so, on average, each client starts a new session approximately 97 minutes ($1/0.000172 \approx 5814$ seconds) after the last session was closed by that client. We distinguish between two request types, namely HTML and PDF requests. An HTML request has a file size $X_1$ in bytes that is distributed according to a lognormal(10.0421, 1.1712) distribution. A PDF request, which is on average larger than an HTML file, has a file size $X_2$ with a lognormal(11.0709, 2.2349) distribution. The time in between two HTML (PDF) requests is exponentially distributed with parameter 0.033 (0.000054). The functions $S_k$ for $k = 1, 2$, which play the role of the processor-time functions $f_k$ of Section 2, are given by $S_k(x_k) = 0.0001\, x_k$. Note that the lognormal distribution typically occurs in practice [6, 7]. Moreover, the parameters of the distributions in this setting have been obtained from analyzing data from an operational environment. The average usage of the processor is 80% for HTML jobs and 50% for PDF jobs. We simulate 100 clients and 2 terminal servers, each having 2 processors.
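
To make the client model with these parameters concrete, the sketch below samples the session timeline of a single client. It is not the authors' simulator. It assumes that the two lognormal parameters are the mean and standard deviation of the underlying normal distribution, that the exponential parameter is a rate, and that a client starts in the off state; the function name is hypothetical.

```python
import random


def sample_sessions(horizon, rng):
    """Alternate off periods T_0 and sessions T_1 until `horizon` seconds.

    Returns a list of (start_time, session_length) pairs.
    """
    t = 0.0
    sessions = []
    while True:
        t += rng.expovariate(0.000172)                 # T_0: time without a session
        if t >= horizon:
            return sessions
        length = rng.lognormvariate(7.7697, 1.5169)    # T_1: session length
        sessions.append((t, length))
        t += length


rng = random.Random(42)                # fixed seed for a reproducible illustration
timeline = sample_sessions(100_000.0, rng)
print(len(timeline), "sessions started within 100,000 seconds")
```

Under the same assumptions, request arrivals within a session could be drawn with `rng.expovariate(0.033)` for HTML requests and `rng.expovariate(0.000054)` for PDF requests.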
A simulation run consists of 5 independent runs, each having a length of 100,000 seconds and a warm-up period of 10,000 seconds. In the experiments, we update the weights every 150 seconds, unless stated otherwise. The initial weights are equal to 1 for the LP1, LP2, and LOB methods, and equal to 10 for the AL method. For the PR method, we choose all weights equal with no updates, so that PR assigns the sessions uniformly over the terminal servers. Finally, we mention that the WRR algorithm uses the LVS scheme unless stated otherwise.

Table 2 shows the average absolute difference between the 1-LA numbers. Three algorithms perform significantly better than the other two methods. These results could be expected, since the PR (in this case) and the RR method are stateless LB algorithms, i.e., the LB scheme is independent of the actual state of the system.

Algorithm   Performance measure
LC          2.35
WRST        2.38
LBA         2.50
PR          3.91
RR          4.09

Table 2: Performance of five main LB algorithms.

In Table 3, we investigate the influence of the update interval. Most of the time it is best to update the weights frequently. However, there is no general rule that can be applied to determine the optimal update frequency. Update algorithms improve the performance significantly when combined with WRR and PR, whereas the performance of LC improves only marginally. Table 3 shows that a small update interval is quite good, but an interval that is too small can perform worse (e.g., WRR-LP2 and PR-LP1 do better at 150 seconds than at 30 seconds). In particular, a small update interval works well in combination with AL.

                Update interval (seconds)
Algorithm       30      150     300     600
WRR-LP1         3.48    3.43    3.02    2.97
WRR-LP2         3.55    3.18    3.22    3.16
WRR-AL          2.89    3.69    3.96    3.89
WLC-LP1         2.30    2.34    2.42    2.69
WLC-LP2         2.24    2.27    2.42    2.70
WLC-LOB         2.45    2.65    2.88    3.14
WLC-AL          2.95    3.11    3.11    3.44
PR-LP1          3.34    3.06    3.42    3.59
PR-LP2          3.21    3.07    3.09    3.09
PR-LOB          3.09    3.26    3.27    3.44
PR-AL           2.90    3.60    3.87    3.85

Table 3: Performance for different update intervals.

By increasing the number of terminal servers and keeping the number of clients constant, we see that the choice of the algorithm becomes less important, i.e., the performance of all algorithms becomes almost the same.

The algorithms that use load information can base their calculations on three load numbers. Using the most up-to-date load information, namely the 1-LA, gives the best performance relative to the 5-LA and the 15-LA. There is a simple explanation for this phenomenon: big changes in the load are reflected faster in the 1-LA than in the 5-LA or 15-LA.

It is important to see whether the performance of the algorithms is specific to the chosen system parameters. Therefore, consider the same system, but with 200 or 250 clients and 4 terminal servers. The columns of Table 4 before the double line give the results. We see that if the performance was already good, then the performance remains good in the two new situations.

            100 clients; 2 TS    200 clients; 4 TS    250 clients; 4 TS    || Modified scenario
Algorithm   Diff.  Ind.  Rank    Diff.  Ind.  Rank    Diff.  Ind.  Rank    || Diff.  Ind.
RR          4.09   180   16      4.00   173   15      5.29   210   15      || 1.65   125
LC          2.35   104   3       2.31   100   1       2.52   100   1       || 1.38   105
PR          3.91   172   15      5.25   227   16      5.76   229   16      || 2.52   191
LBA         2.50   110   5       2.53   110   4       2.75   109   4       || 1.59   120
WRST        2.38   105   4       2.45   106   2       2.68   106   2       || 1.32   100
WRR-LP1     3.43   151   12      2.61   113   6       2.88   114   6       || 1.64   124
WRR-LP2     3.18   140   10      2.87   124   9       3.90   155   12      || 1.59   120
WRR-AL      3.69   163   14      3.56   154   14      4.37   173   14      || 6.33   480
WLC-LP1     2.34   103   2       2.58   112   5       2.82   112   5       || 1.59   120
WLC-LP2     2.27   100   1       2.52   109   3       2.69   107   3       || 1.66   126
WLC-LOB     2.65   117   6       2.74   119   7       3.15   125   7       || 2.21   167
WLC-AL      3.11   137   9       2.86   124   8       3.35   133   9       || 2.31   175
PR-LP1      3.06   135   7       3.26   141   12      3.80   151   11      || 1.91   145
PR-LP2      3.07   135   8       3.09   134   11      3.63   144   10      || 1.92   145
PR-LOB      3.26   144   11      3.02   131   10      3.29   131   8       || 1.91   145
PR-AL       3.60   159   13      3.37   146   13      3.91   155   13      || 3.80   288

Table 4: Performance for different situations.

Now, consider a totally different situation in which we make the following changes to the original situation: only HTML requests remain, the length of a session is uniformly distributed on [300, 1800], and there are 250 clients. From the last two columns in Table 4, we may conclude that an update algorithm is better avoided here, since in most of the situations it is worse than just RR or LC.

In the description of the WRR method, we identified two schemes (the LVS and the Golden Ratio scheme) to assign clients to the terminal servers. To test which one of them is better, we considered the original system, but now with 150 clients and 2 terminal servers. Table 5 shows the average absolute difference between the 1-LA for both schemes as well as for the PR method. We conclude that the LVS method works best in combination with LP1, while the Golden Ratio method works best in combination with all remaining update algorithms (i.e., LP2, LOB, and AL). However, when we compare the performance of PR in combination with LP1 and LP2, we conclude that PR performs better than LVS or Golden Ratio.

Algorithm   WRR-LVS   WRR-Golden Ratio   PR
LP1         5.9       6.14               4.94
LP2         5.93      4.42               4.38
LOB         3.11      3.08               3.54
AL          5.09      4.31               4.76

Table 5: Performance for three different methods.

Next, we modify the situation so that there are 50 clients of type 1 and 50 clients of type 2. Type 1 clients can only retrieve HTML pages, while type 2 clients can only send PDF requests. All other settings remain the same. It is then observed that WRST also performs better than LC: WRST has an average absolute difference equal to 0.77, while LC has a value equal to 0.93.

6 Conclusion

In this research, our objective was to find easily implementable algorithms to balance the load over the available terminal servers, since this yields better response times. We have seen that three algorithms perform very well in general: WRST, LC, and LBA. The performance of RR can be improved significantly, e.g., by using a mechanism to update the weights of WRR periodically. However, care must be taken when using update mechanisms, since the performance does not improve in all situations. Furthermore, there is no general rule that states how to choose the size of the update interval. Moreover, we observed that using the 1-LA provides the best load information.

LB algorithms can be divided into two groups: static and dynamic algorithms. The performance of static algorithms is poor, due to the fact that these algorithms do not take state information into account. Load balancing can be done better by using additional information on the terminal servers. Information on the usage of a terminal server, expressed in the load, and information on the session lengths turn out to be useful. This can be seen from the performance of our developed algorithm WRST that does use this information. It outperforms the existing methods; compared to LC, it performs equally well and in most cases even better.

Furthermore, we looked at two possible schemes for server assignments to use in combination with WRR. It is better to make a scheme according to the Golden Ratio method. This scheme yields balanced sequences of server assignments for a given ratio of weights. This is in contrast to the LVS method, where it is relatively easy to overload an underloaded server when all new sessions are assigned to the underloaded server.

References

[1] http://kb.linuxvirtualserver.org/wiki/dynamic_feedback_load_balancing_scheduling.
[2] http://kb.linuxvirtualserver.org/wiki/weighted_round-robin_scheduling.
[3] http://msdn2.microsoft.com/en-us/ms972338.aspx.
[4] http://www.icmgworld.com/corp/k2/k2.loadbal.asp.
[5] http://www.systinet.com/doc/ssj-65/ssj/administration_load_balancing.html.
[6] V.A. Bolotin. Telephone circuit holding time distribution. The Fundamental Role of Teletraffic in the Evolution of Telecommunications Networks, pages 125-134, 1994.
[7] V.A. Bolotin, Y. Levy, and D. Liu. Characterizing data connection and messages by mixtures of lognormal distributions on logarithmic scale. Teletraffic Engineering in a Competitive World, pages 887-894, 1999.
[8] T. Bourke. Server Load Balancing. O'Reilly Media, 2001.
[9] O.J. Boxma, H. Levy, and J.A. Weststrate. Efficient visit orders for polling systems. Performance Evaluation, 18:103-123, 2003.
[10] N.J. Gunther. Analyzing Computer Systems Performance using Perl::PDQ. Springer-Verlag, 2004.
[11] C. Kopparapu. Load Balancing Servers, Firewalls and Caches. John Wiley & Sons, 2002.