Adaptive Parameter Setting for QoS Aware Load Balancing Algorithm

KIMMO KAARIO
Honeywell Industrial Control
Control System Development
Ohjelmakaari, FIN-40500 Jyväskylä
FINLAND

TIMO HÄMÄLÄINEN
University of Jyväskylä
Faculty of Information Technology
Department of Mathematical Information Technology
Telecommunications
P.O. Box 35, FIN-40351 Jyväskylä
FINLAND

PERTTI RAATIKAINEN
VTT Information Technology
Telecommunications
P.O. Box 1202, FIN-02044 VTT
FINLAND

Abstract: - The swift growth of the Internet has boosted the use of Web based services and in some practical cases has led to overwhelming request bursts to servers. Relational database queries, image storage/retrieval and other new types of application transactions have become increasingly popular. Their coexistence in commercial parallel and distributed systems has generated some new loading problems. For example, a constantly increasing request rate eventually leads to a processing power requirement that exceeds the capacity of the accessed server. As a consequence, response times increase and some portion of the requests are lost. Clustering of servers to meet the growing demand for server processing capacity, especially in web-based service supply, has created the need for intelligent switching at front-end devices. As a consequence of clustering, multilayer switching schemes have been developed to enable optimum loading of the individual servers in a cluster. In this paper, we formulate the load balancing problem taking QoS into consideration and introduce a QoS aware load balancing algorithm (QoS-LB). The performance of the algorithm is simulated, and results indicating the load balancing capability of the algorithm are presented. The overall idea of this paper is to describe an algorithm that provides Class of Service based differentiated access to server clusters and offers a better playground for QoS mechanisms in client-server environments. The engineering task of offering QoS guarantees with such a differentiation tool is out of the scope of this paper.

Keywords: - Server load balancing, content-based switching, QoS

1 Introduction

The server capacity problem has normally been solved by implementing a cluster of servers having identical or partly identical content, which in turn has created the problem of balancing load between the clustered servers. Front-end devices supporting various kinds of load balancing methods have been developed to direct requests to the servers. Most of the experimental as well as implemented load balancing schemes employ fairly simple algorithms that have been developed for a specialized hardware and software architecture. They usually do not take Quality of Service (QoS) issues into account. The simplest ones share the load uniformly between the servers by using algorithms like round-robin [2]. Some systems consider the processing power of the servers and utilize the weighted round-robin scheme [3]. More intelligent systems take response times into account and try to optimize system performance, e.g., by maximizing the cache hit rate [7]. The most advanced of these systems are called web switches. They operate at layer 5 of the TCP/IP protocol stack, i.e., the application layer for the Web, and use the content of the IP packets in making the load balancing decisions [1].

Our goal was to develop a load balancing scheme that would fit optimally into a wide class of distributed computer system architectures and take the required QoS into account. The incoming requests would be directed to servers based on the QoS needs of the requested services, and the loading level of the individual servers would be tuned to support the required QoS levels. This means that more processing power is reserved for high priority requests than for lower priority ones. The algorithm was introduced in [4], and it was optimized for high cache hit-rates in [5]. This paper modifies the algorithm by introducing an adaptive tuning method for its parameters. The algorithm is being implemented on Media Switch [12], which is a Linux based programmable switch.

The rest of this paper is organized as follows.
Chapter 2 introduces the QoS based load balancing problem and Chapter 3 presents the developed algorithm. Chapter 4 presents the performance evaluation results, and Chapter 5 summarizes the main results and outlines the future work.

2 Problem Formulation

Let $N$ be the number of servers in a server cluster, $Q$ the number of served QoS classes, and $T_m$ the desired upper limit of load in server $m$, $m = 1, \ldots, N$. Let row $q$, $q = 1, \ldots, Q$, of the matrix

\[ L = (l_1, \ldots, l_Q)^T \tag{1} \]

indicate the current load of QoS class $q$ in the cluster at time $t$. Now, we can describe the connections of each QoS class in the cluster by a connection matrix

\[ C = [c_{qm}], \quad q = 1, \ldots, Q, \quad m = 1, \ldots, N \tag{2} \]

where $c_{qm} \in [0, 1]$ indicates the share of the traffic in QoS class $q$ that is currently served by server $m$. The following rules have to be satisfied:

1) The load on each server $m$ is preferred to be less than $T_m$, which means that there should be some penalty when this condition does not hold.

2) Every customer must be served, i.e., the traffic of each QoS class is distributed in full among the servers (Eqs. (3) and (4)).

3) The requests for a certain QoS class should be served by the same server if possible. With this rule we may assume that rule 4 is well-formed. This requirement can be achieved, e.g., by minimizing a product formed from the elements of the connection matrix (Eq. (5)). With this approach, the minimization procedure tries to find situations where some element in each row of the connection matrix is close to 1. This leads to a state where the load for that traffic type concentrates on a single server (with $c_{qm}$ close to 1). If any of the elements in a row equals 1, the other elements of that row must equal zero. This would be the ideal case, if the objective of the following rule (rule 4) is minimized at the same time.

4) QoS awareness. In the following, we assume that the most important QoS class is class 1 and that the lowest QoS class gets the highest class number. It is also assumed that the load limits $T_m$ are ordered by server number (Eq. (6)). With these assumptions we can minimize a sum in which every element $c_{qm}$ is weighted by the distance between server $m$ and the ideal server for QoS class $q$ (Eq. (7)), to prefer serving the higher QoS classes in the less loaded nodes (see rule 1). Equation (7) behaves as a kind of pointer to a relevant server for each QoS class: if the chosen server is close to the ideal one, the distance is small and element $c_{qm}$ can be large. When the distance gets larger, the minimization tries to find smaller elements $c_{qm}$ for these servers.

Now, the problem is to minimize the objective function of Eq. (8), which combines the requirements above, with the constraints

\[ \sum_{m=1}^{N} c_{qm} = 1, \quad q = 1, \ldots, Q \tag{9} \]

and

\[ \sum_{q=1}^{Q} l_q c_{qm} \le T_m^{\max}, \quad m = 1, \ldots, N \tag{10} \]

where $T_m^{\max}$ is the maximal processing capacity of server $m$.

The problem gets a lot more complicated when we consider the requirement of high cache hit rates and try to serve the same kind of requests by known servers. The problem of assigning the servers only by the content of each request is analogous to the introduced QoS based scheduling problem, but combining these two different (and in some ways contradicting) approaches leads to the load matrix (compare to Eq. (1))

\[ L = \begin{pmatrix} l_{11} & \cdots & l_{1K} \\ \vdots & \ddots & \vdots \\ l_{Q1} & \cdots & l_{QK} \end{pmatrix} \tag{11} \]

where $Q$ is the number of QoS classes and $K$ is the number of services (or content types, depending on the application of the algorithm). Element $l_{qk}$ now indicates the current load in QoS class $q$ that is produced by requests of type $k$. Now, if $N$ is the number of servers, the connection matrix $C$ is a $Q \times K \times N$ matrix whose slice for server $m$ is the $Q \times K$ matrix

\[ C_m = [c_{qkm}], \quad q = 1, \ldots, Q, \quad k = 1, \ldots, K \tag{12} \]

and $m = 1, \ldots, N$. If we extend the previous problem formulation (Equation (8)), we are faced with the problem of minimizing the four objective functions of Eqs. (13)-(16) at the same time. We also have two constraints,

\[ \sum_{m=1}^{N} c_{qkm} = 1 \tag{17} \]

for all $q = 1, \ldots, Q$, $k = 1, \ldots, K$, and

\[ \sum_{q=1}^{Q} \sum_{k=1}^{K} l_{qk} c_{qkm} \le T_m^{\max} \tag{18} \]

for all $m = 1, \ldots, N$.

The introduced problem is well suited to interactive analysis, e.g., by WWW-NIMBUS (Nondifferentiable Interactive Multiobjective BUndle-based optimisation System) [10]. This tool allows the user to choose the importance of each objective function during the optimization process, and makes it possible to move between the Pareto optimal solutions of the problem. The user (or decision maker) is also needed to check whether the solutions are relevant. Each of the four objective functions, taken alone, may reach optimal solutions that have no practical value from the original point of view. The analytical research of the problem is a topic for another paper.

This paper introduces an algorithm that is based on our proposal in [4]. The main merit of this paper compared to [4] is the adaptive parameter assignment that is introduced in the following chapter. Without this feature the algorithm would have been useless in practical implementations.
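The two constraints of the QoS-only formulation in this chapter say that every QoS class must be served in full and that no server may be loaded beyond its maximal processing capacity. As a minimal sketch, assuming the load is given as one number per QoS class and the connection matrix as a list of rows (one row per QoS class, one column per server), a candidate assignment could be checked as follows. The function name, the argument names and the tolerance are illustrative and are not taken from the paper.

```python
def check_assignment(load, conn, t_max, tol=1e-9):
    """Check a candidate connection matrix against the two constraints.

    load  -- load[q] is the current load of QoS class q
    conn  -- conn[q][m] is the share of class q served by server m
    t_max -- t_max[m] is the maximal processing capacity of server m
    Returns (all_served, within_capacity, server_load).
    """
    n_servers = len(t_max)
    # Every customer must be served: each row of shares sums to one.
    all_served = all(abs(sum(row) - 1.0) <= tol for row in conn)
    # Load placed on each server by all QoS classes together.
    server_load = [
        sum(load[q] * conn[q][m] for q in range(len(load)))
        for m in range(n_servers)
    ]
    # No server may exceed its maximal processing capacity.
    within_capacity = all(
        server_load[m] <= t_max[m] + tol for m in range(n_servers)
    )
    return all_served, within_capacity, server_load


if __name__ == "__main__":
    # Two QoS classes, two servers (made-up numbers for illustration).
    load = [4.0, 6.0]
    conn = [[1.0, 0.0],
            [0.5, 0.5]]
    print(check_assignment(load, conn, t_max=[8.0, 5.0]))
```

In this made-up example the first server carries a load of 7.0 and the second 3.0, so both checks pass. Lowering the capacity of the first server below 7.0 would make the capacity check fail while the rows would still sum to one.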

3 QoS Aware Load Balancing Algorithm

In [4] we introduced a QoS Aware Load Balancing Algorithm (QoS-LB) that scheduled requests to servers well enough to achieve the QoS based goals described in the previous section. That version of the algorithm, however, needed some knowledge from the user to tune the parameters to give good results. This cannot be applied in practice, and therefore the algorithm has been developed further. Now the most important parameters of the algorithm are adaptive (step 4 in the following), and there is no need for user intervention.

Let $\mathcal{S}$ be a set of servers in a server cluster, $\mathcal{K}$ a set of supported services (note that each service must be identified uniquely by a number in this set), and $\mathcal{Q}$ a set of supported QoS classes in the server cluster (again, QoS classes must be identified by these numbers). If $N$ is the number of servers, $K$ the number of services, and $Q$ the number of QoS classes, then $\mathcal{S} = \{1, \ldots, N\}$, $\mathcal{K} = \{1, \ldots, K\}$, and $\mathcal{Q} = \{1, \ldots, Q\}$. The algorithm goes as follows:

Step 0. Initialize the variable $P$, which holds the preferred QoS classes of the servers. An example [4] of initializing this variable to achieve lower loads in the servers that will preferably serve the most important customers (i.e., the traffic in the best QoS class): we have a function $f$ that assigns each server a maximal QoS class that it is preferred to serve. By using this function, the set of preferred QoS classes in server $m$ becomes $P_m = \{1, \ldots, f(m)\}$. Now, we can define $f(m)$, e.g., by using a linear approach, given as a piecewise expression in Eq. (19) for $m = 1, \ldots, N$. Here, $\lceil x \rceil$ means the ceiling of $x$, and $\lfloor x \rfloor$ is the floor of $x$. The linear approach means that the number of preferred QoS classes decreases roughly linearly when the server number increases. In some cases, however, there might be a need for a different weighting between the QoS classes. In that case, a slightly different version of $f(m)$ can be tried. Go to step 1.

Step 1. Try to find a server that is included in the set of preferred servers for the requested service and its QoS class. If such a server is found and its load is under the limit $T_{high}(k, q)$, choose it as the server for the requested connection and go to step 4. Otherwise, go to step 2.

Step 2. Try to find a server that has a load less than $T_{low}$. The limit $T_{low}$ must satisfy the rule

\[ T_{low} \le \min \{ T_{high}(k, q) : q \in \mathcal{Q}, \; k \in \mathcal{K} \} \tag{20} \]

where $\mathcal{Q}$ is the set of all supported QoS classes in the server cluster, and $\mathcal{K}$ is the set of all the supported services in the cluster. If such a server is found, choose it as the server for the requested connection and go to step 4. Otherwise, go to step 3.

Step 3. Choose the least loaded node of the cluster as the server for the requested connection. Go to step 4.

Step 4. Update $T_{high}$ dynamically. In this paper, this is done as follows. Let $\mathcal{S}_c$ be a subset of $\mathcal{S}$ that includes the servers that are the preferred servers for the most critical QoS classes. In these simulations, the most critical QoS classes mean the most important half of the classes. If the load in any server of $\mathcal{S}_c$ is more than a threshold derived from the current $T_{high}$, the limit $T_{high}(k, q)$ is increased, but never above $T_{max}$ (Eq. (21)), for all $q \in \mathcal{Q}$ and for all $k \in \mathcal{K}$, and the algorithm goes to step 1. Otherwise, $T_{high}(k, q)$ is decreased (Eq. (22)) for all $q \in \mathcal{Q}$ and for all $k \in \mathcal{K}$, and the algorithm goes to step 1.

In the simulations, we used fixed values for the three tuning parameters of step 4. The bigger the parameter of Eq. (22) is, the faster $T_{high}$ decreases when there is a period of low load. Correspondingly, the bigger the other two parameters are, the faster the algorithm reacts to traffic bursts. Here, $T_{max}$ means the maximum processing capacity of the servers, and $T_{high}(k, q)$ the user defined limit for the relevant maximum load in the server for a request of service $k$ that belongs to QoS class $q$. In the algorithm, the maximal number of concurrent connections to the back-end nodes is limited by a bound derived from $T_{max}$.
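The request-handling part of the algorithm (find a preferred server under the high limit, else any server under the low limit, else the least loaded node) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the function and argument names are hypothetical, and ties among eligible servers are broken by picking the least loaded one because the text does not specify a tie rule. The function `update_t_high` is only an illustrative stand-in with fixed step sizes; it follows the raise-on-overload, lower-otherwise structure described for the last step but does not reproduce the paper's update equations.

```python
def choose_server(loads, preferred, t_high, t_low):
    """Pick a server index for one request (the three selection steps).

    loads     -- loads[m] is the current load of server m
    preferred -- indices of the servers preferred for the request's
                 service and QoS class
    t_high    -- load limit applied to preferred servers
    t_low     -- load limit applied to any server (t_low <= t_high)
    """
    # First choice: a preferred server whose load is under t_high.
    eligible = [m for m in preferred if loads[m] < t_high]
    if not eligible:
        # Second choice: any server whose load is under t_low.
        eligible = [m for m in range(len(loads)) if loads[m] < t_low]
    if not eligible:
        # Last resort: consider the whole cluster.
        eligible = list(range(len(loads)))
    # Assumption: among eligible servers, take the least loaded one.
    return min(eligible, key=lambda m: loads[m])


def update_t_high(t_high, critical_loads, t_max, step_up, step_down, floor):
    """Illustrative stand-in for the adaptive update (NOT the paper's equations).

    Raises t_high towards t_max when any server of the critical set is
    loaded above t_high, and lowers it towards `floor` otherwise.
    step_up, step_down and floor are hypothetical tuning values.
    """
    if any(load > t_high for load in critical_loads):
        return min(t_max, t_high + step_up)
    return max(floor, t_high - step_down)


if __name__ == "__main__":
    loads = [0.9, 0.2, 0.5, 0.7]          # made-up normalized loads
    server = choose_server(loads, preferred=[3], t_high=0.8, t_low=0.4)
    print("chosen server index:", server)
    print("new t_high:", update_t_high(0.5, [loads[3]], 1.0, 0.25, 0.25, 0.375))
```

Keeping the selection and the limit update in separate functions mirrors the structure of the algorithm: the selection runs once per request, while the update only changes the limit that the next selection will see.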

4 Simulation Results

In most of the Matlab [11] simulations, inter-arrival times for the requests were created by using a simple Poisson process. The service times were usually created as a Pareto process. In order to create self-similar traffic patterns, using heavy-tailed service time distributions is sufficient. However, in some sets of the simulations, we also used self-similar inter-arrival times. The reason for this was to get more information on the behaviour of the algorithm under a quite heterogeneous set of load patterns. The existence of self-similarity in network traffic is studied in more detail, e.g., in [6, 8] and [9].

Simulations revealed that the problem of mapping all the (service, QoS class) pairs of the requests optimally to the servers is quite difficult if we always want to have high cache hit-rates and, at the same time, to treat QoS classes unequally (i.e., use the servers with lower response time for the best customers). We still have intensive research going on to improve the mapping scheme so that it satisfies these contradicting goals in a way that is close to the optimal solution. Some considerable improvements have been obtained since [4], and the purely QoS based assignment already works quite well. This can clearly be seen in Figures 1 and 2, which illustrate the performance of a cluster of four servers. Figure 2 demonstrates how QoS-LB works, i.e., assigns unequal load to servers according to QoS classes. As a reference, Figure 1 shows similar performance curves for a round-robin scheme. In QoS-LB, server number 4 is assigned as the preferred server for the most important customers, and as the QoS class gets worse, the number of the preferred server gets smaller. When looking at the figures, note that the curves start from an empty system, and the load is not stable in the first half of the simulation run (confidence problems get bigger when using heavy-tailed distributions).

[Figure 1: Normalized load of the servers using Round-Robin. Panels: Server 1, Server 2, Server 3, Server 4; horizontal axis: Time.]

[Figure 2: Normalized load of the servers using QoS-LB with dynamic $T_{high}$ (Eq. (19) was used in the initialization). Panels: Server 1, Server 2, Server 3, Server 4; horizontal axis: Time.]

A major improvement towards real implementations of the algorithm was that the introduced dynamic setting of $T_{high}$ seemed to work as expected ($T_{high}$ follows the actual load). This cannot be seen from the figures as clearly as the CoS differentiation. When developing the dynamics of $T_{high}$, the main issue was to find relevant values for the three tuning parameters (see Eqs. (21) and (22) in step 4 of the algorithm). As mentioned before, we finally decided to use fixed values for them. As the need for this kind of dynamics differs depending on the application of the algorithm, it may be worth trying different values for these three parameters. This is one topic for further study.

5 Conclusions

Due to the fast growing demand for web-based services, methods to increase server processing capacity have been studied intensively. Clustering of servers has been a straightforward way to go, but scheduling of the incoming requests between the servers has not been an equally clear task. A front-end device, often called a multilayer or content-based switch, is needed to carry out the request scheduling. A number of different scheduling schemes have been developed, and the most novel of them utilize information from the link layer up to the application layer.

In this paper, we have studied the issue of load sharing within a cluster of servers and introduced an adaptive tuning method for the QoS aware load balancing scheme (QoS-LB). First, a mathematical formulation for the load balancing problem to optimize system performance as a function of the required QoS classes
and the content of the requested services has been developed. Second, an algorithm to tune server loading levels to gain QoS aware total system performance has been introduced. Results obtained by running a number of simulation cases have shown that the developed algorithm works as planned.

However, the developed algorithm does not consider the cache hit rates well enough. New versions of the algorithm [5] experiment with some advanced features to solve this weakness, but the main work to be done is to find and implement an algorithm that more closely fulfills the optimality requirements set in Chapter 2. It is for further study to tune the QoS-LB algorithm to give performance closer to the optimal one under highly varying real-life load. The performance of the introduced algorithm comes quite close to the optimum, but there is still room for improvement. The problem has so many input parameters that the theoretical optimum can never be reached in practice. It has to be noticed that including some knowledge of the requested services in the problem increases its complexity and may lead to an algorithm that is not straightforward enough to be implemented in practice. Our goal is to keep the algorithm simple enough to enable concrete implementations by keeping the required processing capacity within reasonable limits in the front-end switches.

The introduced algorithm does not directly offer any provable quality of service bounds. From this point of view, the algorithm would be better referred to as a Class of Service algorithm. The overall idea of this paper is to describe an algorithm that provides differentiated access to different types of customers or requests, and offers a better playground for real QoS mechanisms. The engineering task of offering QoS with such a differentiation tool is out of the scope of this paper.

References

[1] G. Apostolopoulos, D. Aubespin, V. Peris, P. Pradhan, D. Saha: Design, Implementation and Performance of a Content-Based Switch. Proceedings of IEEE INFOCOM 2000, pp. 1117-1126.
[2] T. Brisco: DNS Support for Load Balancing, RFC 1794, April 1995.
[3] A. Fox, S. D. Gribble, Y. Chawathe, E. A. Brewer, and P. Gauthier: Cluster-based scalable network services. In Proceedings of the Sixteenth ACM Symposium on Operating System Principles, Saint-Malo, France, Oct. 1997, pp 9 27.
[4] K. Kaario, T. Hämäläinen, Zhang: Tuning of QoS Aware Load Balancing Algorithm (QoS-LB) for Highly Loaded Server Clusters. Proceedings of IEEE International Conference on Networking 2001 (ICN'01), July -3, 2001, Colmar, France.
[5] K. Kaario, T. Hämäläinen and M. Wikström: Method for Improving Cache Hit-Rates in QoS-Aware Load Balancing Algorithm (QoS-LB). Proceedings of IEEE Globecom 2001, Volume 4, pp 232-2325, November 2001, USA.
[6] W. Leland, M. Taqqu, W. Willinger and D. Wilson: On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Transactions on Networking, Vol. 2, 1994, pp. 1-15.
[7] J. Liedtke, V. Panteleenko, T. Jaeger, and N. Islam: High-performance caching with the Lava hit-server. In Proceedings of the USENIX 1998 Annual Technical Conference, New Orleans, LA, June 1998.
[8] V. Paxson and S. Floyd: Wide Area Traffic: The Failure of Poisson Modeling. IEEE/ACM Transactions on Networking, Vol. 3, No. 3, June 1995, pp. 226-244.
[9] B. Tsybakov, N. D. Georganas: On Self-Similar Traffic in ATM Queues: Definitions, Overflow Probability Bound, and Cell Delay Distribution. IEEE/ACM Transactions on Networking, Vol. 5, No. 3, June 1997, pp. 397-409.
[10] WWW-NIMBUS, Nondifferentiable Interactive Multiobjective BUndle-based optimisation System, http://nimbus.mit.jyu.fi
[11] http://www.mathworks.com/
[12] http://www.necsom.com/ps swtchtm