Multicast-based Distributed LVS (MD-LVS) for improving scalability and availability

Haesun Shin, Sook-Heon Lee, and Myong-Soon Park
Internet Computing Lab., Department of Computer Science and Engineering, Korea University, Korea
E-mail: {shinsun, tonaido, myongsp}@ilab.korea.ac.kr

Abstract

In a centralized IP cluster, the availability and scalability of the cluster system are low, since a single Load Balancer creates a single point of failure and the total throughput of the cluster is limited by the Load Balancer's performance. In a distributed IP cluster, every host in a web server cluster acts both as a Load Balancer and as a real server; however, this approach may increase the CPU workload of every host, which must perform both packet filtering and load balancing. In this paper, we propose the Multicast-based Distributed LVS (MD-LVS), which combines the centralized IP
cluster and the distributed IP cluster. There are multiple Load Balancers, and each Load Balancer controls its own independent cluster group, which consists of several real servers. This mechanism can efficiently improve the scalability and availability of the cluster system, since it is easy to expand the system and an entire-system failure can be avoided.

1. Introduction

With the explosive growth of the World Wide Web, some popular web sites receive thousands of hits per second. As a result, clients (browsers) experience slow response times and sometimes cannot access some web sites at all. Clustering with a single-system-image view is the most commonly used approach to increasing the throughput of a web site. With a server cluster, multiple servers behave as a single host from the client's perspective[1,2,3,4,5,6,7]. There are two types of clustering architecture: the centralized IP cluster and the distributed IP cluster. A centralized IP cluster consists of one Load Balancer and several real servers. The Load Balancer distributes incoming client requests to an appropriate real server based on load characteristics. Centralized IP clusters include LVS (Linux Virtual Server) as a software load-balancing method[1,2,3,11], and MagicRouter, LocalDirector, and TCP Router as hardware load-balancing methods[9,10]. The real servers provide the HTTP and FTP services. However, since this kind of solution creates a single point of failure in the system, availability is low. In
addition, scalability is low, since the total throughput of the cluster is limited by the performance of the Load Balancer. The Load Balancer bottleneck becomes more critical than the network bottleneck, and it limits the scalability of such servers when processing large numbers of simultaneous client requests. In a distributed IP cluster, a collection of hosts works together to serve web requests; every host in the cluster acts both as a Load Balancer and as a real server. This approach includes DPR (Distributed Packet Rewriting) and ONE-IP. Distributed designs have the potential for scalability and cost-effectiveness. However, this approach may increase the CPU workload of all the hosts, which must perform both packet filtering and load balancing. Since every host has to perform load balancing individually, the service capacity of each node may also decrease. In addition, when one of the hosts fails, the others have to reconfigure the system state and load distribution information[4,5,6]. In this paper, we propose the Multicast-based Distributed LVS (MD-LVS). Structurally, it combines the centralized and distributed IP cluster systems: there are multiple Load Balancers, and each Load Balancer controls an independent cluster group consisting of several real servers. This mechanism can improve the scalability and availability of the cluster system, since it is easy to expand the system and an entire-system failure can be avoided, opening a new clustering paradigm. The rest of the paper is organized as follows: Section 2 reviews related work on IP clustering. Section 3 presents our main idea, the Multicast-based Distributed LVS, and Section 4 describes a prototype implementation. Finally, Section 5 concludes the paper.
2. Related Work

2.1 Centralized IP Cluster approach

The centralized solution requires one Load Balancer, which distributes incoming packets to one of the real servers, and several real servers, which support HTTP, FTP, and other services. This approach is the most common in cluster systems and makes load-balancing algorithms easy to implement. However, it may create a single point of failure and a single point of overload in the system, and it may require special-purpose hardware as the Load Balancer[1,2,3,9,10,11].

(1) S/W load balancing

LVS[1,2,3,11] is built on a cluster of loosely coupled, independent servers and is developed as an open project. LVS is currently implemented in two ways: virtual server via NAT (Network Address Translation), and virtual server via direct routing (LVS-DR) or IP tunneling (LVS-Tunneling). The virtual server code is developed on top of the Linux IP Masquerading code in the Linux 2.0 and 2.2 kernels, and some of Steven Clarke's port-forwarding code is reused. It supports both TCP and UDP services, such as HTTP, proxy, and DNS. In LVS-NAT, all packets from and to clients pass through the Load Balancer, so the CPU load on the Load Balancer can become seriously concentrated. In LVS-DR and LVS-Tunneling, on the other hand, only the incoming packets from clients pass through the Load Balancer. As a result, the scalability may be increased
somewhat.

(2) H/W load balancing

To handle the responsibility of distributing requests to individual hosts in a cluster, several research groups have suggested using a local router to perform this function. Examples include the MagicRouter from Berkeley, the TCP Router from IBM, and the LocalDirector from Cisco. These are packet-filter-based approaches to distributing network packets in a cluster, acting as a switchboard that distributes requests for web service to the individual real servers[9,10]. MagicRouter and LocalDirector use the Network Address Translation approach and are similar to LVS-NAT[9,10]. However, MagicRouter did not survive as a usable system for other users, LocalDirector is very expensive, and both support only part of the TCP protocol. The TCP Router uses a modified Network Address Translation approach to build a scalable web server and is similar to LVS-DR. The TCP Router
changes the destination address of the request packets and forwards them to the chosen server; that server is modified to put the TCP Router's address, instead of its own, as the source address in the reply packets. The advantage of this modified approach is that the TCP Router avoids rewriting the reply packets; the disadvantage is that it requires modification of the kernel code of every server in the cluster. These approaches require special-purpose hardware as the Load Balancer, so the cost is very high, and scalability is low since the number of ports is fixed.

2.2 Distributed IP cluster approach

Adding a second server to a site requires no special hardware, introduces no single point of failure, and uses the added capacity to scale both the connection-routing and connection-service capacities equally. Structurally, DPR combines RR-DNS and LVS-DR (or LVS-Tunneling): the entry point of the cluster is distributed by the RR-DNS method, and load balancing within the cluster is performed by each individual node as in LVS-DR or LVS-Tunneling[4,5,6,7,8].

(1) DPR

In DPR[5,6], every host in a web server cluster acts both as a real server and as a connection router. Thus, unlike existing solutions that rely on a single, centralized connection router, DPR enables both the service and
the routing responsibilities to be shared by all hosts in the cluster. Distributing the connection-routing functionality allows true scalability, since adding a new host to the cluster automatically adds capacity to both the web-service and connection-routing functions. To enable stateful routing of requests with DPR, each machine keeps an updated list of all other machines in the cluster, with information such as their IP addresses and current load. Hosts intermittently broadcast their load to the other machines using multicast UDP packets. A server uses this information to determine whether an incoming request should be re-routed or served locally. Each machine also keeps routing tables with information about redirected connections. Packets from servers to clients must be rewritten, and each host must maintain a table of the connections currently being rewritten; this may increase the service delay and the processing overhead. Although the DPR approach may perform dynamic and accurate load balancing, it increases the network traffic among server hosts and the CPU load needed to process the load information. In addition, it naturally inherits the disadvantages of RR-DNS.

(2) ONE-IP

ONE-IP[4] is another IP-level scheduling approach, based on packet broadcasting and local filtering. A broadcast mechanism is used to send client packets to every server, and each machine implements a local filter so that every packet is processed by exactly one server. The architecture for this scheme is shown in Fig. 4.
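The local filtering described above can be sketched as follows. This is a simplified user-space illustration, not the actual device-driver routine; it assumes n servers with IDs 0..n-1 and dispatch by the client IP address, taken as an integer, modulo n:

```python
import ipaddress

# Simplified illustration of ONE-IP's local filter: every server sees each
# broadcast packet, but exactly one accepts it. The real routine runs in
# each server's device driver.

def accepts(client_ip: str, server_id: int, n_servers: int) -> bool:
    """Return True if this server should accept a packet from client_ip."""
    ca = int(ipaddress.IPv4Address(client_ip))  # client address as integer
    return ca % n_servers == server_id

# With 3 servers, each broadcast packet is accepted by exactly one of them.
ip = "192.168.1.10"
accepting = [sid for sid in range(3) if accepts(ip, sid, 3)]
print(accepting)  # a list containing exactly one server ID
```

Because the decision depends only on the packet header, no mapping table or inter-server feedback is required.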
When a request packet from a client arrives at the router, the router puts the packet on the server network as an Ethernet broadcast packet, so the packet is picked up by all the server machines[4,8]. A small filtering routine is added to each server's device driver to ensure that exactly one machine accepts the packet. Each machine is assigned a unique ID number, and the filtering routine computes a hash value of the client IP address and compares it with the ID: if they do not match, the packet is discarded; otherwise, the server accepts the packet and processes it as if it had been received through the normal IP routing mechanism. Since all processing is based on the ghost IP, the reply packets are sent directly back to the client. Given n server machines s0, s1, ..., s(n-1) and a packet from client IP address CA, a simple dispatching function computes k = CA mod n and selects server k to process the packet. Since the scheme does not maintain any mapping table, the dispatcher is essentially stateless. To decide whether to forward a packet, the stateless approach requires only information that can be found in the headers of each packet and does not rely on feedback from other machines about their current load. The disadvantage is that it cannot be applied to all operating systems, because some operating systems shut down the network interface when they detect an IP address collision; the local filtering also requires modification of the kernel code of every server. In the broadcast-based dispatching scheme, broadcasting each incoming packet on the server LAN does not increase network traffic. However, a hash value must be computed for every ghost-IP packet, which
increases the CPU load of each server. Compared to the communication delay, however, this computation overhead is negligible. Although this type of static load balancing allows low-cost implementations with fast dispatching, it may not perform dynamic and accurate load balancing.

3. Multicast-based Distributed LVS

We propose the Multicast-based Distributed LVS, which can improve scalability and availability. This mechanism combines the centralized IP cluster and the distributed IP cluster. There are multiple Load Balancers, and incoming packets from the router are multicast to all of them; the Service Load Balancer is then selected by a Packet Accept Load Balancing Algorithm. This stage follows the distributed IP cluster
mechanism, ONE-IP. After the Service Load Balancer is selected, the rest of the processing follows the LVS mechanism, i.e., the centralized IP cluster approach.

(1) Single Entry Point Image via multicast-based multiple Load Balancers

When the router receives packets from a client and transfers them to the Load Balancers by multicast, each Load Balancer must determine whether to accept the packet. There are several Packet Accept Load Balancing Algorithms. Once a Service Load Balancer is selected, future incoming packets for the same request must be directed to the same Load Balancer. The cluster system works as follows:

1) The client sends a request to the Apache web server set, identified by the Virtual IP.

2) The router checks its ARP table to find the Ethernet address for the Virtual IP and multicasts the packet to the corresponding Ethernet multicast address.

3) The Load Balancers receive the multicast packets and decide whether to accept each packet using the Packet Accept Load Balancing Algorithm.

4) If a Load Balancer decides to service the packet, it forwards the packet to a real server; otherwise, the packet is discarded.

In our system, all nodes, including the Load Balancers and the real servers, share the Virtual IP, the only published IP address. However, only the Load Balancers, not the real servers, share the Ethernet multicast address. That is, the NIC (Network Interface Card) of each Load Balancer is configured with both the Virtual IP and the Ethernet multicast address. So, when the router multicasts the incoming packets, only the Load Balancers are able to accept the IP packets from clients. This configuration can be set with the ifconfig command.
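The per-Load-Balancer accept-or-discard flow described above can be sketched as follows. This is a user-space illustration only; the choice function here is a hypothetical placeholder hash, and the prototype implements the decision in the kernel layer:

```python
# Simplified sketch: each Load Balancer applies a Packet Accept Load
# Balancing Algorithm to every multicast packet and either forwards it
# into its own sub-cluster or discards it. The concrete algorithm is
# pluggable; the `stateless` hash below is only a placeholder.

def handle_packet(client_ip, my_id, num_lbs, choose):
    """choose(client_ip, num_lbs) names the Load Balancer that serves it."""
    if choose(client_ip, num_lbs) == my_id:
        return "forward to real server"  # this Load Balancer services it
    return "discard"                     # another Load Balancer will

# Placeholder stateless choice: octet sum of the dotted quad, modulo the
# number of Load Balancers (depends only on the packet header).
stateless = lambda ip, n: sum(int(o) for o in ip.split(".")) % n

print(handle_packet("10.0.0.7", 1, 2, stateless))  # forward to real server
print(handle_packet("10.0.0.7", 0, 2, stateless))  # discard
```

Because the choice depends only on the client address, all packets from the same client land on the same Service Load Balancer, as required above.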
In addition, the router's ARP table should be statically fixed to map the Virtual IP to the Ethernet multicast address, using the arp command, so that when a packet from a client reaches the router, the router multicasts the Virtual IP packet using the Ethernet multicast address. This scheme requires a permanent ARP entry at the router. However, the different ARP response messages from the multiple Load Balancers would cause the router to overwrite the entry in its ARP cache. To overcome this problem, we deactivate the ARP response function through the /proc interface. ONE-IP uses another ghost IP to solve this problem, but that is more complicated than our approach. The main advantage of this mechanism is the improvement in availability and scalability. Since there are multiple Load Balancers, even if one of them fails, the others can continue to provide service, so the availability of the cluster system is greatly increased. In addition, since the total service capacity is increased, scalability is also improved. For now, we do not consider modifying LVS itself, so the sub-clusters controlled by the individual Load Balancers are independent of each other. As a result, when a Load Balancer fails, the specific portion of clients assigned to it by the Packet Accept Load Balancing Algorithm is not serviced at all; however, this is still better than stopping the whole service.

(2) Load balancing Algorithm among Load Balancers

In our system, multiple Load Balancers behave as a single Load Balancer. If the job is not distributed fairly among them, some Load Balancers may become overloaded and response times may be delayed.
So effective load balancing is an important issue for the cluster system. There are two approaches, static and dynamic: the stateless and the stateful load balancing policies. In the stateless policy, load balancing is performed by a stateless routing function, e.g., a function that computes a hash value from the source and destination IP addresses of the original packet. This mechanism is simple and fast, since it does not have to maintain any additional information for load balancing; its disadvantage is that the load may not be distributed fairly. The stateful policy, on the other hand, may require more information than is contained in the packets, for example, knowledge of the load on other hosts. In this case, each rewriting host must maintain a routing table with an entry for each connection it is currently handling. As a result, this method supports more even load distribution, but it incurs processing overhead and is harder to implement. To reduce the CPU overhead of the Load Balancers, we initially selected the stateless policy. We propose three Packet Accept Load Balancing Algorithms. The first method computes: service Load Balancer = (the last number (1-255) of the client IP address) mod (the total number of Load Balancers). The second method allocates classes of IP addresses, or specific ranges of IP addresses, to each Load Balancer; as our results below indicate, however, this method is not suitable. The final method is round-robin service of incoming packets; although the incoming packets are distributed evenly, the CPU loads of the Load Balancers may still differ, since service times are not equal.
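As an illustration, the first and third methods can be written as follows. This is a user-space sketch, not the kernel implementation:

```python
from itertools import count

# Sketches of the first (last-octet hash) and third (round-robin) Packet
# Accept Load Balancing Algorithms. Round robin spreads packet counts
# evenly, but per-request service times differ, so CPU load may not be
# balanced; the hash additionally pins each client to one Load Balancer.

def by_last_octet(client_ip: str, num_lbs: int) -> int:
    """Method 1: (last number of the client IP) mod (number of Load Balancers)."""
    return int(client_ip.split(".")[-1]) % num_lbs

_rr = count()
def by_round_robin(num_lbs: int) -> int:
    """Method 3: assign incoming packets to the Load Balancers in turn."""
    return next(_rr) % num_lbs

clients = ["10.0.0.7", "10.0.0.8", "10.0.0.9", "10.0.0.20"]
print([by_last_octet(ip, 2) for ip in clients])  # [1, 0, 1, 0]
print([by_round_robin(2) for _ in clients])      # [0, 1, 0, 1]
```

Note that the last-octet hash also satisfies the requirement that all packets from the same client reach the same Load Balancer, which per-packet round robin by itself does not.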
To evaluate the efficiency of our proposal, we first analyzed a day of HTTP logs from a busy WWW server, EPA-HTTP, and the caching log files of proxy servers from the "A Distributed Testbed for National Information Provisioning" project[12,13]. The EPA-HTTP logs were collected on August 30, 1995, covering a total of 24 hours and 47,748 requests. The client addresses appear in two forms, IP addresses and domain names; for the computation we extracted only the IP-address entries from the log file, which yielded 567 unique client IPs. Request counts per client were not considered. For the proxy-server caching logs, the site statistics files come from the five U.S. supercomputer centers and one at FIX-West; we collected the top 20 client IP addresses from each site, for a total of 30 unique client IPs. Figure 5 shows the distribution of the client IP addresses. Load balancing by the number of Load Balancers is reasonably fair, at 56% versus 44% in one data set and 56.7% versus 43.3% in the other. However, the results show that
when the number of client IPs is low, fairness decreases. In addition, allocation by IP address class is clearly not suitable.

4. Prototype Implementation

(1) System Architecture

We implemented our system on a cluster of Pentium 550 machines running Linux kernel 2.2.16, connected via Fast Ethernet.

1) Router

To multicast incoming packets, the router's ARP table must be set to map the Virtual IP to the Ethernet multicast address. We support this with the arp command: to fix the ARP table entry statically, we manually created a mapping with arp -s <Virtual IP> <Ethernet multicast address>.

2) Virtual IP and Ethernet multicast address setting on all hosts

ifconfig is used to configure the kernel-resident network interfaces; it is also used to assign the two IP addresses to the interface. To deactivate the ARP response function, we used the command echo 1 > /proc/sys/net/ipv4/conf/all/hidden. To accept the multicast packets, the device driver must recognize the corresponding Ethernet multicast address. A multicast-join program allows the Ethernet network card to accept the multicast packets.

3) Apache Web Server

If a host has several IP addresses, the application must choose one of them; in the Apache web server, this value is set in the configuration file.

(2) Load Balancing in the kernel layer

Our Packet Accept Load Balancing Algorithm has to be implemented in the kernel layer. Several firewall commands support such mechanisms; in any case, this should be validated by installing both LVS and the firewall together and verifying that there is no conflict.

5. Conclusion and Future Work

We evaluated existing technologies and proposed a distributed LVS that provides multiple Load Balancers. A prototype has been implemented on the Linux 2.2.16 kernel. It demonstrates the advantages of high scalability and availability through multiple Load Balancers. Considering that the most important issues for a clustered web server are scalability and availability, our proposal offers substantial advantages. However, the Packet Accept Load Balancing Algorithm, while fast, is not fair; improving it is future work. In
addition, if one of the Load Balancers fails, clients can no longer access that Load Balancer's sub-cluster; an efficient method for handling this is also needed.

References

1. Wensong Zhang, Shiyao Jin, and Quanyuan Wu, "Creating Linux Virtual Servers," LinuxExpo 1999 Conference
2. Wensong Zhang, Shiyao Jin, and Quanyuan Wu, "Linux Virtual Server for Scalable Network Services," Ottawa Linux Symposium 2000, July 19th-22nd, 2000
3. Wensong Zhang, Shiyao Jin, and Quanyuan Wu, "Linux Virtual Servers: Server Clustering for Scalable Network Services," The 2nd Linux Revolution, 2000
4. Om P. Damani, P. Emerald Chung, Yennun Huang, Chandra Kintala, and Yi-Min Wang, "ONE-IP: Techniques for hosting a service on a cluster of machines," Computer Networks and ISDN Systems, volume 29, September 1997
5. Azer Bestavros, Mark Crovella, Jun Liu, and David Martin, "Distributed Packet Rewriting and its Application to Scalable Server Architectures," in Proceedings of ICNP '98: The 6th IEEE International Conference on Network Protocols (Austin, TX), October 1998
6. Luis Aversa and Azer Bestavros, "Load Balancing a Cluster of Web Servers Using Distributed Packet Rewriting," Technical Report 1999-001, CS Department, Boston University, January 6, 1999
7. Daniel Andresen, Tao Yang, and Oscar H. Ibarra, "Towards a Scalable Distributed WWW Server on Workstation Clusters," in Proceedings of the 10th IEEE International Parallel Processing Symposium (IPPS '96), pp. 850-856, April 1996
8. J. Mogul, R. Rashid, and M. Accetta, "The Packet Filter: An Efficient Mechanism for User-level Network Code," in Proceedings of SOSP '87: The 11th ACM Symposium on Operating Systems Principles, 1987
9. Eric Anderson, Dave Patterson, and Eric Brewer, "The MagicRouter: An Application of Fast Packet Interposing," May 1996, http://www.cs.berkeley.edu/~eanders/magicrouter/
10. Cisco Systems, "Cisco LocalDirector," 1998, http://www.cisco.com/warp/public/751/lodir/index.html/
11. Wensong Zhang, Linux Virtual Server Project, http://linuxvirtualserver.org
12. A Distributed Testbed for National Information Provisioning, web site, http://www.ircache.net/cache/
13. EPA-HTTP, web site, http://ita.ee.lbl.gov/html/contrib/EPA-HTTP.html