PolyServe
High-Availability Server Clustering for E-Business
918 Parker Street
Berkeley, California 94710
(510) 665-2929
www.polyserve.com

Number 990903

WHITE PAPER

DNS ROUND ROBIN HIGH-AVAILABILITY LOAD SHARING

An increasing number of organizations are clustering web and other application servers in order to build highly available systems that can handle a growing base of client traffic. While the web has primarily driven this trend, other applications are also being deployed in clustered environments. One important area of clustering is balancing or sharing the load of client traffic among multiple servers. Recent releases of DNS/BIND include a capability for load sharing using what is known as a round robin technique. Although this is widely available, easy to use, and low cost, it suffers from a few limitations, one of which is its inability to handle server failures.

This white paper discusses how to set up DNS round robin for load sharing in a multiple-server environment. It then shows how to use the PolyServe Understudy product together with DNS round robin to eliminate the server failure problem. The purpose of this paper is to provide instruction on how to set up DNS round robin in a highly available server cluster.

This paper contains the following sections:

  Configuring DNS Round Robin
  Limitations of DNS Round Robin
  Configuring DNS and Understudy for High Availability Load Sharing

Configuring DNS Round Robin

Features were added in BIND 4.9 that allow simple load sharing to be configured among multiple servers. An excellent overview exists on page 259 of the most recent O'Reilly book, DNS and BIND (O'Reilly & Associates, 1998, Third Edition). This book discusses BIND 4.8.3 (which used a shuffle address record scheme and required a patch) and BIND 4.9, which has embedded load-sharing capabilities. We recommend using BIND 4.9 or later (especially BIND 8) and discuss how to set this up below.

BIND 4.9 and more
recent versions allow A records (address records) to be duplicated for a specific host name, each with a different IP address. The name server then rotates through the addresses for any name that has multiple A records. This is known as DNS round robin.

As an example, assume that we at PolyServe have three web servers with the following real names and IP addresses:

  www.polyserve1.com    150.1.1.1
  www.polyserve2.com    150.1.1.2
  www.polyserve3.com    150.1.1.3

If we want DNS requests by clients (in this case for web server access) to be rotated round robin across our servers, we can do so by placing multiple A records in the authoritative name server files. In our example, we want all clients to access our site by using www.polyserve.com, but we want these requests to be shared among our three servers using DNS round robin. To do so, we place the following A records in the name server:

  www.polyserve.com.    60    IN    A    150.1.1.1
  www.polyserve.com.    60    IN    A    150.1.1.2
  www.polyserve.com.    60    IN    A    150.1.1.3
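The rotation described above can be illustrated with a minimal sketch. It is a toy model, not BIND's implementation: the class name `RoundRobinName` is hypothetical, and the addresses are the example addresses written in dotted-quad form (an assumption about the intended notation).

```python
from collections import deque

# Example addresses for www.polyserve.com, assumed to be dotted-quad.
A_RECORDS = ["150.1.1.1", "150.1.1.2", "150.1.1.3"]


class RoundRobinName:
    """Hypothetical model of one name with several A records."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def lookup(self):
        # Return the current ordering, then rotate left by one so that
        # the next query sees a different address first.
        answer = list(self._addresses)
        self._addresses.rotate(-1)
        return answer


name = RoundRobinName(A_RECORDS)
# Most clients use the first address in the answer.
first_choices = [name.lookup()[0] for _ in range(6)]
print(first_choices)
```

Over six consecutive lookups, each of the three addresses appears first twice, which is the sharing effect the A records are meant to produce.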
Note a few very important items here. The first is the trailing "." after the name www.polyserve.com on each A record; this is mandatory, or the name server will append the domain origin to the name. The second is the TTL (time to live) value, which is the 60 shown on each A record in the example above. The TTL field tells caching name servers to discard these entries after this many seconds. A value of 60 seconds ensures that the records are not cached for long on intermediate name servers that don't support round robin. For a much more thorough discussion of these and other related issues, see the O'Reilly DNS and BIND book mentioned above.

Most of the high profile systems, such as Solaris, NT, and Linux (as well as others), support BIND 4.9 and later versions. The best way to be sure your specific name server supports the DNS round robin feature is to contact your vendor's technical support line or check their web page. As an example, Microsoft NT requires SP4 for round robin support. Most of the Linux vendors, as well as Solaris, ship with these capabilities already included.

DNS round robin supports pools of servers for any application, not just web servers. Pools of web, email, FTP, database, and other servers can all be set up to load-share using DNS.

Limitations of DNS Round Robin

DNS round robin has a number of advantages and a few limitations. The main advantage is its simplicity and low cost. A simple addition to the name server configuration file allows a pool of servers to be clustered and appear to clients as a single host, when in reality requests are being alternated among all the hosts in the pool. It is standard software on most of these systems (or can be obtained at no or low cost). For this reason it is very effective for small to medium size businesses and organizations. It is extremely popular among ISPs, e-commerce sites, universities, and other cost-sensitive sites.

Load Balancing
vs. Load Sharing

There are limitations with this architecture, and they should be noted. The first is that DNS round robin is not actually a load balancing mechanism; it is a load sharing mechanism. Load balancing has become popular at large enterprise web sites that need to support many hosts, potentially at different geographic locations. These hardware and software solutions measure the load on the systems and decide where to send client requests in order to spread the load among the servers. There are a variety of algorithms to do this, including:

  CPU load
  Response time
  Least connection
  Assigned weight
  Service level agreements
  Custom rules
  Simple round robin (but not using DNS)

While these products are all good at what they do, they tend to be costly to deploy and are therefore best suited to larger organizations. Most of them include security, integrated management, application monitoring and failover, and sophisticated APIs for defining homegrown service monitors.

DNS round robin does not gauge server load in any way; it simply alternates client requests among the pool of servers defined in the name server files. This shares the load among multiple hosts, but because the load is never measured, one or more of the hosts in the pool will tend to get more activity than the others.

DNS round robin should be quite effective up to about 10 servers per virtual cluster (a virtual cluster being a pool of servers acting as a single server for client requests). These hosts would all be in the same physical location, most likely on a number of different high-speed switch ports. Our research has shown that load sharing is effective for small to medium size organizations. Recognize that at some point you may need to consider a larger product that does load balancing and provides the scalability that DNS round robin does not. This is especially true when multiple-site support is required.
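The difference between sharing and balancing can be made concrete with a small sketch. Everything here is illustrative: the server names, the request costs (arbitrary work units), and the "least loaded" rule, which merely stands in for the load-measuring algorithms listed above and is not any particular product's method.

```python
from itertools import cycle

SERVERS = ["server1", "server2", "server3"]
# Hypothetical cost of each incoming request, in arbitrary work units.
REQUEST_COSTS = [9, 1, 1, 9, 1, 1]


def share_round_robin(servers, costs):
    """Load sharing: hand out requests in fixed rotation, ignoring load."""
    load = {s: 0 for s in servers}
    for server, cost in zip(cycle(servers), costs):
        load[server] += cost
    return load


def balance_least_loaded(servers, costs):
    """Load balancing: send each request to the least-loaded server so far."""
    load = {s: 0 for s in servers}
    for cost in costs:
        target = min(servers, key=lambda s: load[s])
        load[target] += cost
    return load


shared = share_round_robin(SERVERS, REQUEST_COSTS)
balanced = balance_least_loaded(SERVERS, REQUEST_COSTS)
print(shared)    # {'server1': 18, 'server2': 2, 'server3': 2}
print(balanced)  # {'server1': 9, 'server2': 10, 'server3': 3}
```

With this input, plain rotation happens to send both expensive requests to the same server, while the load-aware rule spreads them out. Real traffic is rarely this adversarial, but the sketch shows why rotation alone cannot guarantee an even load.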
DNS Round Robin and Server Failures

How does DNS round robin behave if one of the servers crashes or is down for maintenance? Simply enough, requests from clients will still be directed to that server's IP address when its turn comes in the round robin pool, and existing client sessions will still be sent to that address. The result is that all of these requests go to a host that cannot answer them. This is a serious limitation of the DNS round robin feature. Many shops simply cannot allow a potentially large number of client requests to go unanswered; this is obviously not good for business. In the next section, we show how this problem can be rectified without having to purchase sophisticated traffic management solutions.

Configuring DNS and Understudy for High Availability Load Sharing

PolyServe Understudy is a high availability clustering software product that currently runs on Linux, Solaris, and NT. Understudy runs on each server in the cluster and performs automatic failover and service monitoring. It can be configured with DNS round robin to eliminate the server failure and maintenance problem discussed in the limitations section above. By using Understudy with DNS round robin, we will demonstrate how virtual pools of servers can be configured to guarantee that all client requests are sent to active, operational servers, even when a portion of the server pool is down.

Understudy High Availability Server Configuration

Before explaining how to configure Understudy and DNS round robin, let's first understand how Understudy works, using the three-server example discussed above. Assume that we at PolyServe have three web servers: www.polyserve1.com, www.polyserve2.com, and www.polyserve3.com. Using Understudy, we define a virtual host, www.polyserve.com, which is the client access name for all three PolyServe servers. Figure 1 shows how Understudy manages this virtual host and server pool.

Figure 1: Web clients access www.polyserve.com. www.polyserve1.com (Primary) is ACTIVE; www.polyserve2.com (Backup1) and www.polyserve3.com (Backup2) are Inactive.

Understudy manages the virtual server www.polyserve.com by allowing one host in the real server pool to be marked as the primary, while the others are the backup hosts in the pool. As can be seen from Figure 1, www.polyserve1.com is the primary server, www.polyserve2.com is the first backup server, and www.polyserve3.com is the final backup server. When a client accesses www.polyserve.com, the requests are all sent to the primary host, www.polyserve1.com. This is
because the virtual host IP address for www.polyserve.com is mapped to the MAC address of www.polyserve1.com. Note in Figure 1 that the two backup servers are currently inactive in the cluster. This doesn't mean these servers cannot be performing other functions; it just means that in the virtual cluster www.polyserve.com, they are not handling client requests.

Now let's see what happens when one of the servers fails. Figure 2 shows how Understudy manages the cluster when www.polyserve1.com goes down (or is taken out of the cluster for maintenance). Understudy runs on each server in the cluster and periodically communicates with the others to validate that all servers are up and operational. Understudy can also be configured to monitor specific services such as HTTP, SMTP, FTP, and various TCP/IP ports. In the case of Figure 2, Understudy has detected that www.polyserve1.com is down. To perform the failover, Understudy sends a gratuitous ARP that tells the router that www.polyserve2.com is now handling all traffic for the virtual host www.polyserve.com. Understudy marks the primary as down and the first backup server as active. The second backup server (www.polyserve3.com) is still inactive in the cluster.

Figure 2: Web clients access www.polyserve.com. www.polyserve1.com (Primary) is DOWN; www.polyserve2.com (Backup1) is ACTIVE; www.polyserve3.com (Backup2) is Inactive.

When www.polyserve1.com is back up, it sends a gratuitous ARP that tells the router that it is now handling requests for www.polyserve.com. In this way, automatic failover is invisible to the client, who simply accesses www.polyserve.com and doesn't know which of the three servers is actually handling the requests. Understudy supports from 2 up to a maximum of 10 servers per cluster.

Configuring Understudy to support DNS Round Robin

Understudy can also be configured to support highly available DNS load sharing. Again, our three-server PolyServe web site example illustrates the point. Figure 3 shows how DNS and
Understudy can be configured to guarantee that all round robin load sharing requests are sent only to servers that are active. First, note that instead of defining a single www.polyserve.com virtual host, we now define three virtual hosts, one for each server in our DNS round robin pool.
Figure 3: Web clients resolve www.polyserve.com through the DNS server, whose round robin A records (160.1.1.1, 160.1.1.2, 160.1.1.3) point to three virtual hosts: www.virtualpoly1.com (160.1.1.1), www.virtualpoly2.com (160.1.1.2), and www.virtualpoly3.com (160.1.1.3).

Let's look at the first virtual host, www.virtualpoly1.com. We configure www.polyserve1.com as the primary server, and www.polyserve2.com and www.polyserve3.com as the backup servers (in that order). The IP address of the virtual host www.virtualpoly1.com is 160.1.1.1. All requests to www.virtualpoly1.com (160.1.1.1) go to www.polyserve1.com, since this is the primary server for this virtual host. If www.polyserve1.com were to fail, www.polyserve2.com would take over as the next backup in this virtual host pool. The configuration for this virtual cluster is:

  Virtual Host www.virtualpoly1.com (IP address 160.1.1.1)
    Primary is www.polyserve1.com
    Backup #1 is www.polyserve2.com
    Backup #2 is www.polyserve3.com

In the same manner, two more virtual hosts are defined. In the second cluster the primary is www.polyserve2.com, while the third cluster has www.polyserve3.com as its primary server. They have the following configurations:

  Virtual Host www.virtualpoly2.com (IP address 160.1.1.2)
    Primary is www.polyserve2.com
    Backup #1 is www.polyserve3.com
    Backup #2 is www.polyserve1.com

  Virtual Host www.virtualpoly3.com (IP address 160.1.1.3)
    Primary is www.polyserve3.com
    Backup #1 is www.polyserve1.com
    Backup #2 is www.polyserve2.com
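The three configurations above follow one pattern: each virtual host uses the same server list, rotated so that a different server comes first. A minimal sketch of that pattern follows. The dictionary layout and the function name `build_virtual_hosts` are hypothetical and do not represent Understudy's configuration format; host names and addresses are written in dotted form, which is an assumption about the intended notation.

```python
REAL_SERVERS = [
    "www.polyserve1.com",
    "www.polyserve2.com",
    "www.polyserve3.com",
]
VIRTUAL_HOSTS = [
    ("www.virtualpoly1.com", "160.1.1.1"),
    ("www.virtualpoly2.com", "160.1.1.2"),
    ("www.virtualpoly3.com", "160.1.1.3"),
]


def build_virtual_hosts(virtual_hosts, real_servers):
    """Give virtual host i the server list rotated left by i positions."""
    configs = {}
    for i, (name, ip) in enumerate(virtual_hosts):
        order = real_servers[i:] + real_servers[:i]
        configs[name] = {
            "ip": ip,
            "primary": order[0],
            "backups": order[1:],
        }
    return configs


configs = build_virtual_hosts(VIRTUAL_HOSTS, REAL_SERVERS)
for name, cfg in configs.items():
    print(name, cfg["ip"], cfg["primary"], cfg["backups"])
```

Rotating the list guarantees that every real server is the primary of exactly one virtual host, so each carries an equal share of requests while all servers are up.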
Finally, the DNS server is configured for round robin with the following A records added to the appropriate zone files:

  www.polyserve.com.    60    IN    A    160.1.1.1
  www.polyserve.com.    60    IN    A    160.1.1.2
  www.polyserve.com.    60    IN    A    160.1.1.3

At this point you might be asking why we have defined three virtual hosts, each containing all three PolyServe web servers. As we will now show, the key is the choice of a different primary host in each virtual host cluster.

Let's run through an example to see how this setup operates. Figure 3 shows three clients that all attempt to access www.polyserve.com. Client #1 makes the first attempt. A DNS lookup is done, and because 160.1.1.1 is the first A record in the DNS file, it is returned as the target address. This is the virtual host www.virtualpoly1.com, and because Understudy is configured with www.polyserve1.com as the primary server in this virtual host pool, that server receives the client request. When client #2 makes its request, the DNS lookup returns the next A record in the DNS server (160.1.1.2), which is www.virtualpoly2.com and is handled by www.polyserve2.com (the primary server for the virtual host www.virtualpoly2.com). Finally, when client #3 accesses www.polyserve.com, its request is ultimately handled by www.polyserve3.com.

So why did we need three virtual host clusters for this to work?
As we will show, this setup is most effective when a server fails or needs to be taken offline. If no servers fail, the cluster operates just as you would expect: each server handles one third of the requests via the DNS round robin entries in the name server file. Figure 4 shows what happens when www.polyserve1.com (the real server) goes down. If Understudy were not used, the DNS round robin setup would direct every third request to this server, and none of those requests would be handled successfully (they would go into the proverbial "void").

Figure 4: The DNS server's round robin A records for www.polyserve.com (160.1.1.1, 160.1.1.2, 160.1.1.3) and the three virtual hosts www.virtualpoly1.com (160.1.1.1), www.virtualpoly2.com (160.1.1.2), and www.virtualpoly3.com (160.1.1.3), with the real server www.polyserve1.com down.
But with Understudy configured as shown in Figure 4, as long as a single server is up and operational, all client requests will go to active servers. Let's see how this works.

Assume www.polyserve1.com goes down. This is shown as the red host in each virtual host pool. The server could have crashed, its HTTP service might have stopped or failed, or the server might have been removed for maintenance. Within a few seconds (the default is 10 seconds), Understudy detects that the virtual host cluster www.virtualpoly1.com has lost its primary server. It then makes www.polyserve2.com the active server for this virtual host. Each new client that DNS resolves to 160.1.1.1 (one out of every three requests) goes to the virtual host www.virtualpoly1.com. Since this virtual host now points to www.polyserve2.com, that server handles all requests for www.virtualpoly1.com.

Since the primary servers of the other virtual hosts (virtualpoly2 and virtualpoly3) are up, these servers continue to handle each request that comes their way. The fact that www.polyserve1.com went down has no effect on requests to www.virtualpoly2.com or www.virtualpoly3.com. Each request that DNS sends to 160.1.1.1 (www.virtualpoly1.com) now goes to www.polyserve2.com instead of www.polyserve1.com (which is down). As a result, two out of every three round robin client requests go to www.polyserve2.com, while the third goes to www.polyserve3.com. Every client request through DNS goes to an active, operational machine and none is lost to the void. Obviously www.polyserve2.com is handling more requests than www.polyserve3.com, but this is certainly much better than every third request going unanswered. The more servers in the pool, the smaller the share of the load that shifts to any one server when a server fails.

And what about existing sessions that do not need to go through DNS again? Will they continue to be sent to the failed server?
In fact, they will also be re-routed to www.polyserve2.com, since the gratuitous ARP message redirects all traffic for the virtual host address 160.1.1.1 to the backup server. Therefore, even existing sessions will be routed to working servers. One issue that often comes up is that if the backup server does not have the same data as the original server, the client may not have access to the same data. Fortunately, Understudy supports data replication and synchronization, so the servers can have their data automatically replicated and synchronized for complete cluster control (if this is required).

When www.polyserve1.com comes back online, all requests for 160.1.1.1 will again be routed to www.polyserve1.com (since it is the primary for this virtual host and is used whenever it is up). What happens if both www.polyserve1.com and www.polyserve2.com go down? In that case www.polyserve3.com handles all client requests for www.polyserve.com.

Together, Understudy and DNS round robin are a powerful, low cost alternative to purchasing expensive, complex load balancing and clustering solutions. A multitude of applications can be supported in this configuration, including HTTP, FTP, SMTP, and various TCP/IP applications. Understudy has a range of other features and, as discussed earlier, supports Linux, Solaris, and NT. An evaluation copy of the product can be requested online at PolyServe's web page, www.polyserve.com.
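The failover arithmetic in this section can be checked with a short simulation. It is only a sketch under stated assumptions: DNS is modeled as a perfect rotation over the three virtual host addresses (written in dotted form), failover as "the first server in the configured order that is up", and the names `FAILOVER_ORDER`, `active_server`, and `request_share` are hypothetical rather than part of Understudy.

```python
from collections import Counter
from itertools import cycle, islice

S1, S2, S3 = "www.polyserve1.com", "www.polyserve2.com", "www.polyserve3.com"

# Virtual host address -> real servers in failover order (primary first).
FAILOVER_ORDER = {
    "160.1.1.1": [S1, S2, S3],
    "160.1.1.2": [S2, S3, S1],
    "160.1.1.3": [S3, S1, S2],
}


def active_server(order, down):
    """Return the first server in the failover order that is not down."""
    for server in order:
        if server not in down:
            return server
    return None  # every server in the pool is down


def request_share(down, requests=300):
    """Count which real server handles each round robin request."""
    counts = Counter()
    for address in islice(cycle(FAILOVER_ORDER), requests):
        counts[active_server(FAILOVER_ORDER[address], down)] += 1
    return dict(counts)


print(request_share(down=set()))     # each server handles 100 of 300
print(request_share(down={S1}))      # polyserve2: 200, polyserve3: 100
print(request_share(down={S1, S2}))  # polyserve3 handles all 300
```

With www.polyserve1.com down, the model reproduces the two-thirds and one-third split described above, and with two servers down the remaining server takes every request.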