HIGH AVAILABILITY STRATEGY - GLOBAL TRAFFIC MANAGEMENT PROTOTYPE REPORT
Version 1-00
Document Control Number 2460-00004
11/04/2008
Consortium for Ocean Leadership
1201 New York Ave NW, 4th Floor, Washington DC 20005
www.oceanleadership.org
in Cooperation with University of California, San Diego; University of Washington; Woods Hole Oceanographic Institution; Oregon State University; Scripps Institution of Oceanography
Document Control Sheet

Version   Date          Description          Originator
0-13      Aug 26, 2008  Final version        Brian Dunne
1-00      Nov 11, 2008  Fitted to template   M. Meisinger

Ver 1-00 2460-00004 i
Table of Contents

Document Control Sheet ... i
Table of Contents ... ii
1 Overview ... 3
2 High Level Network Components and Functionality ... 5
3 Network Resiliency ... 13
4 Initial OOI Network Tests ... 15
1 Overview

The main goal of the OOI network design is to provide flexible and dynamic local or cloud-based content to OOI participants and other potential customers of the OOI network. The second goal is to provide content which responds quickly, predictably, and reliably.

There are four major components of the OOI network load balancing design.

The first component is the Global Site Selectors (GSS), which act as authoritative DNS servers that dynamically issue responses to DNS A record requests. They issue a response based upon feedback from the second component below.

The second component is the load balancers, called Application Control Engines (ACE). The Application Control Engines provide both load balancing (Layer 4) and application content switching (Layer 7) with application acceleration (Layer 7).

The third component is the enterprise-class Layer 2 and Layer 3 switches. These Cisco switches are enterprise-class Layer 2 and Layer 3 capable devices with 24 or 48 1 Gbps copper ports and either 2 x 1 Gbps fiber uplinks or up to 2 x 10 Gbps fiber uplinks.

The fourth component is the resource servers containing the desired content. These servers can be hosted locally or in a provider's co-location facility. Alternatively, they can be provided by cloud computing service providers such as Amazon. Regardless of location, these resource servers will offer identical content.

All four components work in unison to provide the quickest and most predictable response. At a high level, the Global Site Selectors respond to client DNS resolution requests with a dynamic and optimized response which leverages one of many load balancing devices, such as an Application Control Engine. The Application Control Engine then selects resources based on load at Layer 4. At Layer 7, multiple mirrored resources can individually supply separate pieces of content that comprise a single request, while application acceleration on the ACE can limit the number of requests that are made to the actual physical source.
The Application Control Engine also masks the true content source via Network Address Translation (NAT). The real host address could be in an RFC 1918 unroutable address space such as 10.0.0.0-10.255.255.255, 172.16.0.0-172.31.255.255, or 192.168.0.0-192.168.255.255. Alternatively, the real server resources could use another routable address which is masked by the Application Control Engine.

Note that the Application Control Engine is one of many potential load balancers which could be used; other vendors offer similar products, such as F5's BIG-IP. However, for the OOI network, we have elected to use the device (the Application Control Engine, or ACE) which can best be leveraged by the Global Site Selectors. When cloud providers are used, this choice may not be possible. If other vendors' load balancers are used, we simply need to ensure that their load can be determined via SNMP so that the Global Site Selectors can make a proper choice.

Note that the Global Site Selectors can also determine the host load directly, without a load balancer. This can become useful in situations where providers don't have or don't use load balancers.

A test will be performed (discussed later in this paper) comparing the performance of external or cloud-based resources with that of local hosts. A comparison will also be made between using the Global Site Selector with the load balancer vs. polling the hosts directly, both locally and with the cloud-based resources.
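The RFC 1918 ranges listed above can be checked with Python's standard ipaddress module. This is a minimal sketch of the membership test; the sample addresses are arbitrary illustrations, not OOI allocations:

```python
import ipaddress

def is_rfc1918(addr: str) -> bool:
    """Return True if addr falls within one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(
        ip in ipaddress.ip_network(net)
        for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
    )

# Real-server addresses behind the ACE may be private; the VIP must be routable.
print(is_rfc1918("172.20.5.10"))  # True: inside 172.16.0.0/12
print(is_rfc1918("128.95.1.1"))   # False: publicly routable
```

Note that 172.16.0.0/12 covers 172.16.0.0 through 172.31.255.255, matching the second range given above.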
During the remainder of this paper, "Global Site Selector" will be used interchangeably with the acronym GSS, "Application Control Engine" will be used interchangeably with "Load Balancer" or the acronym ACE, "Server Resources" will be used interchangeably with "hosts", and "Virtual IP" will be used interchangeably with VIP.
2 High Level Network Components and Functionality

a. Global Site Selectors
   i. These devices will take the place of the DNS authoritative servers for the OOI domain. The Global Site Selectors dynamically issue DNS host record responses based on a number of algorithms (described below as "balance methods") designed to optimize server loads and traffic levels.
   ii. DNS rules are based upon the source address (from where), asking for which domains (to where), using an answer group (which answers should be considered), and using a balance method (which answer is best).
      1. Up to three balance methods are possible for each answer group. If the first balance method doesn't meet its conditions, the second is chosen, and so on.
      2. Please see Diagram 1 for more details.
   iii. The 10 balance methods are:
      1. Static: a direct mapping of the client's DNS server to a destination.
      2. Ordered list.
      3. Source address and domain hash: a hash of the source IP of the client DNS server and the client's domain is used to determine the destination.
      4. Global sticky domain database: a sticky database where, after the Global Site Selector chooses a specific data center, it chooses the same data center again and again for that client. This information can be shared between multiple Global Site Selectors so that items such as online shopping carts can reference the same server even if a second Global Site Selector is referenced on a return visit.
      5. Round robin: sends requests in order to separate data centers.
      6. Weighted round robin: sends requests to favored data centers based upon a set weighting.
      7. Least loaded: loads are determined from the site's ACE, and this data is used to determine which site to use. Load thresholds can be set for this algorithm; if the thresholds are exceeded, the site can be considered unavailable.
      8. Director Response Protocol (DRP):
         The GSS uses DRP to ensure that a data center load balancer (called the DRP agent) probes the client DNS server to determine which data center is closest, based on round trip times. ICMP or TCP probes are used to determine which site is closest.
      9. DNS Race:
         a. First, the delay between the Cisco GSS and the data center load balancers (called CRAs in this instance) is determined.
         b. This is used to ensure that the load balancer or router at each data center sends an A query to the client's DNS server (the "DNS race") at exactly the same time.
         c. The first data center to receive a response is the closest to the source.
         d. This probing is considered inactive (compared to the DRP method above).
      10. Drop.

Diagram 1: Global Site Selector Logic

   iv. Some of the above Global Site Selector algorithms involve direct Application Control Engine polling using the following protocols:
      1. UDP-based KAL: provides round trip times (RTT) between the GSS and the ACE.
      2. TCP-based KAL: used when non-Cisco load balancers are in use. This can also be used to determine the host load directly, without a load balancer.
      3. KAL-AP: extracts load and availability from the ACE.
      4. ICMP:
         An ICMP echo (type 8, code 0) ping indicates a device's status.
      5. HTTP: from the GSS to the origin DNS server; checks for HTTP 200 OK.
   v. Other Global Site Selector design elements:
      1. Dual Global Site Selectors (GSS) will be used to take the place of primary and secondary DNS servers. Dual units will add robustness and redundancy.
      2. While the dual GSSs will initially be tested in a single geographic location, one unit (the secondary) will likely be moved to a different location in the future for added robustness.
      3. The GSS algorithm used will be based on client DNS geographic proximity to the server resources. Each region will have a set of ACEs and corresponding server resources.
      4. A secondary algorithm will be set so that if the closest geographic site is down, there are other options. Lastly, a tertiary algorithm will be chosen for maximum redundancy.
   vi. See Diagrams 2 and 3 to see how the Global Site Selector can choose load balancers based upon information fed back to the Site Selectors.
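Two of the balance methods listed above (weighted round robin and the source address and domain hash), together with the up-to-three-method fallback, can be sketched in Python. This is a minimal illustration of the selection logic, not GSS code; the site names and the use of MD5 as the hash function are assumptions:

```python
import hashlib
import itertools

def weighted_round_robin(weights):
    """Yield data center names in proportion to their configured weights
    (balance method 6). weights: dict of site name -> integer weight."""
    expanded = [site for site, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)

def source_domain_hash(client_dns_ip, domain, answers):
    """Pick a stable answer from a hash of the client DNS server address
    and the requested domain (balance method 3)."""
    digest = hashlib.md5(f"{client_dns_ip}/{domain}".encode()).digest()
    return answers[digest[0] % len(answers)]

def resolve(methods):
    """Try up to three balance methods in order, as the GSS does; each
    method is a callable returning an answer, or None on failure."""
    for method in methods[:3]:
        answer = method()
        if answer is not None:
            return answer
    return None  # all configured methods failed

# Usage: favor site_a 2:1, with the hash method as a fallback.
wrr = weighted_round_robin({"site_a": 2, "site_b": 1})
answer = resolve([lambda: next(wrr),
                  lambda: source_domain_hash("198.51.100.7", "ooi.example.org",
                                             ["site_a", "site_b"])])
```

The sticky-database method (method 4) would add a lookup table in front of these, returning any previously chosen site for the same client before consulting a balance method.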
Diagram 2: The four critical OOI network components (GSS, ACE, switches, and server resources) shown at two sites, one with local resources and another with cloud-based resources. All resources, regardless of location, are mirrored. In this scenario, data from the ACE are fed to the Global Site Selector(s) so that they may optimally route traffic based on server load.
Diagram 3: All components (except for the ACE) are present at Site B. Site B is a cloud computing provider. In this case, Site B is chosen, and a third-party load balancer is leveraged. When third-party load balancers are used, different protocols are used.

b. Application Control Engines (ACE)
   i. The Application Control Engine devices provide Layer 4 load balancing.
   ii. They also provide Layer 7 (application) content switching for resources to optimize response time.
   iii. The ACE also provides Layer 7 application acceleration (separate license):
      1. Dynamic web content is served from a cache.
      2. Only differences since the last visit are sent.
      3. HTTP 304 responses are reduced. HTTP 304 indicates the resource hasn't changed since the last request; the client provides a header giving the time when it last received the data.
      4. HTTP 200 responses are reduced. HTTP 200 is the regular response to successful HTTP requests.
      5. HTTP transaction response times are monitored and reported if needed.
   iv. The ACE also provides compression acceleration. 2 Gbps of hardware-accelerated compression is provided. When files are stored in a compressed format but must be provided to the client uncompressed, this acceleration becomes a factor.
   v. Lastly, the ACE provides SSL acceleration. The encryption and decryption of SSL is offloaded to the ACE. This also allows the ACE to perform packet analysis for security policy.
   vi. Note that at certain data centers, non-ACE devices can be used by the Global Site Selector as well. As long as the load balancer exposes its load levels via SNMP read-only polling, the Global Site Selector can leverage it.
      1. Alternatively, any GSS algorithm which doesn't directly require load balancer polling can be used with any data center load balancer, or with hosts directly.
   vii. Other ACE design elements:
      1. In the OOI design, the ACEs (in tandem) are directly connected to the upstream devices.
         a. This provides the switch with instant feedback regarding the ACE status. If the ACE fails, the port connected to the router fails. At Layer 2, this then also prevents the upstream router from ARPing for the correct MAC address (ARP is the Address Resolution Protocol, by which the Layer 2 MAC address is determined from the IP address).
         b. The other ACE then takes over, offering ARP responses such that the ingress routers both conclude that the only active VIP for the ACE is on the alternate path.

c. Cisco switches
   i. The switch is an enterprise-class device with 24-port or 48-port 1 Gbps configurations. For uplink, a TwinGig module provides 2 x 1 Gbps optical SFP links. These can be upgraded to up to 2 x 10 Gbps X2 transceivers of LR (10 km) range.
   ii. For the purposes of OOI, the switch acts as a standard Gigabit Ethernet connection both to the Application Control Engines and to the resource servers.
   iii. For cases where the resource servers reside within a provider's network, the provider's switches and load balancer (equivalent to our Application Control Engines, or ACE) will be leveraged.
   iv. The route resiliency provided in the standard design (two paths to the mirrored resources) is both feasible and beneficial when the resource servers are locally managed and installed.
   v. When a provider is used, it is more difficult to confirm that the provider has ensured multiple routes to multiple mirrored resources. The only way to confirm this is to view the provider's network diagrams and setup.
   vi. The switch can serve both as a combination Layer 2 and Layer 3 device (see the upstream router in Diagrams 1 and 2) or solely as a Layer 2 device (see the switch in Diagrams 1 and 2). In both cases the platform offers enterprise-class features and robustness.
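The HTTP 304 behaviour that the ACE's application acceleration reduces (item iii.3 above) follows standard conditional-GET logic: the origin compares the client's If-Modified-Since header against the resource's last modification time. A minimal sketch of that decision in Python, using only the standard library (the function name and return shape are illustrative, not ACE behaviour):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_response(last_modified, if_modified_since):
    """Return (status, send_body) for a conditional GET.
    last_modified: aware datetime of the resource's last change.
    if_modified_since: the client's If-Modified-Since header, or None."""
    if if_modified_since is not None:
        try:
            client_time = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_time = None  # malformed header: ignore it
        if client_time is not None and last_modified <= client_time:
            return 304, False  # unchanged since the client's copy: no body
    return 200, True           # changed, or no conditional header: full body

changed = datetime(2008, 11, 1, tzinfo=timezone.utc)
print(conditional_response(changed, "Sat, 01 Nov 2008 12:00:00 GMT"))  # (304, False)
print(conditional_response(changed, None))                             # (200, True)
```

The acceleration described above reduces how often this exchange reaches the real servers at all, by answering from the ACE's cache.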
d. Resource Servers
   i. These are servers with mirrored content. The content can be distributed intelligently and cohesively using the ACE. Each mirrored server is Network Address Translated to one virtual, routable IP which is publicly accessible. The resource servers can be added to or removed from Virtual IP pools dynamically.
   ii. The load on these servers can be queried by the Application Control Engines. See Diagram 2 for how the server load can affect the site selection process.
   iii. Server resources will be dynamically added to and removed from the Application Control Engine based on availability. This will be done through a simple but secure SSH-based script.
      1. One of the other major benefits of the design is that it can easily and robustly scale. As shown in Diagram 4, OOI can easily replicate the infrastructure to add more ACEs as needed. The same fault tolerance and robustness of the two-column design can be provided with four columns.
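The SSH-based add/remove script mentioned in item iii can be sketched as a reconcile loop: probe each server, then push add/remove commands for the difference between the set of healthy servers and the current VIP pool. The command strings and the `send_command` hook are hypothetical stand-ins, since this report does not give the actual ACE CLI syntax; a real script would transmit them over SSH:

```python
import socket

def is_healthy(host, port=80, timeout=2.0):
    """Simple availability probe: True if a TCP connection succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def reconcile_pool(pool, healthy, send_command):
    """Bring the VIP pool in line with the set of healthy servers.
    send_command is the hook that would push each (hypothetical) CLI
    line to the ACE over an SSH session."""
    for host in sorted(healthy - pool):
        send_command(f"rserver add {host}")     # hypothetical command syntax
        pool.add(host)
    for host in sorted(pool - healthy):
        send_command(f"rserver remove {host}")  # hypothetical command syntax
        pool.discard(host)
    return pool

# Usage with a stubbed transport:
sent = []
pool = reconcile_pool({"10.0.0.1", "10.0.0.2"}, {"10.0.0.2", "10.0.0.3"}, sent.append)
# pool is now {"10.0.0.2", "10.0.0.3"}; sent holds one add and one remove command
```

Run periodically, this keeps the pool converging on the healthy set without storing state between runs.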
Diagram 4: The OOI network can easily and robustly scale by adding similar columns of hardware, including a new link to the core ingress, a new upstream router, a new ACE, and a new local Layer 2 switch.
3 Network Resiliency

a. The network design offers resiliency and redundancy at multiple layers.
   i. Layer 7
      1. Content switching at Layer 7 can leverage multiple resource servers for the content of one request.
      2. Application acceleration, compression acceleration, and SSL acceleration can all provide speedier results and reduce the overall load and the number of transactions required back to the actual content.
      3. An added benefit is that energy use can be minimized and optimized using these Layer 7 techniques.
   ii. Layer 4
      1. Host loads, based on the number of requests per port, are measured and monitored by the ACE, and can be optimized by the ACE.
   iii. Layer 3
      1. Both Application Control Engines are active at the same time for the same Virtual IP address. This is possible because the ingress router(s) to the network have multiple routes, each of which defines the destination for the traffic on a flow-by-flow basis.
      2. This is beneficial compared to other scenarios such as active-passive, where one ACE remains dormant the majority of the time for any single Virtual IP.
      3. See Diagram 5 below for the Layer 3 (and Layer 2) details.

Diagram 5: The OOI network design heavily leverages the network (at Layer 3) to provide both load balancers with traffic simultaneously. Layer 2 in the design also provides simplicity and fault tolerance.
   iv. Layer 2
      1. The Application Control Engines are interconnected via a Layer 2 link. The primary purpose of this link is to allow the master ACE for the VIP to provide an Address Resolution Protocol (ARP) response to queries sent to the passive ACE. This allows the passive ACE to imitate the active one.
      2. If either ACE were to have hardware issues, the Layer 2 connection would drop. The ARP responses would then arrive from the working connection, allowing all traffic to reroute.
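The flow-by-flow route selection described under Layer 3, which keeps both ACEs active for the same VIP, is commonly implemented ECMP-style: the router hashes each flow's 5-tuple so that every packet of a flow takes the same path, while distinct flows spread across the equal-cost paths. A minimal sketch of that mapping (the hash choice and path names are assumptions; real routers use their own hardware hash):

```python
import hashlib

def choose_path(src, sport, dst, dport, proto, next_hops):
    """Deterministically map a flow's 5-tuple to one of the equal-cost
    next hops, so all packets of the flow follow one path."""
    key = f"{src}:{sport}>{dst}:{dport}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[digest[0] % len(next_hops)]

paths = ["via_ace_1", "via_ace_2"]
# The same flow always hashes to the same ACE:
first = choose_path("198.51.100.7", 51000, "203.0.113.10", 80, "tcp", paths)
again = choose_path("198.51.100.7", 51000, "203.0.113.10", 80, "tcp", paths)
assert first == again and first in paths
```

Because the mapping is per-flow rather than per-packet, neither ACE sees out-of-order packets from a single connection, yet both stay busy.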
4 Initial OOI Network Tests

a. Diagram 6 below illustrates the testing regimen.
   i. The procured ACEs and switches will serve as Layer 2 devices in the initial test.
   ii. TBD

Diagram 6: Testing procedure. Virtual IP 1 (VIP1) will be used to test locally controlled resources. VIP2 will be used to test remote or cloud-based compute resources. A minor issue is Step 3 with VIP. The traffic must return to the inside interface of the Application Control Engine, so source-based routing will be leveraged.