High Availability HTTP/S R.P. (Adi) Aditya rpaditya@umich.edu Senior Network Architect
HTTP/S is not the Internet. So why care about High Availability HTTP/S? Because HTTP/S is such a large part of the interactive Internet, we need responsive, always-available web applications.
HTTP: HTML is the markup used for websites. HTTP is the common application-layer (Layer 7) protocol that typically carries traffic between server (website) and client (browser). HTTP is carried over TCP (Layer 4), which provides a connection over the IP (Layer 3) packet network.
HTTP (1.0) was and is successful
HTTP 1.1 added refinements. What we care about for this discussion: request pipelining (several requests sent on one connection without waiting for each response) and connection keepalives (persistent connections).
If the TCP connection fails because the server fails, the website is unavailable -- how do we minimize the chance that happens?
When things fail, cluster! HTTP has no defined behaviour for "retry" or "failover" when a website does not respond, so the responsibility for "clustering" falls to the infrastructure: the transport layer (TCP) and below. To be clear, resorting to SLBs (server load balancers) represents a failure of application and application-protocol (HTTP) design! (according to network folks)
Round Robin DNS
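Round robin DNS publishes several address records for one name and rotates the order of the answer, so successive clients start with different servers. A minimal sketch of that rotation follows; the zone class, hostname and addresses are hypothetical, and this is a toy model rather than a real DNS server.

```python
from collections import deque


class RoundRobinZone:
    """Toy authoritative server that rotates the address records it returns."""

    def __init__(self, records):
        # name -> deque of addresses, rotated after every query
        self._records = {name: deque(addrs) for name, addrs in records.items()}

    def resolve(self, name):
        addrs = self._records[name]
        answer = list(addrs)
        addrs.rotate(-1)  # the next query starts with the next address
        return answer


# Hypothetical name and documentation-range addresses.
zone = RoundRobinZone(
    {"www.example.edu": ["192.0.2.10", "192.0.2.11", "192.0.2.12"]}
)

# Most clients simply connect to the first address in the answer.
first_choices = [zone.resolve("www.example.edu")[0] for _ in range(4)]
print(first_choices)
```

Note that nothing here checks whether a server is up: a dead address keeps getting handed out until someone removes the record.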
State sync between servers? Application state often needs to be shared between browser and server (e.g. a shopping cart or the list of exam questions answered). This state is stored in a "session", typically on the server side, and a "pointer" to it is given to the browser to send on each subsequent request, using HTTP cookies or some other token shared between browser and server.
Distributed systems: so how do we replicate sessions between cluster members? Luckily, HTTP is request/response!
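A minimal sketch of the session "pointer" idea: the server keeps the state in its own memory and hands the browser only an opaque token in a cookie. The cookie name `sessionid`, the in-memory `SESSIONS` dict and both helper functions are assumptions for illustration, not any particular framework's API.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}              # token -> session state, held on this one server
COOKIE_NAME = "sessionid"  # hypothetical cookie name


def start_session():
    """Create server-side state and return (token, Set-Cookie header line)."""
    token = secrets.token_urlsafe(16)
    SESSIONS[token] = {"cart": []}
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = token
    cookie[COOKIE_NAME]["httponly"] = True
    return token, cookie.output(header="Set-Cookie:")


def load_session(cookie_header):
    """Look up the session for the Cookie header sent on a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return SESSIONS.get(morsel.value) if morsel else None


token, set_cookie = start_session()
load_session(f"{COOKIE_NAME}={token}")["cart"].append("textbook")
print(load_session(f"{COOKIE_NAME}={token}"))
```

Because `SESSIONS` lives in one server's memory, a second server in the cluster would return `None` for the same token, which is exactly the state-sync problem this slide raises.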
Network Server Load Balancer
Redundant load-balancers, redundant servers and "transparent" automatic failover
How do we prevent unintended failover or flip-flopping between backend servers? With a persistence mechanism.
Persistence methods: we need a way to identify individual clients and always send each one to the same (up) server. We could use the client (source) IP address, but with NAT or a proxy server many clients share one address, which could overload a single backend server; the SLB also has to maintain a mapping table from source IP to backend, which eats memory. Prefer having the SLB insert an additional cookie carrying the mapping on the first response; this scales well.
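The two persistence methods can be compared with a small sketch. The backend names, the `SLBID` cookie name and both classes are hypothetical, and a real SLB would likely obscure the backend identity in the cookie value rather than store it in the clear.

```python
from itertools import cycle

BACKENDS = ["web1", "web2", "web3"]  # hypothetical backend names
COOKIE = "SLBID"                     # hypothetical persistence cookie


class SourceIpPersistence:
    """Pin each source IP to a backend; the table lives in SLB memory."""

    def __init__(self, backends):
        self._next = cycle(backends)
        self.table = {}  # source IP -> backend

    def route(self, src_ip):
        if src_ip not in self.table:
            self.table[src_ip] = next(self._next)
        return self.table[src_ip]


class CookiePersistence:
    """Pin each browser via an inserted cookie; no per-client table needed."""

    def __init__(self, backends):
        self._backends = set(backends)
        self._next = cycle(backends)

    def route(self, cookies):
        """Return (backend, cookie to set on the response, or None)."""
        backend = cookies.get(COOKIE)
        if backend in self._backends:
            return backend, None
        backend = next(self._next)
        return backend, {COOKIE: backend}


# Three browsers behind one NAT address all land on the same backend...
by_ip = SourceIpPersistence(BACKENDS)
print([by_ip.route("203.0.113.7") for _ in range(3)])

# ...while three cookie-less first requests are spread across the pool.
by_cookie = CookiePersistence(BACKENDS)
print([by_cookie.route({})[0] for _ in range(3)])
```

With cookies, the mapping travels with each browser, so the SLB keeps no per-client state and NAT does not funnel unrelated users onto one server.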
And what happens to the persistence if a backend server fails?
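One plausible answer, sketched under assumptions: if the backend named in the persistence cookie is no longer healthy, the SLB picks another healthy backend and reissues the cookie. The function, cookie name and backend names below are hypothetical, and picking the first healthy server stands in for whatever balancing algorithm a real SLB would apply.

```python
def route(cookies, backends, healthy, cookie_name="SLBID"):
    """Return (backend, cookie to set or None) for one request.

    cookies:  dict of cookies sent by the browser
    backends: ordered list of all configured backends
    healthy:  set of backends currently passing health checks
    """
    pinned = cookies.get(cookie_name)
    if pinned in healthy:
        return pinned, None  # persistence holds, nothing to change

    candidates = [b for b in backends if b in healthy]
    if not candidates:
        raise RuntimeError("no healthy backend available")

    # Placeholder choice; a real SLB would use its balancing algorithm here.
    replacement = candidates[0]
    return replacement, {cookie_name: replacement}


pool = ["web1", "web2", "web3"]
print(route({"SLBID": "web2"}, pool, {"web1", "web2", "web3"}))
print(route({"SLBID": "web2"}, pool, {"web1", "web3"}))  # web2 has failed
```

The request is re-pinned, but any session held only in the failed server's memory is gone, so the user still sees session loss unless the session state was replicated.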
SSL offload: we want the browser/client to think that the network SLB is the web server, and we want the SLB to be able to look at the HTTP headers (and possibly the request body). So we need to decrypt SSL at the SLB, which means the site's SSL certificate and key must be on the SLB. That allows us to insert the persistence cookie!
Some problems using an SLB: lack of end-to-end sessions (the SLB needs to understand the protocol); lack of transparency when troubleshooting (yet another bump in the wire to muck things up); SSL offload is computationally expensive; expensive hardware/software, with many bad choices; hard to find developers who understand the network and SLB, and hard to find network folks who understand the application.
Use an SLB to provide IPv6 for a website
HA beyond a single site: GSLB (global server load balancing) usually uses DNS to redirect clients to a "closest" datacenter and then depends on that datacenter staying up; workable, but not "transparent". Short DNS TTLs and quick failover? Possible, but again that puts the client's cache behaviour in charge. Anycast routing works, but is tricky to troubleshoot.
SRV records
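SRV records (RFC 2782) let DNS publish, per service, a list of targets with a priority and a weight: clients try the lowest priority value first and spread load within one priority in proportion to weight. The sketch below orders a record set that way; the `Srv` class, the function name and the hostnames are assumptions for illustration, and no browser is claimed to do this lookup for HTTP.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Srv:
    priority: int
    weight: int
    port: int
    target: str


def order_targets(records, rng=random):
    """Order SRV targets: ascending priority, weighted random within each."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        # Zero-weight records go first in the group, as RFC 2782 describes.
        group = sorted(
            (r for r in records if r.priority == priority),
            key=lambda r: r.weight > 0,
        )
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)  # inclusive on both ends
            running = 0
            for r in group:
                running += r.weight
                if running >= pick:
                    ordered.append(r)
                    group.remove(r)
                    break
    return ordered


# Hypothetical records for a name such as _http._tcp.example.edu
records = [
    Srv(20, 0, 80, "backup.example.edu"),
    Srv(10, 60, 80, "web1.example.edu"),
    Srv(10, 40, 80, "web2.example.edu"),
]
for r in order_targets(records):
    print(r.priority, r.target)
```

A client that walks this list in order and moves on when a connection fails gets failover and load sharing from DNS alone, with the backup target used only when both priority-10 servers are unreachable.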
Why not SRV today? Browser support is only promised: a feature request has been open in Firefox (12 years and counting), and also in Chrome and Safari; the situation is not clear for other browsers. Maybe we wait for HTTP 2.0, and likely beyond, because the current HTTP 2.0 specification is still an early draft.
Anycast auto-routes to "nearest" instance of app
Anycast auto-reroutes on failure of an app instance
UM LMS anycast routing stability: red lines are route switches; all of those above, since July, happened during planned maintenance. The real measure of stability is whether end users see session loss due to a switch.
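A toy model of the two anycast slides: every site announces the same address, the network delivers traffic to the site with the lowest path cost, and a failed site's withdrawn route shifts traffic to the next nearest one. The site names, costs and function are hypothetical; real route selection is done by the routing protocol, not by application code.

```python
def nearest_instance(announcements):
    """Return the announcing site with the lowest path cost, or None."""
    if not announcements:
        return None
    return min(announcements, key=announcements.get)


# Hypothetical sites and path costs as seen from one client's network.
routes = {"datacenter-a": 3, "datacenter-b": 5}
before = nearest_instance(routes)

# datacenter-a fails and its route is withdrawn.
del routes["datacenter-a"]
after = nearest_instance(routes)

print(before, "->", after)
```

The switch is automatic, but a TCP connection that was established to the first site does not carry over to the second, which is why session loss seen by end users is the measure that matters.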