Load Balancing

Topics
1. What is load balancing?
2. Load balancing techniques
3. Load balancing strategies
4. Sessions
5. Elastic load balancing

What is load balancing?
Load balancing is a technique to distribute workload evenly across two or more computers, network links, CPUs, hard drives, or other resources, in order to get optimal resource utilization, maximize throughput, minimize response time, and avoid overload. (Wikipedia)

[Figure: Load Balanced Servers (Internet, Server 1, Server 2, ..., Server N)]
[Figure: Load Balancing + Failover]
[Figure: Load Balancing + Failover + Cloud]

Load balancing techniques
- Round robin DNS
- Layer 4 load balancer
- Web switch
- Reverse proxy

Round Robin DNS
- Configure multiple A records for a single name.
- Return the next IP address from a circular list.
- Some servers may try to return the closest IP address.
Problems
- Clients cache addresses.
- No awareness of the availability status of addresses.

Round Robin DNS Entry
> dig google.com
;; ANSWER SECTION:
google.com. 21 IN A 74.125.225.18
google.com. 21 IN A 74.125.225.19
google.com. 21 IN A 74.125.225.20
google.com. 21 IN A 74.125.225.16
google.com. 21 IN A 74.125.225.17
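
The circular-list behavior described above can be sketched in a few lines of Python. This is a toy model, not how any particular DNS server is implemented; the class name `RoundRobinDNS` and its `resolve` method are hypothetical names chosen for illustration, and the addresses are taken from the dig output above.

```python
from collections import deque


class RoundRobinDNS:
    """Toy resolver (hypothetical): rotates a circular list of A records."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def resolve(self):
        # Return the records in their current order, then rotate so the
        # next query sees a different address first.
        answer = list(self._records)
        self._records.rotate(-1)
        return answer


dns = RoundRobinDNS(["74.125.225.18", "74.125.225.19", "74.125.225.20"])
first = dns.resolve()[0]   # address a first client would likely use
second = dns.resolve()[0]  # a second client sees the next address first
```

Note that the sketch has the same blind spots as the real technique: it never checks whether an address is reachable, and a client that caches `first` keeps using it regardless of later rotations.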

Layer 4 Load Balancing Switch
Network Address Translation (NAT) for LB
- One external IP address.
- Switch maps incoming connections to one of the pool of servers.
- Switch maps outgoing connection source IP address to the IP address of the switch.
Additional features
- Better load balancing algorithms
- Detection of availability status of servers
[Figure: OSI layers (Application, Presentation, Session, Transport, Network, Data Link, Physical)]

Web Switch
- AKA content switch or application switch
- Balances load at the application layer
- Multiple load balancing algorithms
- Handles SSL for servers
- May also perform compression
- May also add firewall features
[Figure: OSI layers (Application, Presentation, Session, Transport, Network, Data Link, Physical)]
http://www.f5.com/pdf/products/big-ip-platforms-ds.pdf

Reverse Proxy
- Proxy server has network identity.
- Forwards requests to backend web servers.
- Features can include caching, compression, SSL.

Open Source Load Balancers
- Perlbal: used by LiveJournal, TypePad
- HAProxy: used inside hardware and software load balancers
- Varnish: HTTP accelerator, caching + balancing
- Pound: security-focused HTTP accelerator
- mod_proxy_balancer: Apache module
- Nginx: lightweight, high-performance web proxy

Load balancing strategies
- Round robin
- Least connections/time
- Predictive
- Random
- Weighted strategies

Round Robin
[Figure: requests rotate among Server 1, Server 2, Server 3, Server 4, Server 5]

Least Connections / Response Time
- Send requests to the server with the fewest connections or the lowest response time.
- Good for balancing between requests with different requirements or servers with different performance levels.
- Problems can arise with multiple load balancers making decisions in parallel.
- Weights can be added for manual tweaking.

Predictive
- Round robin or least connections with additional heuristics to compensate for the information staleness that arises from many short, rapid transactions.
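
The round robin and least connections strategies above can be compared with a minimal Python sketch. The class and method names (`RoundRobinBalancer`, `LeastConnectionsBalancer`, `pick`, `release`) are hypothetical and exist only for illustration; a real balancer would also track health and handle concurrency. The optional `weights` argument models the manual tweaking mentioned above by dividing each server's connection count by its weight.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, ignoring their load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server with the fewest active connections per unit weight."""

    def __init__(self, servers, weights=None):
        self.active = {s: 0 for s in servers}
        # A weight of 2 means the server is assumed to handle twice the load.
        self.weights = weights or {s: 1 for s in servers}

    def pick(self):
        server = min(self.active, key=lambda s: self.active[s] / self.weights[s])
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to the server closes.
        self.active[server] -= 1


rr = RoundRobinBalancer(["s1", "s2", "s3"])
rr_order = [rr.pick() for _ in range(4)]

lc = LeastConnectionsBalancer(["s1", "s2", "s3"])
a = lc.pick()    # all idle, so the first server is chosen
b = lc.pick()    # s1 now has one connection, so s2 is chosen
lc.release(a)    # the connection to s1 finishes
c = lc.pick()    # s1 is idle again and is chosen ahead of s2
```

The connection counts here live inside one balancer object. With several balancers deciding in parallel, each would see only its own counts, which is the staleness problem the predictive strategy tries to compensate for.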

Random
- Either select a server at random or combine with a resource-based algorithm to deal with information staleness problems.
- Weighted random adds a manual constant to the probability of choosing particular servers.
  - A dual-core isn't twice as fast as a single CPU.
  - Requires trial-and-error experimentation.

Sessions

The Problem
- User interactions with a server have a certain amount of state.
  - Authentication
  - Current stage of transaction
  - Shopping carts, etc.
- How can the application preserve state when the load balancer sends the next request to a different server?
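
The session problem can be reproduced in a small Python sketch. It assumes each server keeps session state in its own memory and that requests are balanced round robin; `AppServer`, `add_to_cart`, and `view_cart` are hypothetical names used only for this illustration.

```python
import itertools


class AppServer:
    """Toy application server that keeps session state in local memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session ID -> list of cart items

    def add_to_cart(self, session_id, item):
        self.sessions.setdefault(session_id, []).append(item)

    def view_cart(self, session_id):
        return self.sessions.get(session_id, [])


servers = [AppServer("server1"), AppServer("server2")]
balancer = itertools.cycle(servers)  # plain round robin, unaware of sessions

# First request lands on server1 and stores the item there.
next(balancer).add_to_cart("alice", "book")

# Second request lands on server2, which has never seen this session.
cart_seen = next(balancer).view_cart("alice")
```

After these two requests `cart_seen` is empty even though server1 still holds the item, which is exactly the state loss the question above is asking about.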

Session Stickiness
- Stickiness requires a session-aware load balancer.
- Ensures that future requests from the same session always go to the same server.
Problems
- Lack of failover, since requests are tied to a single server.
- Difficult to allocate resources effectively, since sessions vary tremendously in duration and size.

Client-side State
Why not store all state in client cookies?
- Insufficient client storage (ameliorated by HTML5)
- Insecure (can't trust client to control prices, id)
Solution
- Store frequently accessed, low-security data in client cookies.
- Store rarely accessed data in central backend storage and take the performance hit to access it when necessary.

Elastic load balancing

Elastic Load Balancing
- Distributes traffic across EC2 instances
- Instances can be located across multiple AZs
- Supports any TCP-based protocol
- Monitors instance health and will not distribute traffic to unhealthy instances
- Supports SSL termination
- Supports session stickiness
- Provides metrics to CloudWatch

ELB Lifecycle
1. Create an ELB
   - List of AZs
   - Parameters for health check
   - List of listeners
2. Add instances to ELB
   - By instance ID
   - ELB will track status: InService, OutOfService
3. Advertise public DNS name of ELB
4. Modify number of instances to match traffic
   - Or set up CloudWatch/AutoScale to handle it for you
5. Delete ELB when unneeded

ELB, CloudWatch, and AutoScale

CloudWatch
Monitoring service for EC2
- CPU utilization
- Data transfer
- Storage usage
Features
- Notifications at user-specified metric thresholds
- Enables AutoScaling at metric thresholds
Pricing
- Basic Monitoring with 5-minute granularity: free
- Detailed Monitoring (1 minute) for 1.5¢ per hour
- 10¢ per alarm after the first 10 alarms

CloudWatch Terminology
- A namespace represents a source of data.
  - AWS/EC2
  - AWS/ELB
- A measure is a raw, observed data value.
  - One minute's worth of observation
- A unit is an attribute of a measure.
  - Seconds, %, bytes, bits, counts, bytes/s, bits/s
- A dimension is a refined view of a type of data.
  - AvailabilityZone, ImageType, InstanceID, ...
- A metric is a stored, processed measure.
- A statistic is a computed attribute of a metric.
  - Minimum, maximum, average, sum
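
To make the terminology concrete, the sketch below computes the four statistics from a handful of measures in plain Python. It does not call CloudWatch; the function name `statistics_for` and the sample CPU values are made up for illustration.

```python
def statistics_for(measures):
    """Compute the statistics named above from a list of raw measures."""
    return {
        "Minimum": min(measures),
        "Maximum": max(measures),
        "Average": sum(measures) / len(measures),
        "Sum": sum(measures),
    }


# Hypothetical measures: three one-minute CPU utilization readings (unit: %)
# for a single instance in the AWS/EC2 namespace.
cpu_percent = [12.0, 48.0, 30.0]
stats = statistics_for(cpu_percent)
```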

AutoScaling
AutoScaling Group
- Set of EC2 instances that should scale together
Triggers
- Scale on CloudWatch alerts
- Scale on time-based schedule
- Fixed number of healthy instances
Examples
- Add 3 instances if CPU > 50%
- Remove 3 instances if CPU < 10%

AutoScaling Setup
1. Create an ELB
2. Create AutoScaling launch configuration
   - ID of AMI to be launched
   - Instance type
   - Key pair to authenticate to instances
   - List of EC2 security groups for instances
3. Create an AutoScaling Group
4. Create a trigger for the group

AutoScaling Operation
1. CloudWatch metrics specified in the AutoScaling group's trigger are retrieved at specified times.
2. Metrics are checked against trigger thresholds.
   - If metrics are larger than UpperThreshold and the number of instances is less than MaxSize, a scale-out event is initiated, launching new instances.
   - If metrics are smaller than LowerThreshold and the number of instances is greater than MinSize, a scale-in event is initiated, terminating excess instances.
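
The threshold check in step 2 can be written as a small decision function. This is a sketch of the logic only, not the AutoScaling API: `scaling_decision` and its parameter names are hypothetical, chosen to mirror UpperThreshold, LowerThreshold, MinSize, and MaxSize above.

```python
def scaling_decision(metric, instances, lower, upper, min_size, max_size):
    """Return the event a trigger would initiate for one metric reading."""
    if metric > upper and instances < max_size:
        return "scale-out"   # launch new instances
    if metric < lower and instances > min_size:
        return "scale-in"    # terminate excess instances
    return "no-change"       # within thresholds, or already at a size limit


# Using the example thresholds from the slide: CPU > 50% and CPU < 10%.
busy = scaling_decision(75, instances=2, lower=10, upper=50, min_size=1, max_size=4)
idle = scaling_decision(5, instances=2, lower=10, upper=50, min_size=1, max_size=4)
capped = scaling_decision(75, instances=4, lower=10, upper=50, min_size=1, max_size=4)
```

The `capped` case shows why the size limits matter: the metric is above the upper threshold, but the group is already at its maximum size, so no event is initiated.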

Command Line Tools
- EC2 API Tools (we've already used these)
- CloudWatch API Tools
- AutoScaling API Tools
- Elastic Load Balancing API Tools
Configuration
- Tools are installed on kosh
- source ~waldenj/.ec2-tool-setup

Key Points
- Load balancing distributes transactions across multiple servers, CPUs, links, or storage devices.
- Techniques
  - Round robin DNS (configuration)
  - Layer 4 switch (hardware)
  - Web switch (hardware)
  - Reverse proxy (software)
- Algorithms
  - Round robin
  - Least connections/response time
  - Predictive heuristics
  - Random
  - Weighted variants
- Session stickiness

References
1. Jeff Barr, Host Your Web Site in the Cloud: Amazon Web Services Made Easy, Sitepoint, 2010.
2. Theo Schlossnagle, Scalable Internet Architectures, Sams Publishing, 2007.
3. Willy Tarreau, Making Applications Scalable with Load Balancing, http://1wt.eu/articles/2006_lb/, 2006.