Adaptive Overload Control for Busy Internet Servers
Matt Welsh, Intel Research and Harvard University
David Culler, Intel Research and U.C. Berkeley
The Problem: Overload in the Internet
- Overload is an inevitable aspect of systems connected to the Internet
  - (Approximately) infinite user populations
  - Large correlation of user demand (e.g., flash crowds)
  - Peak load can be orders of magnitude greater than average
- Modern Internet services are highly dynamic
  - Web servers do much more than serve up static pages
  - e.g., server-side scripts (CGI, PHP), SSL, database access
  - Requests have highly unpredictable CPU, memory, and I/O demands
  - Makes overload very difficult to predict and manage
- Some high-profile (and low-profile) examples of overload
  - CNN on Sept. 11th: 30,000 hits/sec, down for 2.5 hours
  - E*Trade failure to execute trades during overload
  - Final Fantasy XI launch in Japan: all servers down for 2 days
  - Slashdot effect: daily frustration to nerds everywhere
Outline
- Traditional approaches to overload
- The Staged Event-Driven Architecture
- Adaptive overload control in SEDA
- Service differentiation and degradation
- Performance evaluation under massive load spikes
- Conclusions
Common approaches to overload control
- Prior work on bounding system performance metrics such as:
  - CPU utilization, memory, network bandwidth
  - No connection to user-perceived performance
- Instead, we focus on 90th-percentile response time
  - Meaningful to users, closely tied to SLAs
- Overload management often based on static resource limits
  - e.g., fixed limits on number of clients or CPU utilization
  - Can underutilize resources (if limits set too low) or lead to oversaturation (if limits set too high)
- Static page loads or simple performance models
  - e.g., assume linear overhead in size of Web page
  - Can't account for dynamic services (scripts, SSL, etc.)
- Many techniques studied only under simulation
The Staged Event-Driven Architecture (SEDA)
[Figure: pipeline of stages: accept connection, read packet, parse packet, list folders, show message, delete/refile message, database access, send response (some stages not shown)]
- Decompose service into stages separated by queues
  - Each stage performs a subset of request processing
  - Stages use lightweight event-driven concurrency internally
  - Each stage embodies a set of states from the FSM
  - Queues make load management explicit
- Stages contain a thread pool to drive the event handler (a sketch follows)
  - Small number of threads per stage
  - Dynamically adjust thread pool sizes (dynamic resource control)
  - Apps don't allocate, schedule, or manage threads
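To make the stage structure concrete, here is a minimal Java sketch, with hypothetical names (not the actual SEDA/Sandstorm API): a bounded event queue feeding a small thread pool that runs the stage's event handler over batches of events.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Sketch of a SEDA-style stage: a bounded queue plus a small
    // thread pool driving an event handler over batches of events.
    class Stage<E> {
        interface EventHandler<T> { void handle(List<T> batch); }

        private final BlockingQueue<E> queue = new ArrayBlockingQueue<>(1024);
        private final EventHandler<E> handler;

        Stage(EventHandler<E> handler, int numThreads) {
            this.handler = handler;
            for (int i = 0; i < numThreads; i++) {
                new Thread(this::runLoop).start();
            }
        }

        // The admission control point: offer() fails when the queue is
        // full, signaling overload to the upstream stage.
        boolean enqueue(E event) {
            return queue.offer(event);
        }

        private void runLoop() {
            List<E> batch = new ArrayList<>();
            while (true) {
                try {
                    batch.add(queue.take());   // block for at least one event
                    queue.drainTo(batch, 31);  // batch up to 32 events per call
                } catch (InterruptedException e) {
                    return;
                }
                handler.handle(batch);         // run the stage's event handler
                batch.clear();
            }
        }
    }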
Exposing overload to applications
- Overload is explicit in the programming model
- Every stage is subject to an admission control policy
  - e.g., thresholding, rate control, credit-based flow control
- Enqueue failure is an overload signal
  - Block on full queue: backpressure
  - Drop rejected events: load shedding
  - Can also degrade service, redirect the request, etc.

    for (Request req : batch) {
        // Process request...
        try {
            nextStage.enqueue(req);
        } catch (RejectedException e) {
            // Must respond to enqueue failure!
            // e.g., send error, degrade service, etc.
        }
    }
Alternatives for Overload Control
- Basic idea: apply admission control to each stage (sketched below)
  - Expensive stages throttled more aggressively
- Reject the request (e.g., error message or "Please wait...")
  - Social engineering possible: fake or confusing error message
- Redirect the request to another server (e.g., HTTP redirect)
  - Can couple with front-end load balancing across a server farm
- Degrade service (e.g., reduce image quality or service complexity)
- Deliver differentiated service
  - Give some users better service; don't reject users with a full shopping cart!
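As a rough illustration of these alternatives (the response strings and class names are hypothetical, not the SEDA API), a stage's reaction to an admission-control rejection might be dispatched like this:

    // Hypothetical sketch of the per-request reaction choices above.
    class OverloadResponder {
        enum Action { REJECT, REDIRECT, DEGRADE }

        // Decide how to answer a request that failed admission control.
        String respond(Action action, String requestUrl) {
            switch (action) {
                case REJECT:   // error page or "Please wait..."
                    return "503 Service Unavailable: please wait...";
                case REDIRECT: // send the client to another server
                    return "302 Found: http://mirror.example.com" + requestUrl;
                case DEGRADE:  // serve a cheaper variant of the page
                    return "200 OK (low-fidelity variant of " + requestUrl + ")";
                default:
                    throw new AssertionError("unknown action");
            }
        }
    }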
Feedback-driven response time control
[Figure: a controller observes the stage's response time distribution and adjusts the token-bucket rate λ on the stage's input queue to meet the RT target]
- Adaptive admission control at each stage
  - 90th %tile RT target supplied by administrator
  - Measure stage latency and throttle incoming event rate to meet target
- Additive-increase/multiplicative-decrease (AIMD) controller design (sketched below)
  - Slight overshoot in input rate can lead to large response time spikes!
  - Clamp down quickly on input rate when over target
  - Increase incoming rate slowly when below target
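A minimal Java sketch of this AIMD loop (constants are illustrative placeholders, not tuned values): clamp the token-bucket rate down multiplicatively when the measured 90th-percentile response time exceeds the target, and raise it additively when comfortably below.

    // Sketch of the per-stage AIMD admission-rate controller.
    class AimdAdmissionController {
        private final double targetMs;   // administrator-supplied 90th %tile RT target
        private double rate;             // token-bucket refill rate, requests/sec

        AimdAdmissionController(double targetMs, double initialRate) {
            this.targetMs = targetMs;
            this.rate = initialRate;
        }

        // Invoked periodically with the stage's measured 90th-percentile RT.
        double update(double rt90Ms) {
            if (rt90Ms > targetMs) {
                rate = Math.max(0.05, rate / 1.2);     // over target: clamp down quickly
            } else if (rt90Ms < 0.9 * targetMs) {
                rate = Math.min(5000.0, rate + 2.0);   // below target: creep up slowly
            }
            return rate;   // new refill rate for the stage's token bucket
        }
    }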
Arashi: A Web-based e-mail service
- Yahoo Mail clone: a real-world service
  - Dynamic page generation, SSL
  - New Python-based Web scripting language
  - Mail stored in back-end MySQL database
- Realistic client load generator
  - Traces taken from departmental IMAP server
  - Markov chain model of user behavior
- Overload control applied to each request type separately
[Figure: Arashi stage pipeline: accept connection, read packet, parse packet, list folders, show message, delete/refile message, database access, send response (some stages not shown), with per-stage admission control that can degrade service, send an error page, etc.]
Overload prevention during a load spike
[Figure: 90th-percentile response time (sec) over time (5 sec intervals), with and without overload control; reject rate (0 to 1) on second axis; load spike of 1000 users begins and ends during the run]
- Sudden spike of 1000 users hitting the Arashi service
- 7 request types, handled by separate stages with the overload controller
- 90th %tile response time target: 1 second
- Rejected requests cause clients to pause for 5 sec
- Overload controller has no knowledge of the service!
Overload control with scaling load
[Figure: 90th-percentile response time (sec) vs. number of clients (2 to 1024): target, 90th %tile RT with overload control, and 90th %tile RT without overload control; reject rate (0 to 1) on second axis]
Service Differentiation
[Figure: multiclass controller observes per-class response time distributions and adjusts a separate token bucket (rate λ) for each class against per-class targets]
- Differentiate users into multiple classes
  - Give certain users higher priority than others
  - Based on IP address, cookie, header field, etc.
- Multiclass admission controller design (sketched below)
  - Gather RT distributions for each class, compare to target
  - If RT below target, increase rate for this class
  - If RT above target, reduce rate of lower-priority classes
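A rough sketch of the multiclass rule just described (the array layout and constants are assumptions): each class keeps its own token-bucket rate; a class under its target grows its own rate, while a class over its target causes lower-priority classes to be throttled first.

    // Sketch of the multiclass controller: when a class misses its
    // response-time target, throttle lower-priority classes before it.
    class MulticlassController {
        // rates[c] is the token-bucket rate for class c; index 0 = lowest priority.
        static void update(double[] rt90, double[] target, double[] rates) {
            for (int c = rates.length - 1; c >= 0; c--) {   // highest priority first
                if (rt90[c] < 0.9 * target[c]) {
                    rates[c] += 2.0;                        // additive increase
                } else if (rt90[c] > target[c]) {
                    // Pick the lowest-priority class that still has rate to
                    // give; clamp class c itself only if all below it are
                    // already at the floor.
                    int victim = 0;
                    while (victim < c && rates[victim] <= 0.05) victim++;
                    rates[victim] = Math.max(0.05, rates[victim] / 1.2);
                }
            }
        }
    }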
Service differentiation at work
[Figure: 90th-percentile response time (sec) for low- and high-priority classes vs. time (5 sec intervals) against the 1 second target; fraction of rejected requests on second axis]
- Two classes of users with a 1 second response time target
- 128 users in each class
- High-priority requests suffer fewer rejections
- Without differentiation, both classes treated equally
Service degradation
[Figure: 90th-percentile response time (sec) vs. time (5 sec intervals) during a load spike, for no overload control, degrade only, and degrade + reject; quality level and reject rate on second axis]
- Degrade fidelity of service in response to overload
  - Artificial benchmark: stage crunches numbers at a varying quality level
  - Stage performs AIMD control on service quality under overload
  - Enable/disable admission controller based on response time and quality (see the sketch below)
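A sketch of this degradation loop (the quality floor and step sizes are assumptions): AIMD on a quality knob in [0, 1], with the admission controller re-enabled only once quality has bottomed out and response time is still over target.

    // Sketch: degrade service quality under overload; fall back to
    // rejecting requests only when quality has already bottomed out.
    class DegradationController {
        double quality = 1.0;              // 1.0 = full fidelity, 0.1 = minimum
        boolean admissionControlEnabled = false;

        void update(double rt90Ms, double targetMs) {
            if (rt90Ms > targetMs) {
                quality = Math.max(0.1, quality / 1.2);    // degrade quickly
                // Shed load only as a last resort, at the quality floor.
                admissionControlEnabled = (quality <= 0.1);
            } else if (rt90Ms < 0.9 * targetMs) {
                quality = Math.min(1.0, quality + 0.05);   // recover slowly
                admissionControlEnabled = false;
            }
        }
    }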
Related Overload Management Techniques
- Dynamic listen queue thresholding [Voigt, Cherkasova, Kant]
  - Threshold or token-bucket rate limiting of incoming SYN queues
  - Problem: dropping or limiting TCP connections is bad for clients!
- Specialized scheduling techniques [Crovella, Harchol-Balter]
  - e.g., shortest-connection-first or shortest-remaining-processing-time
  - Often assumes 1-to-1 mapping of client request to server process
- Class-based service differentiation [Bhoj, Voigt, Reumann]
  - Kernel- and user-level techniques for classifying user requests
  - Sometimes requires pushing application logic into the kernel
  - Adjust connection/request acceptance rate per class
  - No feedback: static assignment of acceptance rates
- We argue that overload management should be an application design primitive and not simply tacked onto existing systems
Control theoretic resource management
- Increasing amount of theoretical and applied work in this area
  - Control theory for physical systems with (sometimes) well-understood behaviors
  - Capture model of system behavior under varying load
  - Design controllers using standard techniques (e.g., pole placement)
  - e.g., PID control of Internet service parameters [Diao, Hellerstein]
  - Feedback-driven scheduling [Stankovic, Abdelzaher, Steere]
- Accurate system models difficult to derive
  - Capturing realistic models is difficult; highly dependent on test loads
  - Model parameters change over time: upgrading hardware, introducing new functionality, bit-rot
- Much control theory based on linear assumptions
  - Real software systems highly nonlinear
Future Work
- Automatic profiling, modeling, and tuning of the overload controller
  - Capture traces of stage performance vs. traffic and load mix
  - Offline or online tuning of admission control parameters
  - Use learning algorithms?
- Extend local overload controller to global actions
  - Adjust front-door admission rate based on back-end bottleneck
  - Prioritize stages that release resources
  - Use global information, e.g., memory availability, to help
- Further explore tradeoff between request rejection and degradation
  - How to build general-purpose degradation into a service?
  - Tie in with complex set of service level agreements
Summary
- SEDA programming model exposes overload to the application
  - Break event-driven application into stages connected with queues
  - Queues can reject new requests: an overload signal
- Adaptive overload control at each stage
  - Attempt to meet a 90th-percentile response time target
  - Adjust admission rate at each stage's queue
- Differentiated service using multiple admission controllers
- Extensive evaluation in a realistic overload setting
  - Arashi service with highly dynamic behavior
  - Realistic client load generator
  - Evaluated overload control, service differentiation, and service degradation
- For more information, software, and papers: http://www.cs.berkeley.edu/~mdw/proj/seda/
Backup slides follow
Adaptive overload control algorithm
- Monitor response time for each request in the system
  - Tag with current time on entry to service
  - Gather distribution of accumulated response times at each stage
- Controller adjusts admission rate of requests using a token bucket
  - Controller invoked after N requests processed or on timeout
  - EWMA filter used on 90th-percentile RT estimate
  - Calculates error err between current RT estimate and target
  - If err > err_d, token bucket rate reduced by multiplicative factor adj_d
  - If err < err_i, token bucket rate increased by additive factor (err/c_i) * adj_i
- Parameters determined through extensive experimentation (see the sketch below)
  - N = 100, timeout = 1 sec
  - EWMA filter alpha = 0.7
  - err_i = -0.5, err_d = 0.0, adj_i = 2.0, adj_d = 1.2
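Putting this slide together as code: a reconstruction of the update rule, since the additive term above is partly garbled; the value of c_i is an assumption (it is not legible on the slide) and is marked as such below.

    // Reconstructed sketch of the token-bucket rate update.
    class TokenBucketController {
        static final double ALPHA = 0.7;             // EWMA smoothing constant
        static final double ERR_D = 0.0, ERR_I = -0.5;
        static final double ADJ_D = 1.2, ADJ_I = 2.0;
        static final double C_I = -0.1;              // ASSUMED value; scales the additive step

        double smoothedRt90 = 0.0;   // EWMA of the 90th-percentile RT estimate
        double rate = 10.0;          // token-bucket rate, requests/sec

        // Invoked after N = 100 requests or a 1 sec timeout.
        void update(double rt90Sample, double target) {
            smoothedRt90 = ALPHA * smoothedRt90 + (1 - ALPHA) * rt90Sample;
            double err = (smoothedRt90 - target) / target;   // relative error
            if (err > ERR_D) {
                rate = rate / ADJ_D;                 // multiplicative decrease
            } else if (err < ERR_I) {
                rate = rate + (err / C_I) * ADJ_I;   // additive increase (err/C_I > 0 here)
            }
        }
    }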
Response Time Controller Operation
[Figure: 90th-percentile response time (sec) and admission rate (reqs/sec) vs. time (5 sec intervals), against the 1 second target]
- Adjust incoming token bucket rate using AIMD control
  - Target response time: 1 second
- Sample response times of requests through the stage
- After 100 samples or 1 second:
  - Sort measurements and take the 90th percentile
  - If 90th %tile RT < 0.9 * target RT, add f(err) to rate
  - If 90th %tile RT > target RT, divide rate by 1.2
Overload control by request type
[Figure: 90th-percentile response time (ms) per request type (login, list folders, list messages, show message, delete message, refile message, search) for 1024 users, with and without admission control, against the target response time]
Without response time control
[Figure: 90th-percentile response time (sec) for low- and high-priority classes vs. time (5 sec intervals), against the target]
- Two classes of users without overload control enabled
- 128 users in each class
- Terrible response time performance
Without service differentiation
[Figure: 90th-percentile response time (sec) for low- and high-priority classes vs. time (5 sec intervals) against the 1 second target; fraction of rejected requests on second axis]
- Two classes of users with a 1 second response time target
- 128 users in each class
- No differentiation between classes of users
- High-priority users see the same loss rate as low-priority users
SEDA Scales Well with Increasing Load
[Figure: throughput (Mbit/sec) vs. number of clients (1 to 1024) for Apache, Flash, and SEDA]
- 4-way Pentium III 500 MHz, Gigabit Ethernet, 2 GB RAM, Linux 2.2.14, IBM JDK 1.3
- SEDA throughput 10% higher than Apache and Flash (which are in C!)
  - But higher efficiency was not really the goal!
- Apache accepts only 150 clients at once: no overload despite thread model
  - But as we will see, this penalizes many clients
Max Response Time vs. Apache and Flash
[Figure: maximum response time (sec) vs. number of clients (1 to 1024); at high load, Apache reaches 1.7 minutes, Flash 37 sec, SEDA 3.8 sec]
- Apache and Flash are very unfair when overloaded
- Long response times due to exponential backoff in TCP SYN retransmit
- Not accepting connections is the wrong approach to overload management
User-level vs. kernel-level resource management
- SEDA is a user-level solution: no kernel changes
  - Runs on commodity systems (Linux, Solaris, BSD, Win2k, etc.)
  - In contrast to extensive work on specialized OSs, schedulers, etc.
- Explore resource control on top of an imperfect OS interface
  - "Grey box" approach: infer properties of the underlying system from observed behavior
- What would a SEDA-based dream OS look like?
  - Scalable I/O primitives: remove emphasis on blocking ops
  - SEDA stage-aware scheduling algorithm?
  - Greater exposure of performance monitors and knobs
Scalable concurrency and I/O interfaces
- Threads don't scale, but are the wrong interface anyway
  - Too coarse-grained: don't reflect internal structure of a service
  - Little control over thread behavior (priorities, kill -9)
- I/O interfaces typically don't scale
[Figure: bandwidth (Mbit/sec) vs. number of connections (1 to 16384) for the SEDA asynchronous socket layer and a thread-based asynchronous socket layer; the thread-based layer can't run beyond 400 connections]
Distributed programming models and protocols
- HTTP pushes overload into the network
  - Relies on TCP connection backoff rather than more explicit mechanisms
  - Simultaneous connections, progressive download, and out-of-order requests complicate matters
  - Protocol design should consider service availability
- Distributed computing models generally do not express overload
  - CORBA, RPC, RMI, .NET all based on RPC with generic error conditions
  - On error, should the app fail, retry, or invoke an alternate function?
- Single bottleneck in a large distributed system causes cascading failure in the network
[Figure: browser, web service, and storage service in a chain, with a back-end bottleneck propagating back through the network]
Playing dodgeball with the kernel
- OS resource management abstractions often inadequate
  - Resource virtualization hides overload from applications (e.g., malloc() returns NULL when no memory)
  - Forces system designers to focus only on capacity planning
- Internet services require careful control over resource usage
  - e.g., avoid exhausting physical memory to avoid paging
  - Back off on processing heavyweight requests when saturated
- SEDA approach: application-level monitoring & throttling
  - Service performance monitored at a per-stage level: request throughput, service rate, latency distributions
  - Staged model permits careful control over resource consumption
  - Throttle number of threads, admission control on each stage
  - Cruder than kernel modifications, but very effective (and clean!)