Web Application Scalability: A Model-Based Approach

Copyright 2004, Software Engineering Research and Performance Engineering Services. All rights reserved.

Lloyd G. Williams, Ph.D.
Software Engineering Research
264 Ridgeview Lane
Boulder, Colorado 832
(303) 938-9847
boulderlgw@aol.com

Connie U. Smith, Ph.D.
Performance Engineering Services
PO Box 264
Santa Fe, New Mexico, 8754-264
(505) 988-38
http://www.perfeng.com/

Scalability is one of the most important quality attributes of today's software systems. Yet, despite its importance, scalability in applications is poorly understood. This paper presents a model-based view of scalability in Web and other distributed applications that is aimed at removing some of the myth and mystery surrounding this important software quality. We review four models of scalability and show how they relate to Web and other distributed applications. The applicability of these models is demonstrated with a case study.

INTRODUCTION

Web and other distributed software systems are increasingly being deployed to support key aspects of businesses including sales, customer relationship management (CRM), and data processing. Examples include online shopping, processing insurance claims, and processing financial trades. Many of these systems are deployed on the Web or on dedicated networks using Web technologies such as J2EE.

As these businesses grow, the systems that support their functions also need to grow to support more users, process more data, or both. As they grow, it is important to maintain their performance (responsiveness or throughput) [Smith and Williams 2002].

Poor performance in these applications often translates into substantial costs. Customers will often shop elsewhere rather than endure long waits. Slow responses in CRM applications mean that more customer-service representatives are needed. And failure to process financial trades in a timely fashion can result in statutory penalties as well as lost customers.
These forces combine to make scalability one of the most important quality attributes of today's software systems. Yet, despite its importance, scalability in applications is poorly understood and there is no generally accepted definition of scalability. In this paper, we will use the following definition:

Scalability is a measure of an application system's ability to, without modification, cost-effectively provide increased throughput, reduced response time and/or support more users when hardware resources are added.

Scalability is a system property. The software architecture is a key factor in determining scalability. For example, if the software architecture is not able to use additional resources to increase throughput, the system will not be scalable. The choice of execution environment is also very important. As we will see, the same workload executed on two different platforms can exhibit significantly different scalability properties.

Scalability in Web and other distributed systems is a complex problem. Removing a bottleneck changes the dynamics of the system and a different bottleneck may emerge to impose new limitations on the scalability. Thus, it is necessary to reevaluate the behavior of the system after making changes.

This paper presents a model-based view of scalability in Web and other distributed applications. It is aimed at removing some of the myth and mystery surrounding this important software quality. It reviews previous work on scalability and shows how it relates to distributed Web applications. This paper also provides a simple demonstration that extrapolations of near-linear results are likely to overestimate scalability. Finally, it offers a reformulation of Gustafson's Law that is more appropriate for evaluating Web application scalability and demonstrates that this law is applicable to some Web applications.

The analyses described here are part of the PASA(SM) approach to the performance assessment of software architectures [Williams and Smith 2002].
They are employed in situations where scalability is a concern and where the required measurements can be obtained.

We begin with a motivating example followed by a review of scalability measures and models. A simple case study then illustrates how these models apply to Web applications.

MOTIVATION

Consider the following data (published on the Web) for horizontal scalability of a sample application developed using commercial Web server software.

Table 1: Horizontal Scaling Data

Number of Nodes    Transactions per Second
1                  66.6
2                  126.4
3                  178.2
4                  235.6

The report also contains the statement: "Multiple Web servers scale near-linearly (80-90%) at this level using [Product X]."

While this statement is, at a casual glance, true, it is also potentially dangerously misleading. A straightforward extrapolation to higher numbers of processors would give the scalability shown by the dashed line in Figure 1.

[Figure 1: Extrapolated Scaling. Throughput versus number of nodes.]

This type of extrapolation is all too common in our experience. However, as we will see, the actual scalability of this system is more likely to follow the solid curve. The difference between these two predictions is significant. It could mean the difference between successful deployment of the application and financial disaster.

This example underscores the importance of obtaining a thorough understanding of the scalability characteristics of a system before committing resources to its expansion. One way to accomplish this is to determine, through analysis of measured data, whether the behavior of the system fits that of a known model of scalability. Once there is a degree of confidence in how well a model describes the system, extrapolations such as the one above become less risky.

SPEEDUP AND SCALEUP

Speedup is a measure of the reduction in time required to execute a fixed workload as a result of adding resources such as processors or disks. Speedup is the most common metric for evaluating parallel algorithms and architectures.
Scaleup, on the other hand, is a measure of the increase in the amount of work that can be done in a fixed time as a result of adding resources. Scaleup is a more relevant metric for Web applications, where the principal concern is whether we can process more transactions or support more users as we add resources.

Although they might appear to be very different metrics, speedup and scaleup are really two sides of the same coin. Clearly, if we can execute a transaction more quickly, we can execute more transactions in a given amount of time. More formally, the speedup is given by:

$$ S(p) = \frac{T(1)}{T(p)} $$

where T(1) is the time required to perform the work with one processor and T(p) is the time required to perform the same amount of work with p processors.

Scaleup may be expressed as a ratio of the capacity with p processors to the capacity with one processor. This ratio is sometimes known as the scaling factor. It has also been called the relative capacity, C(p) [Gunther 2000]. If we use the maximum throughput as a measure of the capacity, we can express the scaling factor or relative capacity as:

$$ C(p) = \frac{X_{max}(p)}{X_{max}(1)} $$

where X_max(1) is the maximum capacity with one processor and X_max(p) is the maximum capacity with p processors.

Since the maximum throughput of a system is equal to the inverse of the demand at the bottleneck resource [Jain 1991], we have:

$$ C(p) = \frac{X_{max}(p)}{X_{max}(1)} = \frac{D_b(1)}{D_b(p)} $$

The demand at a resource is the total time t_b required to execute the workload at that resource (i.e., visits x service time). Thus, if we approximate the behavior of the system by a single queue/server representing the bottleneck device, we can write:

$$ C(p) = \frac{X_{max}(p)}{X_{max}(1)} = \frac{D_b(1)}{D_b(p)} = \frac{t_b(1)}{t_b(p)} = S(p) $$

While this approximation should be verified (see the case study below), it is a good one in most cases.

We will use C(p) as a measure of the scalability of the system. Once the function (model) describing C(p) has been determined, the throughput of the system with p processors can be obtained by multiplying X_max(1) by C(p).

CATEGORIES OF SCALABILITY

We will categorize the scalability of a system based on the behavior of C(p). This classification scheme is similar to that proposed by Alba for speedup in parallel evolutionary algorithms [Alba 2002]. The categories are:

Linear: the relative capacity is equal to the number of processors, p, i.e., C(p) = p.
Sub-linear: the relative capacity with p processors is less than p, i.e., C(p) < p.
Super-linear: the relative capacity with p processors is greater than p, i.e., C(p) > p.

Linear Scalability

Linear scalability can occur if the degree of parallelism in the application is such that it can make full use of the additional resources provided by scaling. For example, if the application is a data acquisition system that receives data from multiple sources, processes it and prepares it for additional downstream processing, it may be possible to run multiple streams in parallel to increase capacity. In order for this application to scale linearly, the streams must not interfere with each other (for example, via contention for database or other shared resources) or require a shared state. Either of these conditions will reduce the scalability below linear.

Sub-Linear Scalability

Sub-linear scalability occurs when the system is unable to make full use of the additional resources. This may be due to properties of the application software, for example if delays waiting for a software resource such as a database lock prevent the software from making use of additional processors. It may also be due to properties of the execution environment that reduce the processing power of additional processors, for example overhead for scheduling, contention among processors for shared resources such as the system bus, or communication among processors to maintain a global state. These factors cause the relative capacity to increase more slowly than linearly.

Super-Linear Scalability

At first glance, super-linear scalability would seem to be impossible, a violation of the laws of thermodynamics. After all, isn't it impossible to get more out of a machine than you put in? If so, how can we more than double the throughput of a computer system by doubling the number of processors?

The fact is, however, that super-linear scalability is a real phenomenon. The easiest way to see how this comes about is to recognize that, when we add a processor to a system, we are sometimes adding more than just a CPU. We often also add additional memory, disks, network interconnects, and so on. This is especially true when expanding clusters. Thus, we are adding more than just processing power and this is why we may realize more than linear scaleup. For example, if we also add memory when we add a processor, it may be possible to cache data in main memory and eliminate database queries to retrieve it. This will reduce the demand on the processors, resulting in a scaleup that is more than can be accounted for by the additional processing power alone.

The next section discusses several models that exhibit these categories of behavior.

MODELS OF SCALABILITY

This section discusses four models of scalability:

1. Linear scalability
2. Amdahl's Law
3. Super-Serial Model
4. Gustafson's Law

The Super-Serial Model is an extension of Amdahl's Law and has appeared in the context of on-line transaction processing (OLTP) systems [Gunther 2000], in Beowulf-style clusters of computers [Brown 2003] and others.
The other three were developed in the context of speedup for parallel algorithms and architectures. Amdahl's Law and the Super-Serial Model describe sub-linear scalability. Gustafson's Law describes super-linear scalability. These models were developed in a different context but, as we illustrate, they apply to Web applications. Other models of scalability, such as memory-bounded speedup [Sun and Ni 1993], are available but are beyond the scope of this paper.

Linear Scalability

With linear scalability the relative capacity, C(p), is equal to the number of processors, p.

$$ C_L(p) = p \tag{1} $$

For a system that scales linearly, a graph of C(p) versus p is a straight line with a slope of one and a y-intercept of zero [C(0) = 0]. Figure 2 shows a graph of Equation 1.

[Figure 2: Linear Scalability. Relative capacity C(p) versus number of processors p.]

Note that our definition of C(p) does not allow for linear scalability with a slope that is not equal to one, i.e., C(p) = kp where k ≠ 1. If this were possible, C(1) could be different from one. But, since

$$ C(1) = \frac{X_{max}(1)}{X_{max}(1)} = 1 $$

k must be equal to one. This means that both sub-linear and super-linear scalability must, in fact, be described by non-linear functions. While measurements at small numbers of processors may appear to be linear, measurements at higher numbers of processors will reveal the non-linearity.

This also means that linear extrapolations of near-linear results, such as that in our opening example, can be misleading. Since the actual function is necessarily non-linear, these extrapolations will overestimate the scalability of the system if the slope of the extrapolated line is less than one and underestimate it if the slope is greater than one.

Amdahl's Law

Amdahl's Law [Amdahl 1967] states that the maximum speedup obtainable from an infinite number of processors is 1/σ, where σ is the fraction of the work that must be performed sequentially. If p is the number of processors, t_s is the time spent by a sequential processor on the sequential parts of the program, and t_p is the time spent by a sequential processor on the parts of the program that can be executed in parallel, we can write the speedup for Amdahl's Law, S_A:

$$ S_A = \frac{T(1)}{T(p)} = \frac{t_s + t_p}{t_s + \frac{t_p}{p}} $$

$$ S_A = \frac{1}{\frac{t_s}{t_s + t_p} + \frac{t_p}{(t_s + t_p)\,p}} = \frac{1}{\sigma + \frac{\pi}{p}} $$

Here, σ is the fraction of the time spent on the sequential parts of the program and π is the fraction of time spent on the parts of the program that can be executed in parallel. Since π = 1 − σ:

$$ S_A = \frac{p}{1 + \sigma(p - 1)} $$

Or, using the equivalence between speedup and scaleup,

$$ C_A(p) = \frac{p}{1 + \sigma(p - 1)} \tag{2} $$

If σ = 0 (i.e., no portion of the workload is executed sequentially), Amdahl's Law predicts unlimited linear scaleup (i.e., C_A(p) = p). For non-zero values of σ, the scaleup will be less than linear. Figure 3 shows a comparison of linear scalability with Amdahl's Law scalability for σ = 0.2.

[Figure 3: Amdahl's Law versus Linear Scalability. Relative capacity C(p) versus number of processors p, for linear scalability and for Amdahl's Law with σ = 0.2.]

The maximum speedup that can be obtained, even using an infinite number of processors, is:

$$ \lim_{p \to \infty} S_A = \frac{1}{\sigma} $$

This means that, if the serial fraction of the workload is 0.2, the maximum speedup that can be achieved is 5, and it will take an infinite number of processors to achieve that! Amdahl's argument was that, given this limitation, a fast single-processor machine is more cost-effective than a multiprocessor machine.

As Figure 3 shows, there are diminishing returns for adding more processors. The penalty increases as σ increases. For σ = 0.2, Amdahl's Law predicts that adding a second processor will yield a relative capacity of 1.67. That is, the maximum throughput with two processors will be 1.67 times that with one processor. If, instead of adding a second processor, we replace the single processor with one twice as fast, the throughput will then be exactly twice that with the slower processor. This is because the faster processor reduces the time required for both the serial and parallel portions of the workload. Because of this, it is generally more cost-effective to use a faster single processor than to add processors to achieve increased throughput in cases where Amdahl's Law applies.

Super-Serial Model

Gunther [Gunther 2000] points out that Amdahl's Law may be optimistic in cases where there is interprocessor communication, for example to maintain cache consistency among processors. In these cases, if a processor that needs to propagate an update sends it sequentially to each of the other processors, and the time required to process and send a message is t_c, we have the super-serial capacity for p processors, C_S(p):

$$ C_S(p) = \frac{T(1)}{T(p)} = \frac{t_s + t_p}{t_s + \frac{t_p}{p} + t_c\,(p - 1)} $$

Or, after some algebra:

$$ C_S(p) = \frac{p}{1 + \sigma\left[(p - 1) + \gamma\,p\,(p - 1)\right]} \tag{3} $$

where γ is the fraction of the serial work that is used for interprocessor communication. This result is identical to the Amdahl's Law result with an extra term in the denominator for overhead due to interprocessor communication. Gunther has called Equation 3 the Super-Serial Model [Gunther 2000].

The term in Equation 3 that contains γ grows as the square of the number of processors. This means that, even if the overhead for interprocessor communication is small, as the number of processors increases, the communication overhead will eventually cause C(p) to reach a maximum and then decrease. Figure 4 shows a comparison of Equation 3 for σ = 0.2 and various values of γ with linear scalability and Amdahl's Law.

[Figure 4: Super-Serial Model. Relative capacity C(p) versus number of processors p, for several values of γ.]

Gustafson's Law

For certain applications, it was found that speedups greater than that predicted by Amdahl's Law are possible. For example, some scientific applications were found to undergo a speedup of more than 1,000 on a 1,024-processor hypercube. Gustafson [Gustafson 1988] noted that Amdahl's Law assumes that the parallel fraction of the application (π = 1 − σ) is constant, i.e., independent of the number of processors. Yet, in many cases, the amount of parallel work increases in response to the presence of additional computational resources but the amount of serial work remains constant. For example, with more computing power, matrix manipulations can be performed on larger matrices in the same amount of time. In these cases, π (and, therefore, σ) is actually a function of the number of processors.

If t_s and t_p are the times required to execute the serial and parallel portions of the workload on a parallel system with p processors, a sequential processor would require a time of t_s + (t_p x p) to perform the same work. Gustafson termed this scaled speedup, which is described in the following equations:

$$ \text{Scaled Speedup} = S_G = \frac{t_s + (t_p \times p)}{t_s + t_p} $$

$$ S_G = p + \sigma'(1 - p) \tag{4} $$

where σ' is the serial fraction of the work performed on p processors. Equation 4 is known as Gustafson's Law or the Gustafson-Barsis Law. Gustafson's Law describes fixed-time speedup while Amdahl's Law describes fixed-size speedup.

As with Amdahl's Law, Gustafson's Law also applies to scalability. We cannot use the formulation in Equation 4 directly, however. Equation 4 describes the speedup as the ratio of the time a single processor would require to execute the workload to the time required to execute the same amount of work on a system with p processors. This is not a ratio that is likely to be measured, however.
We are more likely to have measurements of the maximum throughput at various numbers of processors. Thus, to use Gustafson's Law for Web application scalability, we need to express it in terms of C(p), the ratio of the maximum throughput with p processors to the maximum throughput with one processor.
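The contrast between fixed-size speedup (Amdahl's Law) and fixed-time speedup (Gustafson's Law) can be made concrete with a short calculation. The sketch below evaluates both expressions for a 1,024-processor system; the serial fraction of 0.01 is an illustrative assumption, not a value taken from the measurements discussed in this paper, and the function names are hypothetical.

```python
def amdahl_speedup(p: int, sigma: float) -> float:
    """Fixed-size speedup: S_A = p / (1 + sigma * (p - 1))."""
    return p / (1.0 + sigma * (p - 1))


def gustafson_speedup(p: int, sigma_scaled: float) -> float:
    """Fixed-time (scaled) speedup: S_G = p + sigma' * (1 - p)."""
    return p + sigma_scaled * (1 - p)


# Illustrative assumption: 1% serial work in both formulations.
processors = 1024
serial_fraction = 0.01

fixed_size = amdahl_speedup(processors, serial_fraction)
fixed_time = gustafson_speedup(processors, serial_fraction)

print(f"Amdahl (fixed-size) speedup on {processors} processors:   {fixed_size:.1f}")
print(f"Gustafson (fixed-time) speedup on {processors} processors: {fixed_time:.1f}")
```

With the same nominal serial fraction, the fixed-size speedup stays below the 1/σ ceiling of 100, while the scaled speedup remains close to the processor count. The two serial fractions are defined against different baselines (one processor versus p processors), so the comparison is qualitative.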

The demand with one processor is t_s(1) + t_p(1) and the maximum throughput is therefore:

$$ X_{max}(1) = \frac{1}{t_s(1) + t_p(1)} $$

Gustafson's Law assumes that the parallel portion of the workload increases as the number of processors. Thus, the total demand with p processors is t_s(1) + (t_p(1) x p). However, this demand is spread over p processors, so the average demand per processor is:

$$ D_b(p) = \frac{t_s(1) + (t_p(1) \times p)}{p} $$

Note that the average demand per processor is a decreasing function of p. This is because only one processor can execute the serial portion of the workload. If the degree of parallelism is such that the application is able to make use of this additional capacity, each processor beyond one will be able to execute more parallel work than t_p(1), resulting in super-linear scaling. This can occur in Web applications when loading a page results in multiple concurrent requests from the client to retrieve information, such as gifs and other page elements. Additional processors enable those requests to execute in parallel.

Under these conditions, the average maximum throughput per processor is:

$$ \frac{X_{max}(p)}{p} = \frac{1}{D_b(p)} = \frac{p}{t_s(1) + (t_p(1) \times p)} $$

and the maximum throughput for p processors is:

$$ X_{max}(p) = \frac{p^2}{t_s(1) + (t_p(1) \times p)} $$

Using these results, we can write the relative capacity for Gustafson's Law as:

$$ C_G(p) = \frac{p^2}{p + \sigma(1)\,(1 - p)} \tag{5} $$

where σ(1) is the serial fraction of the work with one processor. As the value of σ(1) approaches zero, this function approaches linear scalability. In fact, for small values of σ(1), Equation 5 is difficult to distinguish from linear scalability. With non-zero values of σ(1), the second term in the denominator is negative for values of p greater than one, however. This means that C(p) will increase faster than p, giving super-linear scalability.

Figure 5 shows a graph of C(p) versus p for Gustafson's Law with two values of σ(1). As the figure shows, at small values of σ(1)
Gustafson's Law is difficult to distinguish from linear scalability. At higher values of σ(1), however, the curve definitely shows its non-linearity.

[Figure 5: Gustafson's Law. Relative capacity C(p) versus number of processors p, for two values of σ(1).]

The next section illustrates the applicability of these models with a case study.

CASE STUDY

This case study is based on the Avitek Medical Records (MedRec) sample application developed by BEA Systems, Inc. [BEA 2003a]. This application is an educational tool that is intended to illustrate best practices for designing and implementing J2EE applications. It has also been used to demonstrate the scalability of BEA's WebLogic Server [BEA 2003b]. (WebLogic Server is a trademark of BEA Systems, Inc.)

MedRec is a three-tier application that consists of the following layers [BEA 2003a]:

Presentation layer: The presentation layer is responsible for all user interaction with MedRec. It accepts user input for forwarding to the application layer and displays application data to the user.
Application layer: The application layer encapsulates MedRec's business logic. It receives user requests from the presentation layer or from external clients via Web Services and may interact with the database layer in response to those requests.
Database layer: The database layer stores and retrieves patient data.

Users interact with MedRec via a browser using HTTP requests and responses. External clients may also use MedRec as a Web Service. The presentation and application layers reside on one or more application servers. The database layer resides on a separate database server. Figure 6 shows a schematic view of the MedRec application.

[Figure 6: MedRec Application Configuration. Client, application server, and database server.]

This case study is based on measurements of this application published by BEA Systems, Inc. [BEA 2003b]. These measurements were made to demonstrate the scalability of BEA's WebLogic Server. Because of this, no specific performance requirements were specified.

Table 2 shows the demand for each of the resources in the system. The application server CPU has the highest demand and is therefore the bottleneck resource.

Table 2: Resource Demand

Resource                    Demand (sec)
Application Server CPU      0.0227
Database Server CPU         0.000168
Database Server Disk        0.00138

We begin by examining the validity of the single-queue approximation for describing the behavior of the system. The following sections then explore vertical and horizontal scaling characteristics of this application.

Single Queue Approximation

This analysis is based on measured data for an application server with a single 750 MHz processor. The database server was an 8-processor (750 MHz) machine with RAID disks. The measurements appear in Table 3. Note that the high number of transactions per second for a single client is due to using a think time of zero for these measurements. We recommend a more realistic think time for your measurement studies. There are a few other aspects of the measurements that we question, but we are unable to confirm their validity because we did not conduct these measurement studies.

Table 3: Measured Throughput versus Number of Clients

Number of Clients    Transactions per Second
1                    41.39
4                    43.96
10                   42.78
20                   43.54
40                   42.74
80                   43.
100                  43.5

Max TPS                        43.96
Appserver CPU Utilization      98%
DB Server CPU Utilization      <1%
DB Server Disk Utilization     6%

From the maximum throughput and the throughput with one client, we can calculate the demand at the bottleneck resource, D_b, and the total demand, D_T [Jain 1991].

$$ D_b = \frac{1}{X_{max}} = \frac{1}{43.96} = 0.0227 \text{ sec} $$

$$ D_T = \frac{1}{X(1)} = \frac{1}{41.39} = 0.0242 \text{ sec} $$

Note that the total demand with one user is the sum of the demands in Table 2, 0.0242 sec. Also note that the bottleneck demand is 93.8% of the total demand. With such a large percent of the overall demand attributable to the bottleneck resource, the single-queue approximation is a good one.

Using the results for D_b and D_T, we can construct a simple system model using a single queue/server that represents the bottleneck resource, in this case the application server CPU. Figure 7 shows a portion of the measured data for the MedRec application on the single-processor server along with the modeled curve for a QNM solution using a single queue/server representing the bottleneck resource. The model was constructed and solved using SPE·ED.

[Figure 7: Measured versus Modeled Data. Throughput (TPS) versus number of clients, measured and modeled.]

As the figure indicates, the single-queue approximation is a good one for this system (because the application server CPU dominates the demand).
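The demand calculation above is simple enough to script. The following is a minimal sketch, taking the one-client throughput as 41.39 TPS and the maximum throughput as 43.96 TPS from Table 3. The `throughput_upper_bound` helper is a hypothetical name, and it applies the standard asymptotic bounds from operational analysis rather than the QNM solution produced by the authors' tool.

```python
# Measured values from Table 3 (think time of zero).
x_one_client = 41.39   # throughput with one client (TPS)
x_max = 43.96          # maximum measured throughput (TPS)

d_total = 1.0 / x_one_client    # total demand D_T (sec)
d_bottleneck = 1.0 / x_max      # bottleneck demand D_b (sec)
bottleneck_share = d_bottleneck / d_total


def throughput_upper_bound(n_clients, d_total, d_bottleneck, think_time=0.0):
    """Asymptotic upper bound on throughput for a closed system.

    Throughput cannot exceed N / (D_T + Z) at light load,
    nor 1 / D_b once the bottleneck saturates.
    """
    return min(n_clients / (d_total + think_time), 1.0 / d_bottleneck)


print(f"D_b = {d_bottleneck:.4f} sec, D_T = {d_total:.4f} sec")
# The unrounded ratio is about 94%; the rounded demands 0.0227 / 0.0242
# give the 93.8% quoted in the text.
print(f"bottleneck share of total demand = {bottleneck_share:.1%}")
for n in (1, 4, 10, 20):
    bound = throughput_upper_bound(n, d_total, d_bottleneck)
    print(f"N = {n:3d}: throughput bound = {bound:.2f} TPS")
```

Because the bottleneck accounts for nearly all of the demand, the bound saturates almost immediately, which is consistent with the flat measured throughput beyond a few clients.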

Vertical Scalability

To demonstrate the vertical scaling characteristics of this application, throughput versus number of simulated clients was measured for 1-, 2-, 4-, and 8-processor application server configurations. The application server is a 750 MHz platform capable of holding up to 24 processors. Utilizations for the application and database server CPUs as well as the database server disk were also measured. Table 4 shows the results of these measurements.

Table 4: Measured Throughput for Vertical Scaling

Number of Processors            1        2        4        8
Max TPS                         43.96    78.74    142.5    216.35
Number of Clients at Max TPS    4        4        2        4
Appserver CPU Utilization       98%      97%      95%      9%
DB Server CPU Utilization       <1%      1.3%     2.25%    3.9%
DB Server Disk Utilization      6%       12%      19%      28.24%

Note that the number of clients decreases at 4 processors. This is one of the questionable measurements that we found. You can see in Table 3 that the maximum transactions per second fluctuates between 4 and 80 clients, but remains close to 43 transactions per second.

Amdahl's Law

Regression analysis determines that Amdahl's Law provides a good fit to this data with a σ of 0.088 (r² = .992). This value of σ indicates that 8.8% of the workload must be performed sequentially.

The maximum number of processors that can be installed in this server is 24. Thus the maximum value of C(p) that can be obtained is:

$$ C(24) = \frac{p}{1 + \sigma(p - 1)} = \frac{24}{1 + 0.088(24 - 1)} = 7.93 $$

The maximum throughput with one processor, X_max(1), is 43.96 transactions per second, so the maximum throughput with 24 processors would be:

$$ X_{max}(24) = C(24) \times 43.96 = 348 \text{ tps} $$

(SPE·ED is a trademark of Performance Engineering Services Division, L&S Computer Technology, Inc.)

The limit on C(p) for Amdahl's Law is 1/σ. Thus, even with an infinite number of processors, the maximum value of C(p) that could be obtained is 11.4, for a maximum throughput of approximately 500 transactions per second.
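The Amdahl's Law projection for this server can be reproduced in a few lines. The sketch below assumes the fitted value σ = 0.088 and the one-processor maximum of 43.96 TPS; the function name is hypothetical. Small differences from the quoted figures come from rounding C(24) before multiplying.

```python
def amdahl_capacity(p: int, sigma: float) -> float:
    """Relative capacity under Amdahl's Law: C_A(p) = p / (1 + sigma * (p - 1))."""
    return p / (1.0 + sigma * (p - 1))


sigma = 0.088      # fitted serial fraction from the regression
x_max_1 = 43.96    # maximum throughput with one processor (TPS)

c_24 = amdahl_capacity(24, sigma)   # fully populated 24-processor server
x_24 = c_24 * x_max_1

c_limit = 1.0 / sigma               # limit as p grows without bound
x_limit = c_limit * x_max_1

print(f"C(24) = {c_24:.2f}, X_max(24) = {x_24:.0f} tps")
print(f"limit: C = {c_limit:.1f}, X_max = {x_limit:.0f} tps")
```

Going from 8 to 24 processors triples the hardware but, under this model, raises capacity only from about 4.95 to about 7.9 times the single-processor value.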
Super-Serial Model

Regression analysis determines that the Super-Serial Model also provides a good fit to this data. This is not surprising since the Super-Serial Model is an extension of Amdahl's Law with an extra term for interprocessor communication. The values of the super-serial parameters obtained from a regression analysis are: σ = 0.0787 and γ = 0.0164 (r² = .993). This indicates that, according to the Super-Serial Model, 7.9% of the work must be performed sequentially. Approximately 1.6% of that sequential work is used for interprocessor communication.

The value of C(p) predicted by the Super-Serial Model at 24 processors is:

$$ C(24) = \frac{24}{1 + 0.0787\left[23 + 0.0164 \times 24 \times 23\right]} = 6.82 $$

which gives a maximum throughput of 299 transactions per second.

Overall Evaluation

Figure 8 summarizes the measured data as well as the maximum throughput versus number of processors predicted by both Amdahl's Law and the Super-Serial Model.

[Figure 8: Summary of Modeled and Measured Data. Xmax (tps) versus number of processors: measured, Amdahl's Law, and Super-Serial Model.]

As Figure 8 shows, there is little difference between Amdahl's Law and the Super-Serial Model in the region covered by the measured data, and both models provide a reasonable fit. The values of r-squared from the regression analysis indicate a slightly better fit for the Super-Serial Model (r² = .993) than Amdahl's Law (r² = .992). The difference is small, however, and the increase in r² may be due to the extra degree of freedom introduced by the additional parameter in the Super-Serial Model rather than an improved fit.
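The observation that the two models are nearly indistinguishable over the measured range, yet diverge at 24 processors, can be checked numerically. This is a minimal sketch using the fitted parameters quoted above (σ = 0.088 for Amdahl's Law; σ = 0.0787 and γ = 0.0164 for the Super-Serial Model); the function names are hypothetical.

```python
def amdahl_capacity(p, sigma):
    """Equation 2: C_A(p) = p / (1 + sigma * (p - 1))."""
    return p / (1.0 + sigma * (p - 1))


def super_serial_capacity(p, sigma, gamma):
    """Equation 3: C_S(p) = p / (1 + sigma * ((p - 1) + gamma * p * (p - 1)))."""
    return p / (1.0 + sigma * ((p - 1) + gamma * p * (p - 1)))


x_max_1 = 43.96  # maximum throughput with one processor (TPS)

rows = []
for p in (1, 2, 4, 8, 24):
    c_a = amdahl_capacity(p, 0.088)
    c_s = super_serial_capacity(p, 0.0787, 0.0164)
    rows.append((p, c_a, c_s))
    print(f"p = {p:2d}: Amdahl C = {c_a:5.2f} ({c_a * x_max_1:6.1f} tps), "
          f"Super-Serial C = {c_s:5.2f} ({c_s * x_max_1:6.1f} tps)")
```

Up to 8 processors the two predictions differ by less than one percent; at 24 processors the quadratic communication term separates them by roughly 50 transactions per second.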

Without additional information, it is not possible to say which model fits the data better. Measurements at higher numbers of processors would help resolve the ambiguity. Knowledge of the software architecture could also help select the most appropriate model. For example, if we know that there is a shared state that must be maintained among the processors, the Super-Serial Model would be the most likely choice. In view of this, we consider the Amdahl's Law result of 348 transactions per second to be an upper limit on the capacity that can be obtained by scaling this system vertically and the Super-Serial result of 299 transactions per second to be a lower bound.

Horizontal Scalability

In this section, we explore the horizontal scaling characteristics of the MedRec application. In this study, each node contains four 400 MHz processors. Rather than adding more processors to a node, additional nodes are added to scale the system. Measurements of throughput versus number of simulated clients were made for 1-, 2-, 3-, and 4-node configurations. The database and network configurations were the same as those used for the vertical scaling study. Table 5 shows the results of these measurements.

Gustafson's Law

Regression analysis indicates that Gustafson's Law provides an excellent fit to the horizontal scaling measurements with σ(1) = 0.0477 (r² = .9999). This indicates that approximately 4.8% of the work with one processor is performed sequentially. Since this amount of work is constant, as more processors are added, this fraction decreases.

Linear Scalability

The linear model also provided an excellent fit to the horizontal scaling measurements (r² = .9997). The slope of the regression line is 1.039. Table 6 shows the measured values of maximum throughput along with the values predicted by Gustafson's Law and the linear model.
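The two sets of model predictions can be recomputed from the one-node throughput and the fitted serial fraction. The sketch below assumes X_max(1) = 100.49 TPS and σ(1) = 0.0477, and uses a slope of one for the linear model; the function names are hypothetical. Results may differ from published values in the last digit because the fitted σ(1) is rounded.

```python
def gustafson_capacity(p: int, sigma_1: float) -> float:
    """Equation 5: C_G(p) = p**2 / (p + sigma(1) * (1 - p))."""
    return p ** 2 / (p + sigma_1 * (1 - p))


def linear_capacity(p: int) -> float:
    """Equation 1: C_L(p) = p (the slope must be one)."""
    return float(p)


x_max_1 = 100.49   # maximum throughput with one node (TPS)
sigma_1 = 0.0477   # fitted serial fraction with one node

predictions = {}
for nodes in (1, 2, 3, 4):
    gustafson_tps = gustafson_capacity(nodes, sigma_1) * x_max_1
    linear_tps = linear_capacity(nodes) * x_max_1
    predictions[nodes] = (gustafson_tps, linear_tps)
    print(f"{nodes} node(s): Gustafson = {gustafson_tps:7.2f} tps, "
          f"linear = {linear_tps:7.2f} tps")
```

The Gustafson's Law prediction exceeds the linear one at every node count above one, and the gap widens with each added node because the second term in the denominator of Equation 5 is negative for p greater than one.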
Table 6: Measured Versus Modeled Throughput

Number of Nodes    Measured    Gustafson's Law    Linear
1                  100.49      100.49             100.49
2                  208.66      205.89             200.98
3                  313.6       311.36             301.47
4                  418.76      416.86             401.96

Note that, as discussed earlier, for linear scalability the slope must be one. Thus, a slope of one was used to calculate the linear predictions in Table 6. Figure 9 shows a graphical comparison of the measured and modeled throughput for up to ten processors.

Table 5: Measured Throughput for Horizontal Scaling

Number of Nodes                 1         2         3         4
Max TPS                         100.49    208.66    313.6     418.76
Number of Clients at Max TPS    4         4         4
Appserver CPU Utilization       95.9%     95.58%    95.35%    95.22%
DB Server CPU Utilization       .7%       3.64%     6.2%      9.7%
DB Server Disk Utilization      13.94%    27.79%    4.58%     53.25%

Regression analysis shows that both Gustafson's Law and the linear model provide good fits to the measured data. Regression analyses based on Amdahl's Law and the Super-Serial Model do not yield good fits to the experimental data. In addition, both analyses give negative values for the serial parameters. Thus, Amdahl's Law and the Super-Serial Model are not appropriate for this data.

[Figure 9: Measured versus Modeled Data. Xmax (tps) versus number of nodes: measured, Gustafson's Law, and linear.]

As Table 6 and Figure 9 indicate, the Gustafson's Law estimates more nearly match the measured values. However, an additional measurement at a higher number of processors would help distinguish between these two models.

Secondary Bottlenecks

Both models predict that we can expect considerably more throughput from the horizontal scaling strategy than from the vertical. However, it is still important to be careful when extrapolating. For example, if our performance requirement is 1,000 transactions per second, Gustafson's Law predicts that using ten nodes will give a maximum throughput of 1,050 transactions per second and the linear model predicts a maximum throughput of 1,005 transactions per second.

These projections assume that no other bottleneck resource will limit our ability to achieve the required throughput with ten nodes, however. Looking at Table 5, we see that removing the application server CPU bottleneck leaves the database server disk as the resource with the highest utilization. The demand at this resource is 0.00132 seconds, which agrees with the value of 0.00138 obtained in the vertical scalability measurements to within experimental error. This corresponds to a maximum throughput of 757 transactions per second (the inverse of the disk demand). Therefore, to achieve the performance requirement of 1,000 transactions per second, it will be necessary to either upgrade to faster disks or add a second disk to the database server.

Of course, adding only ten nodes will mean that we are operating at near one-hundred percent utilization on the application server CPU. This will result in unacceptably long response times, so additional nodes should be added to reduce the CPU utilization to a more reasonable number.

Case Study Discussion

The results of the analysis indicate a significant difference in the behavior of the system for vertical versus horizontal scaling. Since the application software is the same in both cases, the difference must be due to the platforms. The most likely explanation for this difference is the presence of the MP Effect when scaling vertically. The MP Effect is a loss of computing capacity that occurs when adding processors to a single platform. This loss of capacity is due to additional overhead and/or contention between processors for shared system resources (e.g., the system bus) [Artis 1991], [Gunther 1996]. Gunther has shown that Amdahl's Law and the Super-Serial Model apply to SMP scaling, with super-serial effects becoming more significant as the amount of interprocessor communication increases [Gunther 1993].

It is tempting to generalize this result and conclude that horizontal scaling is superior to vertical scaling in all cases.
However, the scalability of a system depends on the characteristics of both the application and the execution environment. In this case, it appears that the software was structured into units that could execute independently in parallel. If a larger percent of the workload were serial, both the vertical and horizontal scalability would have followed Amdahl's Law or the Super-Serial model. In addition, the good horizontal scalability indicates that back-end database interactions were effectively parallelized so that there was little or no serial contention there either.

This will not be true of all applications. In particular, for distributed applications that must maintain a common state (e.g., databases) there will be overhead to maintain coherence. In these cases, interprocessor communication will be a significant factor. For such applications, the enhanced communication efficiency of a bus versus a network may favor vertical scaling on a single platform. Each application/platform combination should be evaluated individually.

This case study also emphasizes the importance of thoroughly understanding the system before committing to a scaling strategy. Secondary bottlenecks (such as the database disk here) are a fact of life. It is important to know what they are, how they impact the scalability of the system, and how costly they are to remove before undertaking expensive system upgrades.

Finally, it should be noted that the data published by BEA Systems, Inc. provides a good example of the type of information that is useful for understanding and evaluating scalability.

ECONOMICS OF SCALABILITY

Scalability is an economic as well as a technical issue. In many cases there are alternative scaling strategies that will meet performance requirements. The choice among them should then be based on cost. The importance of cost in meeting performance requirements is implicit in the inclusion of cost figures in many benchmark reports (see, e.g., [TPC]). These costs are rarely simple, one-time costs.
Adding more hardware means that there will be costs for purchase or lease, software licenses, maintenance contracts, additional system administration, facilities, and so on. The timing of these expenditures is likely to be highly dependent on the scaling strategy. For example, one strategy may require frequent, small upgrades while another may require fewer, more expensive ones. In order to make an unbiased comparison of the alternatives, it may be necessary to convert the costs to current dollars. Techniques for this are discussed in many places, including [Reifer 2002] and [Williams and Smith 2003].

It is important to consider all costs associated with the scaling strategy. In some cases, hardware expenditures may be dwarfed by costs for support software and middleware [Mohr 2003]. Costs of upgrading are also subject to discontinuities as the amount of hardware is increased. For example, at some point adding a server may require hiring an additional system administrator or expanding the facility to accommodate the additional footprint.
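One common way to convert differently timed expenditures to current dollars is to discount each year's cost to its present value. The sketch below compares a strategy with frequent, small upgrades against one with fewer, larger ones. The cost figures, the 8% discount rate, and the function name are all hypothetical, and the cited references may recommend a different technique.

```python
"""Sketch: comparing two scaling strategies by present value of their costs.

All dollar amounts and the discount rate are invented for illustration.
"""


def present_value(costs_by_year: list[float], rate: float) -> float:
    """Discount a stream of yearly costs to current dollars.

    costs_by_year[0] is spent now (undiscounted); costs_by_year[k] is spent
    k years from now and is divided by (1 + rate) ** k.
    """
    return sum(cost / (1.0 + rate) ** year
               for year, cost in enumerate(costs_by_year))


DISCOUNT_RATE = 0.08  # hypothetical annual discount rate

# Hypothetical cost streams (hardware, licenses, administration, facilities).
frequent_small_upgrades = [40_000.0, 40_000.0, 40_000.0, 40_000.0]
fewer_large_upgrades = [100_000.0, 0.0, 0.0, 50_000.0]

if __name__ == "__main__":
    for name, costs in (("frequent, small", frequent_small_upgrades),
                        ("fewer, large", fewer_large_upgrades)):
        total = sum(costs)
        pv = present_value(costs, DISCOUNT_RATE)
        print(f"{name:>16}: nominal ${total:,.0f}, present value ${pv:,.0f}")
```

A step cost such as hiring an additional administrator can be represented by adding it to the year in which the threshold is crossed.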

Don't forget that scalability isn't just a hardware issue. It is often easy to increase scalability with a different (initial) software architecture [Williams and Smith 2003]. In this case study, a design alternative that reduces the CPU consumption of the application could also improve scalability. The modification would produce a new version of the application with new scalability characteristics. Note that the most cost-effective choice for meeting a given performance requirement might not be the one with the highest overall scalability.

SUMMARY AND CONCLUSIONS

Scalability is one of the most important quality attributes of today's distributed software systems. Yet, despite its importance, scalability in these applications is poorly understood. This paper has presented a model-based view of scalability in Web and other distributed applications that is aimed at removing some of the myth and mystery surrounding this important software quality.

We use the relative capacity or scalability factor

$$C(p) = \frac{X_{max}(p)}{X_{max}(1)}$$

as a measure of the scalability of a system. Scalability is classified according to the behavior of $C(p)$ as:

Linear: the relative capacity is equal to the number of processors, $p$, i.e., $C(p) = p$.
Sub-linear: the relative capacity is less than $p$, i.e., $C(p) < p$.
Super-linear: the relative capacity is greater than $p$, i.e., $C(p) > p$.

A key consequence of using $C(p)$ as the metric for scalability is that for linear scalability, the slope of the line must be equal to one. This means that both sub-linear and super-linear scalability must, in fact, be described by non-linear functions. While measurements at small numbers of processors may appear to be linear, measurements at higher numbers of processors will reveal the non-linearity. This also means that linear extrapolations of near-linear results, such as that in our opening example, can be misleading.
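The extrapolation risk can be made concrete with a small numerical experiment. The sketch below generates relative capacities for one to four processors from Amdahl's Law, fits a straight line to them by least squares, and compares the line's estimate at 32 processors with the value Amdahl's Law itself gives. The serial fraction of 0.05, the choice of 32 processors, and the helper names are hypothetical; Amdahl's Law is used only as one example of a sub-linear function.

```python
"""Sketch: linear extrapolation of near-linear, sub-linear measurements.

The serial fraction and processor counts are invented for illustration.
"""


def amdahl_capacity(p: int, sigma: float) -> float:
    """Relative capacity under Amdahl's Law: C(p) = p / (1 + sigma * (p - 1))."""
    return p / (1.0 + sigma * (p - 1))


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


SIGMA = 0.05                   # hypothetical serial fraction
measured_p = [1, 2, 3, 4]      # small configurations that look nearly linear
measured_c = [amdahl_capacity(p, SIGMA) for p in measured_p]

slope, intercept = fit_line([float(p) for p in measured_p], measured_c)

TARGET_P = 32                  # hypothetical configuration to extrapolate to
linear_estimate = slope * TARGET_P + intercept
actual = amdahl_capacity(TARGET_P, SIGMA)

if __name__ == "__main__":
    print(f"fitted slope: {slope:.3f}")
    print(f"linear extrapolation at p={TARGET_P}: {linear_estimate:.2f}")
    print(f"Amdahl's Law at p={TARGET_P}: {actual:.2f}")
```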
Since the actual function is necessarily non-linear, these extrapolations will overestimate the scalability of the system if the slope of the extrapolated line is less than one and underestimate it if the slope is greater than one.

This paper has reviewed four models of scalability that are applicable to Web and other distributed applications. These are summarized in Table 7. We have also demonstrated the applicability of these models to Web applications via a case study of a simple Web application. Analysis of measured data for the case study system indicates that its vertical scalability is best described by either Amdahl's Law or the Super-Serial Model while its horizontal scalability is best described by either Gustafson's Law or linear scalability.

Table 7: Scalability Models

Linear Model          $C_L(p) = p$
Amdahl's Law          $C_A(p) = \frac{p}{1 + \sigma(p - 1)}$
Super-Serial Model    $C_S(p) = \frac{p}{1 + \sigma[(p - 1) + \gamma p(p - 1)]}$
Gustafson's Law       $C_G(p) = \frac{p^2}{p + \sigma(1)(1 - p)}$

The case study also demonstrates that scalability is a system property. In this case, the same application exhibits different scalability properties when scaling vertically or horizontally.

As this paper demonstrates, these models are applicable to Web and other distributed systems. However, due to the complexity of such systems, it is likely that some systems will not conform to any of these models. It is therefore important to determine, via analysis of measured data, that a given system follows a known model before making decisions based on predicted scalability.

Scalability is also affected by software resource constraints such as a One Lane Bridge Performance Antipattern [Smith and Williams 2002]. This paper addressed only hardware bottlenecks. This paper also considered only a single dominant workload. These techniques can be adapted to cover these other aspects of the problem.

REFERENCES

[Alba 2002] E. Alba, "Parallel Evolutionary Algorithms Can Achieve Super-Linear Performance," Information Processing Letters, vol. 82, pp. 7-13, 2002.

[Amdahl 1967] G. M. Amdahl, "Validity of the Single-Processor Approach To Achieving Large Scale Computing Capabilities," Proceedings of AFIPS, Atlantic City, NJ, AFIPS Press, April, 1967, pp. 483-485.

[Artis 1991] H. P. Artis, "Quantifying MultiProcessor Overheads," Proceedings of CMG '91, December, 1991, pp. 363-365.

[BEA 2003a] BEA Systems, Inc., "Avitek Medical Records. Architecture Guide," http://edocs.bea.com/wls/docs8/medrec_arch/index.html.

[BEA 2003b] BEA Systems, Inc., "BEA WebLogic Server: Capacity Planning," http://edocs.bea.com/wls/docs8/capplan/.

[Brown 2003] R. G. Brown, "Engineering a Beowulf-style Compute Cluster," Duke University, 2003, www.phy.duke.edu/resources/computing/brahma/beowulf_book/.

[Gunther 2000] N. J. Gunther, The Practical Performance Analyst, iUniverse.com, 2000.

[Gunther 1996] N. J. Gunther, "Understanding the MP Effect: Multiprocessing in Pictures," Proceedings of CMG '96, December, 1996.

[Gunther 1993] N. J. Gunther, "A Simple Capacity Model for Massively Parallel Transaction Systems," Proceedings of CMG '93, San Diego, December, 1993, pp. 35-44.

[Gustafson 1988] J. L. Gustafson, "Reevaluating Amdahl's Law," Communications of the ACM, vol. 31, no. 5, pp. 532-533, 1988.

[Jain 1991] R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, New York, NY, John Wiley, 1991.

[Mohr 2003] J. Mohr, "SPE on IRS Business Systems Modernization," Panel: The Economics of SPE, CMG, Dallas, 2003.

[Reifer 2002] D. J. Reifer, Making the Software Business Case: Improvement by the Numbers, Boston, Addison-Wesley, 2002.

[Smith and Williams 2002] C. U. Smith and L. G. Williams, Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software, Boston, MA, Addison-Wesley, 2002.

[Sun and Ni 1993] X.-H. H. Sun and L. M. Ni, "Scalable Problems and Memory-Bounded Speedup," Journal of Parallel and Distributed Computing, vol. 19, no. 1, pp. 27-37, 1993.

[TPC] Transaction Processing Council, www.tpc.org.

[Williams and Smith 2003] L. G. Williams and C. U. Smith, "Making the Business Case for Software Performance Engineering," Proceedings of CMG, Dallas, December, 2003.

[Williams and Smith 2002] L. G. Williams and C. U. Smith, "PASA(SM): An Architectural Approach to Fixing Software Problems," Proc. CMG, Reno, December, 2002.