Load Balancing and Almost Symmetries for RAMBO Quorum Hosting




Laurent Michel¹, Alexander A. Shvartsman¹, Elaine Sonderegger¹, and Pascal Van Hentenryck²

¹ University of Connecticut, Storrs, CT 06269-2155
² Brown University, Box 1910, Providence, RI 02912

Abstract. Rambo is the Reconfigurable Atomic Memory for Basic Objects, a formally specified algorithm that implements atomic read/write shared memory in dynamic networks, where the participating hosts may join, leave, or fail. Rambo is particularly suited for volatile environments such as mobile networks. To maintain availability and consistency in such dynamic settings, Rambo replicates objects and uses quorum systems that can be reconfigured in response to perturbations in the environment. This is accomplished by installing new quorum configurations and removing obsolete configurations, while preserving data consistency. Given the dynamic nature of the atomic memory service, it is vitally important to reconfigure the system online, while making well-reasoned decisions about how to deploy new quorum configurations. This paper reexamines the quorum configuration problem, concentrating on better load-balancing models and a novel use of almost symmetries for breaking similarities among hosts in the target network. The resultant performance improvements allow more reasonably-sized systems to be reconfigured online in a way that optimizes deployment of quorums with respect to relevant performance criteria.

1 Introduction

Providing consistent shared objects in dynamic networked systems is one of the fundamental problems in distributed computing. Shared object systems must be resilient to failures and guarantee consistency despite the dynamically changing collections of hosts that maintain object replicas.
Rambo, which stands for Reconfigurable Atomic Memory for Basic Objects [6, 4], is a formally specified distributed algorithm designed to support a long-lived atomic read/write memory service in such a rapidly changing network environment. To maintain availability and consistency of the memory service, Rambo uses reconfigurable quorum systems, where each object is replicated at hosts that are quorum members, and where the intersection among quorum sets is used to guarantee atomicity. The ability to rapidly reconfigure quorum systems in response to failures and delays is at the heart of the Rambo service. Any participant may request a new configuration, after which consensus is used to agree on the new configuration to be installed.

While the Rambo service permits any new quorum configuration to be installed on-the-fly, it is important to install configurations that benefit system performance. This must be done quickly, since a lengthy quorum selection and deployment process may impact the liveness and fault-tolerance of the system, as hosts may continue to fail and their observed performance characteristics change over time. Ideally, a participant ought to be able, based on historical observations, to propose a well-designed quorum configuration that is optimized with respect to relevant criteria, such as being composed of members who have been communicating with low latency, and consisting of quorums that will be well-balanced with respect to read and write operation loads.

The models in [7] focused on determining optimal new quorum configurations. Both constraint programming and local search techniques were used to demonstrate the feasibility of finding high-quality configurations that positively affect the performance of read and write operations and the liveness of the system. The CP model and the hybrid CP/CBLS model were implemented with Comet [4, 5, 8].

This work began by studying the optimal quorum hosting solutions found in [7] to better understand their properties. Patterns emerged for the properties of a good quorum hosting and the relationships between quorum systems and their hosting network topologies. The insights led to a significantly better constraint programming model for the Rambo quorum hosting problem.

While [7] offered a solution to an open problem, the benchmarks were based on modestly sized quorum systems, and it was debatable whether the approach would scale. The contributions of this paper address these questions. First, the paper revisits the way load balancing is modeled to deliver a more realistic and robust model. Second, the paper offers a decomposition-based model which separately optimizes the replica deployment w.r.t. the induced delays and the quorum selection w.r.t. the balancing objective.
Third, the paper expands on the symmetry-breaking techniques found in [7] and exploits a dominance relation also known as an almost symmetry [5, ]. The dominance is handled with a dynamic symmetry breaking embedded in the search of the second phase of the decomposition. Finally, the paper presents experimental results demonstrating that the extensions are transformative, bringing orders of magnitude in performance improvements (up to 10,000 times faster) and addressing the scalability issues for network topologies endowed with symmetrical structures. Although this work is presented in the context of Rambo, the results can be applied to any distributed service that relies on dynamically introduced quorum systems.

The rest of this paper is organized as follows. Section 2 presents Rambo and quorums in more detail. Section 3 introduces a high-level model for the quorum hosting problem, and Section 4 presents the CP models. Section 5 reports the experimental results, and Section 6 concludes.

2 RAMBO and Quorums

Rambo, like most data replication services for distributed systems, uses quorums to ensure the atomicity of its data in the presence of failures. Quorum systems are collections of sets of quorum members, where each pair of quorums has a non-empty intersection [3, 3]. If each operation contacts a complete quorum of hosts containing data replicas, then any operation is guaranteed to see the results of the most recently completed operation, because at least one host in the intersection of the two respective quorums participates in both operations. Rambo uses a variant, called read/write quorum systems, where such a system has two sets of quorums, read quorums and write quorums, with each read quorum having a non-empty intersection with every write quorum. Figure 1 shows three read/write quorum systems. For each system the horizontal ovals are the read quorums, and the more vertical shapes are the write quorums.

[Fig. 1. Examples of Read/Write Quorum Systems. Panels: Wheel, 3x3, 4x4.]

The read and write operations in Rambo use a two-phase strategy. The first phase gathers information from at least one read quorum of all active configurations, and the second phase propagates information to at least one write quorum of each active configuration.

Quorum hosting is an assignment of quorum members to hosts. What sets Rambo apart from other data replication services is its ability to dynamically reconfigure the quorum system as hosts join, leave, and fail. Quorum reconfiguration is performed concurrently with read and write operations. As long as the members of at least one read quorum and one write quorum of the active configurations are still functioning, reconfiguration can take place. Thus, speed of reconfiguration is paramount as failures and changes in participants are detected.

The selection of a new configuration should be made dynamically in response to external stimuli and observations about the performance of the service. Rambo participants have no knowledge of the underlying network, particularly as nodes join and leave.
Hence, each host measures (externally to Rambo) its average message delays with every other host as the best available estimate of network connections and host failures. Each host also measures the average frequency of its read and write operations. The gathered information is shared with the other hosts by piggy-backing this information onto routine messages of Rambo. The overall guiding principle is that observations of current behaviors are the best available predictors of future behaviors.

Given an abstract specification of a quorum system, the quorum hosting problem is to assign each quorum member to a participating host in such a way that the total delays for hosts to contact read and write quorums are minimized. Once an assignment is computed, the resulting configuration is augmented to include information recommending the best read and write quorums for each host to use, where use of the best quorums will result in minimum delays for read and write operations and the most balanced load on replica hosts (until failures occur in those best quorums). Of course, the use of this information is optional, and a host can always propose another quorum system if it does not observe good responses from the recommended system.

3 Modeling RAMBO Quorum-Configuration Selection

Model Parameters. The inputs for the Rambo quorum configuration model are:

- The set of hosts $H$.
- For every host $h \in H$, the average frequency $f_h$ of its read and write requests.
- For every pair of hosts $h_1, h_2 \in H$, the average round trip delay $d_{h_1,h_2}$ of messages from $h_1$ to $h_2$.
- The abstract configuration $c$ to be deployed on $H$, where $c$ consists of:
  - The set of members $M$, each of which maintains a replica of the data.
  - The set of read quorums $R \subseteq \mathcal{P}(M)$.
  - The set of write quorums $W \subseteq \mathcal{P}(M)$.

Decision Variables. A decision variable $x_m$ with domain $H$ is associated with each configuration member $m$; $x_m = h$ when replica $m$ is deployed on host $h$. Each host $h$ is also associated with two decision variables $readq_h$ and $writeq_h$ (with domains $R$ and $W$) denoting, respectively, one read (write) quorum from the minimum average delay read (write) quorums associated with $h$. Finally, three auxiliary variables $readload_m$, $writeload_m$, and $load_m$ represent the read, write, and total loads of a configuration member $m$ that are induced by the traffic between the hosts and their chosen read/write quorums.

The Objective. An optimal deployment minimizes

$$\sum_{h \in H} f_h \left( \min_{q \in R} \max_{m \in q} d_{h,x_m} + \min_{q \in W} \max_{m \in q} d_{h,x_m} \right)$$

where each term in the summation captures the time it takes in Rambo for a host $h$ to execute the read/write phase of the protocol. Indeed, a read (or a write) in Rambo requires the client to contact all the members of at least one read quorum before it can proceed to a write round of communication with all the members of at least one write quorum to update the data item.
The term $\max_{m \in q} d_{h,x_m}$ reflects the time it takes to contact all the members of quorum $q$, as one must wait for an answer from its slowest member. The outer $\min_{q \in R}$ reflects the fact that Rambo must hear back from one quorum before it proceeds, and this happens when the fastest quorum replies.

The Constraints. A configuration is subject to the following constraints. First, all configuration members must be deployed on separate hosts.

$$\forall m, m' \in M : m \neq m' \Rightarrow x_m \neq x_{m'}$$
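To make the objective and the distinctness constraint concrete, the following minimal Python sketch evaluates the objective for a given hosting and finds the best hosting by enumeration. The four-host line network, the frequencies, and the three-member majority quorum system are hypothetical illustrations, not data from the paper, and the enumeration only stands in for the constraint-programming search described later.

```python
from itertools import permutations

def quorum_delay(h, quorum, x, d):
    """Time for host h to hear from every member of `quorum` (its slowest member)."""
    return max(d[h][x[m]] for m in quorum)

def objective(f, d, x, read_quorums, write_quorums):
    """Sum over hosts of f[h] * (fastest read quorum + fastest write quorum)."""
    total = 0
    for h in range(len(f)):
        best_read = min(quorum_delay(h, q, x, d) for q in read_quorums)
        best_write = min(quorum_delay(h, q, x, d) for q in write_quorums)
        total += f[h] * (best_read + best_write)
    return total

# Hypothetical instance: hosts 0-1-2-3 on a line, delay = hop count.
f = [5, 1, 1, 2]
d = [[abs(a - b) for b in range(4)] for a in range(4)]
# Hypothetical majority quorums over three members; every pair intersects.
R = W = [{0, 1}, {1, 2}, {0, 2}]

x = [0, 1, 2]                     # member m is deployed on host x[m]
value = objective(f, d, x, R, W)  # 22 for this instance

# permutations() yields only injective maps, so members land on distinct hosts.
best_value, best_x = min(
    (objective(f, d, list(p), R, W), list(p))
    for p in permutations(range(4), 3)
)
```

Exhaustive enumeration is only viable for toy instances like this one; it is used here because it makes the min/max structure of the objective easy to inspect.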

An implementation of Rambo may use different strategies when contacting the read and write quorums. A conforming implementation might simply contact all the read quorums in parallel. Naturally, this does not affect the value of the objective, but it induces more traffic and work on the members of the quorum system. Another strategy for Rambo is to contact what it currently perceives as the fastest read quorum first and fall back on the other read quorums if it does not receive a timely response. It could even select a read quorum uniformly at random. These strategies strike different tradeoffs between the traffic they induce and the workload uniformity.

The model presented below captures the greedy strategy, namely, Rambo contacts its closest quorum first, and the model assumes that this quorum replies (unless a failure has occurred). The model uses the $readq_h$ and $writeq_h$ variables of host $h$ to capture which read (write) quorum Rambo contacts. Recall that a read (write) quorum is a set, and therefore the domain of $readq_h$ is the set of read quorums (similarly for $writeq_h$). More formally,

$$readq_h = r \Rightarrow \max_{m \in r} d_{h,x_m} = \min_{q \in R} \max_{m \in q} d_{h,x_m}$$

$$writeq_h = w \Rightarrow \max_{m \in w} d_{h,x_m} = \min_{q \in W} \max_{m \in q} d_{h,x_m}$$

Each equation requires the chosen quorum to induce a minimal delay. Note that several quorums might deliver the same minimal delay, so this constraint alone does not determine a host's ideal read (write) quorum. The third set of constraints defines the read, write, and total loads of a configuration member as

$$readload_m = \sum_{h \in H \,:\, m \in readq_h} f_h$$

$$writeload_m = \sum_{h \in H \,:\, m \in writeq_h} f_h$$

$$load_m = readload_m + writeload_m$$

Clearly, the loads on $m$ depend on which read and write quorums are chosen among those that induce minimal delays.

3.1 Load Balancing

For load balancing, the Rambo deployment model in [7] uses an additional input parameter α limiting the spread of loads by requiring the maximum load on a configuration member to be within a factor α of the minimum load.
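The quorum-selection and load constraints can be illustrated with a short Python sketch. It lists, for each host, the read quorums that achieve the minimal delay, picks one of them, accumulates the resulting read loads, and checks an α bound on their spread. The instance and the helper names (`min_delay_quorums`, `member_loads`, `within_alpha`) are hypothetical, and picking the first candidate is an arbitrary choice made only for illustration.

```python
def min_delay_quorums(h, quorums, x, d):
    """Indices of the quorums that achieve host h's minimal delay."""
    delays = [max(d[h][x[m]] for m in q) for q in quorums]
    best = min(delays)
    return [i for i, delay in enumerate(delays) if delay == best]

def member_loads(f, chosen, quorums, n_members):
    """Load on each member: sum of f[h] over hosts whose chosen quorum contains it."""
    loads = [0] * n_members
    for h, qi in enumerate(chosen):
        for m in quorums[qi]:
            loads[m] += f[h]
    return loads

def within_alpha(loads, alpha):
    """The alpha rule: heaviest load at most alpha times the lightest load."""
    return max(loads) <= alpha * min(loads)

# Hypothetical instance: hosts 0-1-2-3 on a line, delay = hop count.
f = [5, 1, 1, 2]
d = [[abs(a - b) for b in range(4)] for a in range(4)]
R = [{0, 1}, {1, 2}, {0, 2}]   # hypothetical read quorums over members 0..2
x = [0, 1, 2]                  # member m is deployed on host x[m]

candidates = [min_delay_quorums(h, R, x, d) for h in range(4)]
chosen = [c[0] for c in candidates]     # arbitrary pick among the ties
readload = member_loads(f, chosen, R, 3)
```

In this instance only host 1 has more than one minimal-delay read quorum, so the resulting loads depend entirely on how that single tie is broken.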
One insight driving the development of the new load-balancing models is that in order for a configuration member to be useful in a data replication system, not only must its total load be non-zero, but both its read and write loads must also be non-zero. If a configuration member is never used in a write quorum, it can never have the most recent data value, and if a configuration member is never used in a read quorum, it does not matter whether or not it has the most recent data value. This pathological setup is easily avoided by requiring $readload_m > 0$ and $writeload_m > 0$.

Carrying this reasoning further, it is inappropriate to allow the load on a configuration member to be predominantly read requests or predominantly write requests, as is possible when only the total load on members is balanced. Instead, read and write loads should be balanced separately. The first model to consider uses the load factor α to separately constrain the read and write loads. Unfortunately, this is not a sensible approach since, for small values of α, many networks do not have satisfying quorum configurations. This is particularly true when a few hosts send the bulk of the messages.

New Load-Balancing Model. The adopted approach uses two optimizations, rather than a single optimization, to obtain a balanced hosting. The first optimization finds an assignment of configuration members to hosts that minimizes communication delays. The second, given a global optimum of the first, finds an assignment of quorums to hosts that minimizes the load imbalances among configuration members. Compared to [7], this approach trades off load balancing among configuration members for faster quorum response times. The loads may be even more balanced for some networks with this approach, however, because the optimization does not stop with an assignment that satisfies the α load-balancing factor. Note that the first optimization only delivers one global optimum (when there might be several), and this specific solution might not lead to the most balanced solution in the second optimization.

Two alternative load-balancing objectives are studied.
The first minimizes the differences in the read and write loads between the most heavily loaded and the most lightly loaded configuration members (or alternatively, minimizes α).

$$\min \left( \max_{m \in M} readload_m - \min_{m \in M} readload_m \right)$$

$$\min \left( \max_{m \in M} writeload_m - \min_{m \in M} writeload_m \right)$$

The second minimizes the standard deviations for the read and write loads.

$$\min \left( |M| \sum_{m \in M} (readload_m)^2 - \Big( \sum_{m \in M} readload_m \Big)^2 \right)$$

$$\min \left( |M| \sum_{m \in M} (writeload_m)^2 - \Big( \sum_{m \in M} writeload_m \Big)^2 \right)$$

The second objective yields more middle-of-the-range loads for configuration members, but possibly a slightly larger range of values. Minimizing the standard deviation of loads was found to be an effective technique for balancing work loads among nurses [], but its added cost may not be justifiable for this application.
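The difference between the two objectives can be seen on a small example. The Python sketch below computes the range objective and the scaled-variance expression $|M| \sum x^2 - (\sum x)^2$ (which equals $|M|^2$ times the variance, so minimizing it also minimizes the standard deviation) for two hypothetical load vectors chosen for illustration.

```python
def load_range(loads):
    """First objective: heaviest load minus lightest load."""
    return max(loads) - min(loads)

def scaled_variance(loads):
    """Second objective: |M| * sum(x^2) - (sum x)^2, i.e. |M|^2 times the variance."""
    n = len(loads)
    return n * sum(v * v for v in loads) - sum(loads) ** 2

# Two hypothetical load vectors over four configuration members.
a = [2, 6, 6, 10]   # wider range, but most members sit mid-range
b = [3, 3, 9, 9]    # narrower range, but every member sits at an extreme

range_a, range_b = load_range(a), load_range(b)          # 8 and 6
var_a, var_b = scaled_variance(a), scaled_variance(b)    # 128 and 144
```

The range objective prefers `b` while the variance-based objective prefers `a`, matching the observation that the second objective favors middle-of-the-range loads at the cost of a possibly larger range.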

3.2 Network Symmetries

Many network topologies have some symmetries among host nodes. Ideally, these symmetries can be exploited in determining optimal quorum placements. Consider, for example, the partial network illustrated in Figure 2. Hosts B, C, D, E, and F have a single neighboring host A. To the rest of the network beyond A, shown with three groups of dots, hosts B through F are equivalent because they all are the same number of hops away. The maximum delay for these unillustrated hosts to access a quorum consisting of hosts A and B, represented by the solid oval, is the delay to get a response from B. Similarly, the maximum delay to access a quorum of A and C, represented by the dashed oval, is the delay to get a response from host C. Since both B and C are one hop beyond A, these delays are approximately the same, and the two quorums are equivalent for the hosts beyond A. The two quorums also are equivalent for host A.

The two quorums are not equivalent for hosts B and C, however. If B uses the quorum with hosts A and B, its maximum delay to access the quorum is the time it takes to get a response from A, whereas if B uses the quorum with hosts A and C, it also must wait to get a response from C, which is another hop away. Thus, the quorum with hosts A and B, represented with the solid oval, is a better quorum for B to use. For C, the better quorum is the one with hosts A and C, represented with a dashed oval.

[Fig. 2. Network Topology with an Almost Symmetry. Hosts B, C, D, E, and F are each attached to host A, and the rest of the network lies beyond A.]

An optimal quorum placement minimizes the system's total communication delay for accessing quorums, where each host's contribution to the total delay is its message frequency times the delays to contact its best read and write quorums. Assume B has a frequency of 10, and C has a frequency of 5. Then the overall objective would be less using the solid quorum with A and B, rather than the dashed quorum with A and C, because B's message frequency is greater than C's.
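The comparison between the solid and dashed quorums can be worked through numerically. The Python sketch below assumes unit delay per hop, adds a single hypothetical host G to stand for the network beyond A, and uses hypothetical frequencies for A and G; only the frequencies of B (10) and C (5) come from the text.

```python
# Hop counts in a star around A; G is a hypothetical host beyond A.
HOPS = {
    ("A", "B"): 1, ("A", "C"): 1, ("A", "G"): 1,
    ("B", "C"): 2, ("B", "G"): 2, ("C", "G"): 2,
}

def delay(a, b):
    """Symmetric hop-count delay (assumption: one time unit per hop)."""
    if a == b:
        return 0
    return HOPS[(a, b)] if (a, b) in HOPS else HOPS[(b, a)]

# B and C frequencies are from the text; A and G are hypothetical.
freq = {"A": 1, "B": 10, "C": 5, "G": 4}

def total_delay(quorum):
    """Frequency-weighted delay if every host uses this single quorum."""
    return sum(freq[h] * max(delay(h, m) for m in quorum) for h in freq)

solid = total_delay({"A", "B"})    # 29
dashed = total_delay({"A", "C"})   # 34
```

Hosts A and G contribute the same amount to both totals, so the difference of 5 comes entirely from B and C, and it equals the gap between their frequencies.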
The relationship among hosts B through F is an almost symmetry [5, ], rather than a true symmetry. The hosts are equivalent with respect to hosts outside the group. Within the group, the frequencies impose a dominance relation among hosts which requires special care in the search. Dominance in CP was studied in [0], while almost symmetries received some attention for planning [9] and graphs [2]. This realistic application demonstrates their true potential.

4 The CP Model for Quorum-Configuration Selection

The initial Comet program for quorum-configuration selection is shown in Figure 3.

 1  Solver<CP> cp();
 2  range M = ...;        // The members of the quorum configuration
 3  set{int}[] R = ...;   // An array storing all the read quorums in the configuration
 4  set{int}[] W = ...;   // An array storing all the write quorums in the configuration
 5  range H = ...;        // The host nodes
 6  int[] f = ...;        // The frequency matrix
 7  int[,] d = ...;       // The delays matrix
 8  int alpha = ...;      // The load factor
 9  set{tuple{int low; int high}} Order = ...;  // The order of quorum members
10  int nbrq[m in M] = ...;  // The number of quorums for each member
11  int degree[H] = ...;     // The degree of a host (number of neighbors)
12  range RQ = R.getRange(); range WQ = W.getRange();
13  boolean readqc[RQ,M] = ...;
14  boolean writeqc[WQ,M] = ...;
15  var<CP>{int} x[M](cp,H);
16  var<CP>{int} readd[H,RQ](cp,0..10000);
17  var<CP>{int} writed[H,WQ](cp,0..10000);
18  var<CP>{int} readq[H](cp,RQ);
19  var<CP>{int} writeq[H](cp,WQ);
20  var<CP>{int} readload[M](cp,1..10000);
21  var<CP>{int} writeload[M](cp,1..10000);
22  var<CP>{int} load[M](cp,0..10000);
23  minimize<cp>
24    sum(h in H) f[h] * (min(r in RQ) readd[h,r] + min(w in WQ) writed[h,w])
25  subject to {
26    cp.post(alldifferent(x), onDomains);
27    forall(o in Order) cp.post(x[o.low] < x[o.high]);
28    forall(h in H, r in RQ) cp.post(readd[h,r] == max(m in R[r]) d[h,x[m]]);
29    forall(h in H, w in WQ) cp.post(writed[h,w] == max(m in W[w]) d[h,x[m]]);
30    forall(h in H) {
31      cp.post(readd[h,readq[h]] == min(r in RQ) readd[h,r]);
32      cp.post(writed[h,writeq[h]] == min(w in WQ) writed[h,w]);
33    }
34    forall(m in M) {
35      cp.post(readload[m] == sum(h in H) (f[h] * readqc[readq[h],m]));
36      cp.post(writeload[m] == sum(h in H) (f[h] * writeqc[writeq[h],m]));
37      cp.post(load[m] == readload[m] + writeload[m]);
38    }
39    cp.post(max(m in M) load[m] <= alpha * min(m in M) load[m]);
40  } using {
41    while (sum(k in M) x[k].bound() < M.getSize())
42      selectMax(m in M : !x[m].bound()) (nbrq[m])
43        tryall<cp>(h in H : x[m].memberOf(h)) by (-degree[h])
44          cp.label(x[m], h);
45        onFailure cp.diff(x[m], h);
46    once<cp> forall(h in H : !readq[h].bound() || !writeq[h].bound()) by (-f[h]) {
47      label(readq[h]);
48      label(writeq[h]);
49    }
50  }

Fig. 3. The Initial CP Model in Comet

The data declarations in lines 2-8 correspond to the input data of the Rambo model in Section 3. Line 9 declares an additional input used for breaking symmetries among the members of the quorum configuration. Lines 10-14 define derived data. Specifically, nbrq[m] is the number of quorums in which m appears, and degree[h] is the number of immediate neighbors of host h in the network, as determined from the observed message delays. RQ and WQ are the index sets of the read and write quorums, respectively. The auxiliary matrices readqc and writeqc are encodings of quorum membership, e.g., readqc[i, j] = true ⇔ j ∈ R[i].

Lines 15-22 declare the decision variables. Variable x[m] specifies the host of configuration member m. Variables readd[h, r] and writed[h, w] are the communication delays for host h to access read quorum r and write quorum w. The variables readq[h] and writeq[h] represent the read and write quorum selections for host h. Finally, the variables readload[m] and writeload[m] represent the read and write communication loads on configuration member m, given the current deployment and quorum selections, and load[m] represents the total communication load on member m. Note that the domains for readload[m] and writeload[m] exclude zero.

Line 24 specifies the objective function, which minimizes the total communication delay over all operations. Line 26 specifies the fault tolerance requirement, namely, all members of the configuration must be deployed to distinct hosts. The onDomains annotation indicates that arc-consistency must be enforced. Line 27 breaks the variable symmetries among the configuration members [2]. Lines 28-39 constrain the auxiliary delay variables and quorum selection variables needed in the load-balancing constraint. The constraints on lines 28 and 29 capture the delays incurred by host h to use a read (write) quorum. Lines 31 and 32 require the quorums assigned to host h, namely readq[h] and writeq[h], to be among the quorums with minimum delay for that host. Lines 35-37 specify the read, write, and total communication loads on m as the sum of the operation frequencies of each host for which m is a member of its assigned read and/or write quorum.
Line 39 is the load-balancing constraint requiring the load on the most heavily loaded configuration member to be no more than α times the load on the most lightly loaded configuration member. The search procedure operates in two phases. The first phase (lines 4 45) assigns configuration members to hosts. The variable selection heuristic first focuses on variables that appear in many quorums, and the value selection heuristic first considers hosts that have many neighbors close by as these would be ideal locations for quorum members. The second phase (lines 46 49) finds an assignment of hosts to read and write quorums that satisfies the load-balancing constraint. This second phase cannot impact the value of the objective function. Rather, its role is to decide which quorum each host should use to meet the load-balancing requirement. Clearly, only one such assignment is needed which explains the once<cp> annotation on line 46. Lines 46 49 consider the most talkative hosts first (by decreasing frequencies) and attempt to assign one of the remaining legal (minimal delay) quorums from its domain. Improvements to the Search Heuristic. Within the search, the variable selection heuristic focuses on variables that appear in many quorums. This works well for quorum systems such as Wheel in Figure where some members are in

more quorums than others, but it doesn't work well for systems such as 3x3, where every member is in exactly two quorums. A better search heuristic for 3x3 exploits the symmetries in the quorum system by focusing on variables with smaller domains. The following search heuristic, which replaces line 42 in Figure 3, combines both goals.

    selectMin(m in M : !x[m].bound()) (x[m].getSize() - 4 * nbrq[m])

The factor of 4 is somewhat arbitrary, but its intent is to give more weight to small differences in the number of quorums to which a variable belongs.

Load-Balancing Improvements. Because the read and write loads are balanced separately, the variables readload[m] and writeload[m] are independent of each other. Thus, the search can be improved by replacing lines 46-49 in Figure 3 with

    once<cp> forall(h in H : !readq[h].bound()) by (-f[h])
      label(readq[h]);
    once<cp> forall(h in H : !writeq[h].bound()) by (-f[h])
      label(writeq[h]);

Dividing the single once<cp> block into two eliminates the need to backtrack over a satisfactory assignment of read quorums to hosts while searching for a satisfactory assignment of write quorums to hosts.

To optimize the load balances, the load-balancing constraint on line 39 is deleted, and the optimization code is appended after line 50. Figure 4 contains the code for optimizing the load balance for read operations; a similar fragment optimizes the load balance for write operations with writeload[m].

     1 Solver<CP> cp2();
     2 int bestr = max(m in M) readload[m] - min(m in M) readload[m];
     3 var<CP>{int} objr(cp2, 0..bestr);
     4 var<CP>{int} readq2[H](cp2, RQ);
     5 var<CP>{int} readload2[M](cp2, 1..10000);
     6 minimize<cp2> objr
     7 subject to {
     8   cp2.post(objr == max(m in M) readload2[m] - min(m in M) readload2[m]);
     9   forall(h in H)
    10     cp2.post(readd[h,readq2[h]] == min(q in RQ) readd[h,q]);
    11   forall(m in M)
    12     cp2.post(readload2[m] == sum(h in H)(f[h] * readqc[readq2[h],m]));
    13 } using {
    14   forall(h in H : !readq2[h].bound()) by (-f[h])
    15     label(readq2[h]);
    16 }

Fig. 4. Solver for Optimally Balancing Read Loads
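As a rough illustration of what the model in Figure 4 computes, the following Python sketch brute-forces the same problem on a tiny instance. It is not the Comet model: the instance data (MEMBERS, QUORUMS, F, DELAY) and all function names are hypothetical, exhaustive enumeration stands in for constraint search, and the domain bounds on the load variables are ignored. Each host may only pick among its fastest quorums, and the objective is either the max-min spread of member loads or the integer quantity n * sum(x^2) - (sum x)^2, which equals n^2 times the population variance of the loads.

```python
from itertools import product

# Hypothetical toy instance (not taken from the paper).
MEMBERS = ["a", "b", "c"]
QUORUMS = {"Q0": {"a", "b"}, "Q1": {"b", "c"}, "Q2": {"a", "c"}}
F = {"h1": 5, "h2": 3, "h3": 2}              # message frequency of each host
DELAY = {                                     # delay from each host to each quorum
    "h1": {"Q0": 1, "Q1": 1, "Q2": 2},
    "h2": {"Q0": 2, "Q1": 1, "Q2": 1},
    "h3": {"Q0": 1, "Q1": 1, "Q2": 1},
}


def loads(assign):
    """Load on each member: sum of F[h] over hosts h whose quorum contains it."""
    out = {m: 0 for m in MEMBERS}
    for host, quorum in assign.items():
        for member in QUORUMS[quorum]:
            out[member] += F[host]
    return out


def spread(ld):
    """Difference between the heaviest and the lightest load."""
    return max(ld.values()) - min(ld.values())


def scaled_variance(ld):
    """n * sum(x^2) - (sum x)^2, i.e. n^2 times the population variance."""
    xs = list(ld.values())
    return len(xs) * sum(x * x for x in xs) - sum(xs) ** 2


def balance(objective):
    """Enumerate fastest-quorum choices per host and keep the best objective."""
    hosts = sorted(F, key=lambda h: -F[h])    # most talkative hosts first
    fastest = {
        h: [q for q in QUORUMS if DELAY[h][q] == min(DELAY[h].values())]
        for h in hosts
    }
    best_val, best_assign = None, None
    for choice in product(*(fastest[h] for h in hosts)):
        assign = dict(zip(hosts, choice))
        val = objective(loads(assign))
        if best_val is None or val < best_val:
            best_val, best_assign = val, assign
    return best_val, best_assign
```

Calling balance(spread) or balance(scaled_variance) returns the best objective value together with one assignment that achieves it; on larger instances the enumeration would have to be replaced by a real search, as in the Comet model.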

Figure 4 begins with the definition of a new solver for read-load balancing. Line 2 defines bestr as the difference between the heaviest and lightest read loads using the quorum hosting assignments from the communication optimization. New decision variables are declared in lines 3-5. Variable objr is the objective to be minimized and starts off with an upper bound equal to bestr. Variables readq2[h] and readload2[m] correspond to the variables readq[h] and readload[m] from Figure 3. The objective function, specified in lines 6-8, minimizes objr, the difference between the heaviest and lightest read loads using the load-balancing quorum-to-host assignments. The constraints on lines 10 and 12 define readq2[h] to be one of the read quorums with the fastest response for host h and readload2[m] to be the resultant load on configuration member m. These constraints mirror the constraints on lines 3 and 35 of Figure 3. Finally, lines 14 and 15 assign read quorums to hosts, beginning with the most talkative hosts.

To minimize the standard deviation of loads, line 2 of Figure 4 is replaced by

    int bestr = M.getSize() * sum(m in M)(readload[m]^2)
                - (sum(m in M) readload[m])^2;

and line 8 is replaced with

    cp2.post(objr == M.getSize() * sum(m in M)(readload2[m]^2)
                     - (sum(m in M) readload2[m])^2);

This expression is |M|^2 times the variance of the loads, so minimizing it also minimizes the standard deviation while keeping the objective integral. Note that the spread global constraint cannot be used since the total load over all configuration members depends on the quorums selected for each host.

Breaking Almost Symmetries. Figure 5 shows the changes to break the almost symmetries in the topology, where lines 6-16 replace lines 42-45 in Figure 3.
     1 set{set{int}} Eq = ...;      // The sets of equivalent hosts
     2 set{int} noteq = ...;        // The non-equivalent hosts
     3 int minqsz[m in M] = ...;    // The smallest quorum size for each member
     4
     5 while (sum(k in M) x[k].bound() < M.getSize())
     6   selectMin(m in M : !x[m].bound()) (minqsz[m], x[m].getSize() - 4 * nbrq[m]) {
     7     set{int} boundh = collect(m in M : x[m].bound()) x[m];
     8     set{int} oneofh = union(e in Eq : card(e \ boundh) > 0)
     9                         argmax(en in e \ boundh) frequency[en];
    10     set{int} searchh = noteq union boundh union oneofh;
    11     tryall<cp>(h in searchh : x[m].memberOf(h)) by (-degree[h])
    12       cp.label(x[m], h);
    13     onfailure cp.diff(x[m], h);
    14   }
    15 }

Fig. 5. Search Procedure with Almost Symmetry Breaking
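The set manipulations on lines 6-10 of Figure 5 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Comet code: the function names and the example topology are hypothetical, hosts are taken to be equivalent when their neighbor sets are identical, a variable counts as bound when its domain has one value, and the selection key (minqsz, domain size - 4 * nbrq) follows the heuristic as reconstructed in the text.

```python
def equivalence_sets(neighbors):
    """Group hosts with identical neighbor sets; return (Eq, noteq)."""
    groups = {}
    for host, nbrs in neighbors.items():
        groups.setdefault(frozenset(nbrs), set()).add(host)
    eq = [g for g in groups.values() if len(g) > 1]
    noteq = {h for g in groups.values() if len(g) == 1 for h in g}
    return eq, noteq


def search_hosts(eq, noteq, bound, frequency):
    """Hosts to branch on: bound hosts, non-equivalent hosts, and the most
    talkative unbound host of each equivalence set (ties broken by name)."""
    one_of = set()
    for e in eq:
        unbound = e - bound
        if unbound:
            one_of.add(max(sorted(unbound), key=lambda h: frequency[h]))
    return noteq | bound | one_of


def next_member(domains, min_qsz, nbrq):
    """Pick the unbound member with the lexicographically smallest key
    (smallest quorum size, then domain size minus 4 * number of quorums)."""
    unbound = [m for m in domains if len(domains[m]) > 1]
    return min(unbound, key=lambda m: (min_qsz[m], len(domains[m]) - 4 * nbrq[m]))


# Hypothetical topology: two linked hubs, c1 with leaves a, b, d and c2 with e, f.
NEIGHBORS = {
    "a": {"c1"}, "b": {"c1"}, "d": {"c1"},
    "e": {"c2"}, "f": {"c2"},
    "c1": {"a", "b", "d", "c2"},
    "c2": {"c1", "e", "f"},
}
FREQ = {"a": 5, "b": 9, "d": 1, "e": 2, "f": 2, "c1": 0, "c2": 3}
```

With no host bound yet, the candidate set holds the two hubs plus one representative per star (b and e); once b is bound, the next most talkative leaf of the same star (a) becomes available, which is how the search avoids exploring interchangeable leaves in every order.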

Lines 1-3 contain additional parameter declarations. Eq is the set of sets of equivalent hosts used in the value symmetry breaking. The sets of immediate neighbors for each host are computed first using the observed message delays, and then two hosts are deemed equivalent for configuration member placement if they have the same set of neighbors. The set noteq contains all the hosts not in some set of Eq. For each configuration member m, minqsz[m] is the size of the smallest quorum of which m is a member.

Two sets, boundh and searchh, guide the search. boundh, on line 7, is the set of hosts currently bound to some configuration member, and searchh, on lines 8-10, is the subset of hosts currently being considered for assignment to configuration members. searchh contains all the bound hosts, all the hosts not equivalent to any other host, and one additional unbound host from each equivalence set, if one exists.

Because the host symmetries are almost symmetries rather than true symmetries, care must be taken both in the order in which hosts are added to searchh and in the order in which configuration members are selected for binding. Clearly, hosts should be added to searchh beginning with the hosts with the highest message frequencies, as on line 9. The more subtle requirement is that configuration members must be assigned in increasing quorum size order, as specified on line 6. To minimize the communication delays for the most talkative hosts, those hosts should be only one hop away from the members of their best read and write quorums, if possible. A host is zero hops from itself, but two hops from the other hosts in its equivalence set. For small quorums, this difference in hops is more likely to impact the delay to contact a complete quorum. Hence, the members of small quorums must be assigned first.

5 Experimental Results

The Benchmarks. The benchmarks represent characteristics of common network configurations and quorum systems.
As illustrated in Figure 6, Stars3, Stars2, and Stars2c3 arrange 15 hosts in clusters, and Switch consists of 10 hosts on a switch and 4 other hosts hooked up via point-to-point links. Line arranges 15 hosts in a single line, with 14 hops between end hosts. Hyper16 interconnects 16 hosts in a hypercube. Three larger benchmarks, BigStars3, BigStars2, and BigStars2c3, are 21-host extensions of Stars3, Stars2, and Stars2c3, with the extra hosts added to the stars. Figure 6 shows the frequencies of the read/write operations for each host; the delays are the number of hops between hosts. The quorum system benchmarks are Wheel, 3x3, and 4x4, as illustrated in Figure 1, and Maj, consisting of six members grouped into four read quorums and four write quorums, each of which is a majority quorum [3] of four members.

Fig. 6. Network Configuration Benchmarks Stars3, Stars2, Stars2c3, and Switch

Model Comparisons. Table 1 compares results for different quorum hosting models using Comet 2.1 on a Core 2 at 2.4 GHz. Columns are grouped by model type: the first group uses the model from [7], the second adds non-zero load constraints and separate load optimization, the third is the full load balancing model, and the fourth adds almost-symmetry breaking. Within each group, column Opt gives the objective for the optimal solution found, T_end gives the time in seconds to prove optimality, and T_opt reports the time in seconds to find the optimum. The table provides two rows for each benchmark: the first reports the average and the second reports the standard deviation over 50 runs. The 3x3 quorum system is used throughout.

             Original Model        Separate Load         Independent           Symmetry
             From [7]              Optimization          Read & Write          Breaking
Benchmark    Opt  T_end   T_opt    Opt  T_end   T_opt    Opt  T_end   T_opt    Opt  T_end   T_opt
Stars3    µ  284  39.83   2.52     284  43.77   0.30     284  24.22   .        284  2.04    0.5
          σ       6.79    2.32          8.47    0.90          .22     .05           0.08    0.
Stars2    µ  36   527.50  63.88    287  6.2     0.4      287  06.9    9.82     287  0.09    0.04
          σ       375.07  366.84        5.37    2.09          5.45    5.5           0.0     0.0
Stars2c3  µ  268  27.47   800.50   240  29.0    5.48     240  4.20    2.32     240  0.09    0.03
          σ       460.78  648.67        4.9     4.90          .05     .96           0.0     0.0
Switch    µ  620  47.57   32.53    620  34.87   9.54     620  24.4    20.42    620  0.79    0.49
          σ       54.3    54.2          28.98   28.60         2.42    2.27          0.45    0.48
Line      µ  57   388.4   269.3    499  56.04   7.66     499  202.    66.64    499  204.25  73.92
          σ       05.92   56.36         24.2    38.74         8.08    24.9          4.69    26.96
Hyper16   µ  249  438.84  25.74    249  42.40   22.98    249  270.46  84.4     249  275.96  82.28
          σ       54.53   4.2           53.50   5.30          6.86    48.55         3.0     5.96

Table 1. Experimental Results Comparing Models with 3x3 Quorum System

It is useful to review these results in more detail.

1. For Stars2c3 the symmetry model is over 10,000 times faster than [7].
2. Exploiting almost symmetries in network topologies significantly impacts performance, particularly for Stars2 and Stars2c3, which have many symmetric hosts. Note that neither Line nor Hyper16 has network symmetries.
3. Both the non-zero load constraints and the new load balancing optimization potentially change the objective.
4. Hyper16 has more balanced loads with the new models; for the other benchmarks the loads often are less balanced but reflect an improved tradeoff between response times and load balancing.

5. With the new models, an optimal solution is found more quickly and the standard deviations for T_opt and T_end are smaller. The one exception is Switch, whose runs exhibit three widely different behaviors; the properties of Switch that cause these variations in behavior need further study.
6. Line performs poorly with the new search heuristic.

Load Balancing Options. Table 2 compares two load balancing options: minimizing the difference between the largest and smallest loads and minimizing the standard deviation among loads. Results are presented for the symmetry breaking model with 3x3 and Maj quorum systems. Within each column group, #c is the number of choices during load balancing, not the total number of choices.

             3x3                                             Maj
             Min. Difference         Min. St. Dev.           Min. Difference         Min. St. Dev.
Benchmark    T_end   T_opt  #c       T_end   T_opt  #c       T_end   T_opt  #c       T_end   T_opt  #c
Stars3    µ  2.04    0.5    5.0      2.04    0.7    9.0      .58     0.03   33.      .56     0.03   93.3
          σ  0.08    0.     0.0      0.08    0.     0.0      0.04    0.00   9.9      0.04    0.00   25.0
Stars2    µ  0.09    0.04   0.0      0.09    0.04   0.0      0.30    0.03   642.     .2      0.03   093.8
          σ  0.0     0.0    0.0      0.0     0.0    0.0      0.04    0.00   263.3    0.28    0.00   2860.
Stars2c3  µ  0.09    0.03   6.0      0.09    0.03   5.5      0.6     0.02   4.9      0.6     0.02   48.8
          σ  0.0     0.00   0.0      0.0     0.00   0.5      0.0     0.00   8.4      0.0     0.00   0.2
Switch    µ  0.79    0.49   360.5    0.80    0.4    303.8    0.6     0.02   663.4    .9      0.02   3237.3
          σ  0.45    0.48   69.2     0.40    0.48   495.9    0.06    0.00   559.     .9      0.00   46.8
Line      µ  204.25  73.92  2.0      209.32  76.53  3.0      2.02    3.89   4.8      20.63   3.60   0.9
          σ  4.69    26.96  0.0      5.09    27.5   0.0      0.86    .05    3.5      .6      .06    3.8
Hyper16   µ  275.96  82.28  34.5     267.70  83.45  4.0      54.33   2.68   80.0     60.72   3.03   2226.5
          σ  3.0     5.96   .5       7.23    49.63  0.0      3.6     0.39   3.0      3.37    0.47   205.6

Table 2. Experimental Results Comparing Load Balancing Models

When adopting the standard deviation model, the additional cost is minimal for 3x3, but more significant for some of the Maj benchmarks. Stars2 with Maj and without symmetry breaking is the worst, requiring 685 choices for minimizing differences and 03,64 choices for minimizing standard deviations.
Variations in T_opt between the two options result from randomness in the deployment heuristic rather than the type of load balancing.

Larger Instances. Table 3 shows that the symmetry breaking model extends nicely to larger networks and quorum systems.

   BigStars3            BigStars2            BigStars2c3
   Opt  T_end   T_opt   Opt  T_end   T_opt   Opt  T_end   T_opt
µ  436  42.0    3.36    442  .49     0.36    350  0.70    0.8
σ       .97     9.77         0.67    0.23         0.7     0.0

Table 3. Experimental Results with 4x4 Quorum System

6 Conclusions

Quorums are a powerful concept for implementing consistency and availability in replicated distributed data services like Rambo. Careful study of optimal quorum configurations led to significant performance enhancements for the online quorum hosting problem, taking the problem from feasible to practical. In particular, better load balancing models and the exploitation of almost symmetries in network topologies were presented. While this paper focuses on finite-domain models, the improvements are likely to impact CBLS/hybrid models.

References

1. A. F. Donaldson and P. Gregory. Almost-Symmetry in Search (SymNet Workshop Proceedings). Technical Report TR-2005-201, University of Glasgow, 2005.
2. M. Fox, D. Long, and J. Porteous. Discovering near symmetry in graphs. In Proceedings of AAAI, 2007.
3. D. K. Gifford. Weighted voting for replicated data. In Proceedings of the Seventh Symposium on Operating System Principles (SOSP), pages 150-162, 1979.
4. S. Gilbert, N. A. Lynch, and A. A. Shvartsman. RAMBO II: Rapidly reconfigurable atomic memory for dynamic networks. In IEEE/IFIP International Conference on Dependable Systems and Networks, pages 259-268, 2003.
5. D. Long and M. Fox. Symmetries in planning problems. In SymCon'03 (CP Workshop), 2003.
6. N. Lynch and A. Shvartsman. RAMBO: A reconfigurable atomic memory service for dynamic networks. In Proceedings of the 16th International Symposium on Distributed Computing, pages 173-190, 2002.
7. L. Michel, M. Moraal, A. Shvartsman, E. Sonderegger, and P. Van Hentenryck. Online Selection of Quorum Systems for RAMBO Reconfiguration. In CP 2009: 15th International Conference on Principles and Practice of Constraint Programming. Springer, 2009.
8. L. Michel, A. See, and P. Van Hentenryck. Parallelizing constraint programs transparently. In Proceedings of the 13th International Conference on the Principles and Practice of Constraint Programming (CP-2007), Providence, RI, 2007.
9. J. Porteous, D. Long, and M. Fox. The identification and exploitation of almost symmetry in planning problems. In K. Brown, editor, Proceedings of the 23rd UK Planning and Scheduling SIG, 2004.
10. S. Prestwich and J. C. Beck. Exploiting dominance in three symmetric problems. In Fourth International Workshop on Symmetry and Constraint Satisfaction Problems, pages 63-70, 2004.
11. P. Schaus, P. Van Hentenryck, and J.-C. Régin. Scalable Load Balancing in Nurse to Patient Assignment Problems. In CPAIOR 2009: 6th International Conference on the Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pages 193-207. Springer, 2009.
12. B. M. Smith. Sets of symmetry breaking constraints. In SymCon, volume 5, 2005.
13. R. H. Thomas. A majority consensus approach to concurrency control for multiple copy databases. ACM Trans. Database Syst., 4(2):180-209, 1979.
14. P. Van Hentenryck and L. Michel. Constraint-Based Local Search. The MIT Press, Cambridge, Mass., 2005.
15. P. Van Hentenryck and L. Michel. Nondeterministic control for hybrid search. Constraints, 11(4):353-373, 2006.