Non-Blocking Steal-Half Work Queues


Danny Hendler, Tel-Aviv University
Nir Shavit, Tel-Aviv University

Abstract

The non-blocking work-stealing algorithm of Arora et al. has been gaining popularity as the multiprocessor load-balancing technology of choice in both industry and academia. At its core is an ingenious scheme for stealing a single item in a non-blocking manner from an array-based deque. In recent years, several researchers have argued that stealing more than a single item at a time allows for increased stability, greater overall balance, and improved performance. This paper presents StealHalf, a new generalization of the Arora et al. algorithm that allows processes, instead of stealing one item, to steal up to half of the items in a given queue at a time. The new algorithm preserves the key properties of the Arora et al. algorithm: it is non-blocking, and it minimizes the number of CAS operations that the local process needs to perform. We provide analysis proving that the new algorithm provides better load distribution: the expected load of any process throughout the execution is less than a constant away from the overall system average.

1 Introduction

The work-stealing algorithm of Arora et al. [2] has been gaining popularity as the multiprocessor load-balancing technology of choice in both industry and academia [1, 2, 5, 7]. The scheme allows each process to maintain a local work queue, and to steal an item from others if its queue becomes empty. At its core is an ingenious scheme for stealing an individual item in a non-blocking manner from a bounded-size queue, minimizing the need for costly CAS (Compare-and-Swap) synchronization operations when fetching items locally. Though stealing one item has been shown sufficient to optimize computation along the "critical path" to within a constant factor [2, 4], several authors have argued that the scheme can be improved by allowing multiple items to be stolen at a time [3, 8, 9, 12].
Unfortunately, the only implementation algorithm for stealing multiple items at a time, due to Rudolph et al. [12], requires the local process owning the queue to use a strong synchronization operation (CAS or some other form of mutual exclusion) for every push or pop. Since the local process's operations are far more frequent, this makes the Rudolph et al. algorithm significantly less effective than that of Arora et al. This state of affairs leaves open the question of designing an algorithm that, like Arora et al., does not use costly synchronization operations in every step, yet allows stealing multiple items at a time, thus achieving the stability, balance, and improved performance of algorithms like Rudolph et al.

(This work was supported in part by a grant from Sun Microsystems. Contact author: Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PODC 2002, July 21-24, 2002, Monterey, California, USA. Copyright 2002 ACM /02/ $5.00.)

1.1 The New Algorithm

This paper presents a new work-stealing algorithm, StealHalf, a generalization of the Arora et al. algorithm that allows processes to steal up to half of the items in a given queue at a time. Our StealHalf algorithm preserves the key properties of the Arora et al. algorithm: it is non-blocking and it minimizes the number of CAS operations that the local process needs to perform. We provide a performance analysis showing that our algorithm improves on Arora et al. by providing an overall system balance similar to the Rudolph et al. scheme: the expected load of any process throughout the execution is less than a constant away from the overall system average.
In our algorithm, as in the Arora et al. algorithm, each process has a local work queue. This queue is actually a deque, allowing the local process to push and pop items from the bottom, while remote processes steal items from the top. The Arora et al. algorithm is based on a scheme that allows the local process, as long as there is more than one item in the deque, to modify the bottom counter without a CAS operation. Only when the top and bottom are a distance of 1 or less apart, so that processes potentially overlap on the single item that remains in the deque, does the process need to use a CAS operation to perform consensus. This consensus is necessary because the inherent uncertainty [6] regarding a single read or write operation to the top and bottom counters can affect whether the last item to be stolen is still there or not. This yields an algorithm where, for any monotonic sequence of k pushes or k pops (a monotonic sequence is a sequence of local operations that monotonically either increases or decreases the bottom counter; we define the synchronization complexity in terms of monotonic sequences, since one cannot make claims in non-monotonic situations, where the execution pattern of the underlying application causes thrashing back and forth on a single location), the number of

CAS operations is O(1). In order to steal more than one item at a time, say half of the number of items in a deque, one must overcome a much greater uncertainty. Since a stealing process might remove up to half of the items, there are now many values of the bottom counter (up to half of the difference between top and bottom) for which there is a potential overlap. Thus, eliminating the uncertainty regarding reading and writing the counter would require consensus to be performed for half the items in the deque, an unacceptable solution from a performance point of view, since it would yield an algorithm with O(k) synchronization complexity. The key to our new algorithm is the observation that one can limit the uncertainty so that the local process needs to perform a consensus operation (use a CAS) only when the number of remaining items in the deque (as indicated by the difference between top and bottom) is a power of two. The scheme works so that as the distance between the bottom and top counters changes, it checks and possibly updates a special half-point counter. The uncertainty regarding a single read or write exists just as in the Arora et al. algorithm, only now it occurs with respect to this counter. What we are able to show is that missing the counter update can only affect locations beyond the next power of two at any given point. For any monotonic sequence of k pushes or k pops, our algorithm uses only O(log k) CAS operations, slightly worse than Arora et al., but exponentially better than the O(k) of the Rudolph et al. scheme. Like the Arora et al. scheme, our StealHalf algorithm is non-blocking: the slowdown of any process cannot prevent the progress of any other, allowing the system as a whole to make progress independently of process speeds. Like Arora et al., the non-blocking property in our algorithm is not intended to guarantee fault-tolerance: the failure of a process can cause loss of items. Arora et al.
claim that their empirical testing on various benchmarks shows that the non-blocking property contributes to the performance of work stealing, especially in multiprogrammed systems [2]. In our algorithm, the price of unsuccessful steal operations is a redundant copying of multiple item-pointers, whereas in the Arora et al. algorithm only a single pointer is copied redundantly. However, in our algorithm successful steals transfer multiple items at the cost of a single CAS, whereas in the Arora et al. scheme a CAS is necessary for each stolen item.

1.2 Performance Analysis

Mitzenmacher [9] uses his differential-equations approach [10] to analyze the behavior of work-stealing algorithms in a dynamic setting, showing that in various situations stealing more than one item improves performance. Berenbrink et al. [3] have used a Markov model to argue that a system that steals only one item at a time can slip into an unstable state from which it cannot recover, allowing the number of items to grow indefinitely. This implies that no matter how much buffer space is allocated, at some point the system may overflow. Treating such overflow situations requires the use of costly overflow mechanisms [5]. Berenbrink et al. [3] further show that an Arora-like scheme that allows stealing, say, half of the items will prevent the system from slipping into such an unstable state. Rudolph et al. [12], and later Luling and Monien [8], prove that in a load-balancing scheme that repeatedly balances the work evenly among random pairs of local work queues, the expected load of each process will vary only by a constant factor from the load of any other process, and that the overall variance is small. The algorithm we present is generic, in the sense that one can change the steal-initiation policy to implement a variety of schemes, including those of [2, 3, 12].
We choose to analyze its behavior in detail under a steal-attempt policy similar to [12]: every so often, every process p performs the following operation, which we call balancing initiation: p flips a biased coin to decide whether it should attempt to balance its workload by stealing multiple items, where the probability of attempting is inversely proportional to its load. If the decision is to balance, then p randomly selects another process and attempts to balance load with it. Our main theorem states that our algorithm maintains properties similar to those of Rudolph et al. We provide it here since the proof provided by Rudolph et al. turns out to be incomplete, and also because there are significant differences between the algorithms. As an example, the Rudolph et al. algorithm is symmetric: a process can both steal items from and insert items to the deque of another process. In our algorithm, however, a process can only steal items. Assume Δu is a time period during which no deque grows or shrinks by more than u items (through pushing or popping). Let Lp,t denote the number of items in process p's deque at time t; also, let At denote the average load at time t. Then, if all processes perform balancing initiation every Δu time, there exists a constant αu, not depending on the number of processes or the application, such that:

∀p,t : E[Lp,t] <= αu * At

The existence of such a Δu is a natural assumption in most systems, and can be easily maintained with slight protocol modifications in others. In contrast, it can easily be shown that if one steals a single item at a time, as in the Arora et al. scheme, then even if steal attempts are performed every Δ1, there are scenarios in which the system becomes unstable. In summary, we provide the first algorithm for stealing multiple items using a single CAS operation, without requiring a CAS for every item pushed or popped locally.
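The O(log k) synchronization bound claimed above can be made concrete by counting the power-of-two crossings at which the owner performs a CAS. The following is a minimal sketch (the function name is ours, not the paper's), assuming the owner CASes exactly when the deque length crosses a 2^i boundary:

```python
def cas_updates_for_pushes(k: int) -> int:
    """Count the stealRange CAS updates the owner performs during a
    monotonic sequence of k pushes onto an initially empty deque,
    assuming it CASes exactly when the length crosses a 2**i boundary."""
    return sum(1 for length in range(1, k + 1)
               if length & (length - 1) == 0)  # power-of-two test
```

For k pushes this yields floor(log2 k) + 1 updates, i.e. O(log k), versus O(1) for Arora et al. and O(k) for the Rudolph et al. scheme.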
An outline of the algorithm is presented in Section 2; the analysis is presented in Section 3; finally, correctness claims are presented in Section 4.

2 The StealHalf algorithm

As noted earlier, the key issue in designing an algorithm for stealing multiple items is the need to minimize the use of strong synchronization operations, both when performing local operations and when stealing multiple items. A first attempt at an algorithm might be to simply let a process steal a range of items by repeatedly performing the original code for stealing a single item, once for every item in the group. This would make the algorithm a steal-many algorithm (though with this technique the group of items is obviously not stolen atomically), but at a very high cost: to steal k items, the thief would have to perform k synchronization operations. The algorithm presented here utilizes an extended deque data structure, a variant of the deque of [2] depicted in Figure 1, to achieve synchronization at a low cost. The extended deque differs from the deque of [2] in two ways: (1) it is implemented as a cyclic array, and (2) it contains a member structure, called stealRange, which defines the range of items that can be stolen atomically by a thief process.
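As a rough preview of the structures described in Section 2.1 below, here is an illustrative Python sketch. Field names follow the text; the concrete layout (and the DEQ_SIZE value) is our assumption, not the paper's implementation:

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEQ_SIZE = 64  # capacity of the cyclic array (an arbitrary illustrative value)

@dataclass
class StealRange:
    tag: int = 0                       # stamp guarding the CAS against ABA
    top: int = 0                       # first (top-most) stealable entry
    steal_last: Optional[int] = None   # last stealable entry; None = nothing to steal

@dataclass
class ExtendedDeque:
    deq: List[object] = field(default_factory=lambda: [None] * DEQ_SIZE)
    steal_range: StealRange = field(default_factory=StealRange)
    bot: int = 0                       # entry following the last occupied one

    def length(self) -> int:
        # occupied entries are [top ... bot) modulo DEQ_SIZE
        return (self.bot - self.steal_range.top) % DEQ_SIZE

d = ExtendedDeque()
assert d.length() == 0                 # top == bot means the deque is empty
```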

[Figure 1: The extended deque. The stealRange (tag, top, last) covers up to 2^i items at the top of the deque that can be stolen atomically; there are at least 2^i items outside the steal range; the bottom of the deque lies somewhere within the next group of 2^(i+1) items.]

The algorithm allows stealing about half the items of the victim in a single synchronization operation. It does this by always maintaining the invariant that the number of items in every process p's stealRange is approximately half the total number of items in p's deque. Process p's stealRange is updated by any process that succeeds in stealing items from p's deque, and also by p itself. To keep the number of synchronization operations performed by a local process p as low as possible, p updates its stealRange when pushing or popping an item to/from its deque only if at least one of the following events occurs: the number of items in p's deque crosses a 2^i boundary, for some i ∈ N; or a successful steal operation from p's deque has occurred since the last time p modified its own stealRange.

2.1 Data structures

Each process owns an extended-deque structure, which it shares with all other processes. In this structure:

deq is an array that stores pointers or handles to items, which are generated and consumed dynamically. It has DEQ_SIZE entries. The range of occupied entries in deq is [top ... bot)_DEQ_SIZE, i.e., the half-open range of entries from top up to (but not including) bot, modulo DEQ_SIZE. As noted, it is implemented as a cyclic array.

The stealRange member structure extends the age structure from [2]. It contains the following fields:

tag - as in [2], a time stamp on the updating of the stealRange structure. A tag is required in order to overcome the ABA problem inherent to the CAS primitive. As in [2], the bounded-tag algorithm of [11] can be used.

top - as in [2], the pointer to the top-most item in the deque.
(The ABA problem: assume a process p reads a value A from location m, and then performs a successful CAS operation, which modifies the contents of m. Process p's CAS should ideally succeed only if the contents of m were not modified since it read value A. The problem is that if other processes modified m's value to some value B and then back to A in the interim, the CAS operation still succeeds. To overcome this problem, a tag field is added to the structure. Note also that, because the extended deque is cyclic, the item pointed to by top is not necessarily the item stored at the top-most deque entry; the item pointed at by top is invariably the first item of the stealRange.)

stealLast - points to the last item to be stolen. The items that would be stolen next (unless the stealRange structure is modified by the local process before the next steal) are in the range [top ... stealLast]_DEQ_SIZE, the closed range of entries including both top and stealLast, modulo DEQ_SIZE. The stealLast variable can assume the special value null, which indicates that there are no items to steal. The stealRange structure is modified via CAS operations.

The bot field points to the entry following the last entry containing an item. Since the extended deque is cyclic, bot may be smaller than top. If bot and top are equal, the extended deque is empty. In addition to the shared extended-deque structure, each process has a static and local structure, called prevStealRange, of the same type as stealRange. It is used by the local process to determine whether a steal has occurred since the last time the process modified the stealRange.

2.2 High-level extended-deque methods description

We specify the algorithm in a generic manner that allows plugging in, in a modular way, components that control the conditions/policy under which a steal attempt is initiated.

Balancing initiation code

The code that appears in Figure 2 should be performed by every process periodically, throughout the computation.
IF (shouldBalance())
    Process *victim = randomProcess();
    tryToSteal(victim);

Figure 2: steal-initiation code

The shouldBalance method determines the policy regarding when to initiate a steal attempt (it can be implemented as inlined code rather than as a method, if this code should be performed very often). Many policies are conceivable, a few of which are:

Try to steal only when the local deque is empty: this is the scheme implemented by Arora et al. [2], and we call it steal-on-empty.

Try to steal probabilistically, with the probability decreasing as the number of items in the deque increases: this scheme was suggested by Rudolph et al. [12], and we call it probabilistic balancing. (The scheme, as described, is always probabilistic in the sense that the victim process is selected at random; the probabilistic balancing scheme adds yet another probabilistic factor.)

Try to steal whenever the number of items in the deque increases/decreases by a constant factor from the last time a steal attempt was performed: this is the policy suggested in [3].
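The three policies can be sketched as predicates. These are our own illustrative formulations, not code from the paper (the mu constant and factor-of-2 threshold are assumed values):

```python
import random

def steal_on_empty(load: int) -> bool:
    # Arora et al. [2]: attempt a steal only when the local deque is empty.
    return load == 0

_rng = random.Random(0)  # seeded so the sketch is reproducible

def probabilistic_balancing(load: int, mu: float = 0.5) -> bool:
    # Rudolph et al. [12]: attempt with probability inversely
    # proportional to the current load.
    return _rng.random() < mu / max(load, 1)

def threshold_balancing(load: int, last_load: int, factor: int = 2) -> bool:
    # Berenbrink et al. [3]: attempt when the load has grown or shrunk
    # by a constant factor since the last steal attempt.
    return load >= factor * max(last_load, 1) or factor * load <= last_load
```

In all three cases the victim itself is still chosen uniformly at random; the predicate only decides whether an attempt is made at all.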

Steal-initiation policies can vary significantly with respect to the extent to which they keep the system balanced, but obviously none can guarantee that the system is balanced if balancing is not attempted frequently enough. Consequently, we have to make some reasonable assumptions regarding the frequency at which the steal-initiation code is performed. Let Δu be a time period small enough that it is guaranteed that no process p changes its load by more than u items during that period by generating/consuming items (thefts notwithstanding). In the analysis we prove that if the probabilistic balancing policy is employed, and the steal-initiation code is performed once every Δu period, then the expected number of deque items of any process p at any time during the execution is no more than αu times the total average, where αu is a constant that does not depend on the number of processes or on the dynamic pattern of item generation/consumption (obviously, Δu itself does depend on this pattern). Not every steal-initiation policy can guarantee this property, though. It can easily be shown that under the steal-on-empty policy, where only a single item is stolen at a time, a system can grow unbalanced beyond any bound, even if the steal-initiation code is performed every Δ1 time period. The balancing initiation code can be performed periodically by the system or by an application, or it can be implemented as a signal handler.

pushBottom code

A high-level pseudo-code of pushBottom is shown in Figure 3.
RETURN_CODE pushBottom(Item *e) {
    IF deq is full
        return DEQ_FULL
    deq[bot] = e
    increment bot in a cyclic manner
    IF (deq length now equals 2^i for some i)
        Try to CAS-update the stealRange to contain max(1, 2^(i-1)) items
    ELSE IF (stealRange != prevStealRange)
        Try to CAS-update the stealRange to contain max(1, 2^i) items,
        where the current length of the deque is in the range [2^(i+1), 2^(i+2))
    IF CAS was performed successfully
        prevStealRange = stealRange
}

Figure 3: pushBottom pseudo-code

The pushBottom operation is performed by the local process p whenever it needs to insert a new item into its local deque. If the deque is not full, the new item is placed in the entry pointed at by bot, and bot is incremented in a cyclic manner. However, if following the insertion of the new item the length of p's deque becomes 2^i for some i > 0, then p tries to CAS-update its stealRange to contain the topmost 2^(i-1) items (or 1, if i = 0). This is to make sure the length of the stealRange is not much less than half the total number of items in the deque. (We actually prove in the full paper that, for any process p, at any time, the length of p's stealRange is at least an eighth of the length of p's deque.) Process p has to update the stealRange even if the deque length is not a power of 2, if another process has succeeded in stealing items from p since the last time p updated its stealRange.

[Figure 4: The extended deque before and after a pushBottom. Right after pushBottom inserts a new item into the deque, it checks whether the deque length reaches a power-of-2 boundary (in the figure, the 16th item was added), in which case the stealRange is expanded to contain the first half of the deque elements.]
Process p can identify this by comparing the current value of its stealRange with the last value it wrote to it, which is stored in its local prevStealRange. If the values differ, p tries to set the length of its stealRange (by a CAS) to max(1, 2^i), where 2^(i+1) <= len(deq) < 2^(i+2). If p performed a successful CAS, prevStealRange is updated with the value it wrote. Figure 4 depicts the state of an extended deque before and after the 16th item is pushed into it. Since subsequent to the push the deque contains a power-of-2 number of items, pushBottom tries (and in this case succeeds) to expand the stealRange. It is easily seen (and is proven in our analysis) that immediately after every successful CAS operation performed by pushBottom, assuming that the pushed item is not the first item in the deque (if it is, the lengths of both the deque and the stealRange right after the CAS equal 1), the following holds:

2 * len(stealRange) <= len(deq) <= 4 * len(stealRange)

popBottom code

A high-level pseudo-code of popBottom is presented in Figure 5. The popBottom operation is performed by the local process p whenever it needs to consume another deque item.

Section 1: In Section 1 a process performs some pre-checks to make sure the method may proceed. It first checks whether the deque is empty, in which case the method returns null; it next checks whether the length of the deque is 2^i for some i >= 0. If this is indeed the case, then the method tries to CAS-update its stealRange to contain the topmost 2^(i-2) items (or 1, if i < 2). This is done in order to maintain the invariant that the length of the stealRange never exceeds half the total number of

items in the deque, unless there is a single item in the deque. As in pushBottom, p has to CAS-update the stealRange even if the deque length is not a power of 2, if another process has succeeded in stealing items from p since the last time p updated its stealRange. Process p can identify this by comparing the current value of its stealRange with the last value it wrote to it, which is stored in prevStealRange. If the values differ, p tries to set the length of its stealRange to max(1, 2^i), where 2^(i+1) < len(deq) <= 2^(i+2). If popBottom performed a successful CAS, it updates prevStealRange with the value it wrote. If, however, a CAS was attempted and failed, popBottom returns a special value, ABORT, indicating this situation. Note that this can only happen if a successful steal has occurred concurrently with the method's execution.

Item *popBottom() {
    Section 1:
    IF deq is empty
        return null
    IF (deq length now equals 2^i for some i)
        Try to CAS-update the stealRange to contain max(1, 2^(i-2)) items
    ELSE IF (stealRange != prevStealRange)
        Try to CAS-update the stealRange to contain max(1, 2^i) items,
        where the current length of the deque is in the range (2^(i+1), 2^(i+2)]
    IF CAS was performed successfully
        prevStealRange = stealRange
    ELSE IF a CAS was attempted but failed
        return ABORT

    Section 2:
    Decrement bot in a cyclic manner
    Item *e = deq[bot]
    oldStealRange = this->stealRange
    IF oldStealRange does not contain bot
        return e            (no need to synchronize)
    ELSE IF oldStealRange is empty {
        bot = 0             (the last item, e, was already stolen by another process)
        return null
    } ELSE {
        Try to CAS-update the stealRange to be empty
        IF succeeded
            return e        (e was not stolen so far)
        ELSE
            return null     (the last item, e, was already stolen by another process)
    }
}

Figure 5: popBottom pseudo-code
(An alternative implementation of Section 1 is to retry in a loop, until the method succeeds in updating the stealRange.)

Section 2: Section 2 is very similar to its steal-one [2] counterpart: it pops the bottom item e off the deque, and then reads the stealRange. If the range read does not include e, then the method returns e and exits, as no other process may have stolen e; otherwise, there are two possibilities:

- The stealRange is empty: e was stolen by another process during the method's execution. The method returns null.

- e is the one and only item in the deque, in which case the method tries to CAS-update the stealRange with an empty range value. If it succeeds, it returns e as its result; otherwise it returns null.

[Figure 6: The extended deque before and after a popBottom. Just before popBottom pops the bottom-most item from the deque, it checks whether the deque length is exactly at a power-of-2 boundary (in the figure, the 16th item is about to be popped), in which case the stealRange is shrunk to contain the first quarter of the deque elements.]

Figure 6 depicts the state of an extended deque before and after the 16th item is popped off it. Since prior to the pop the deque contains a power-of-2 number of items, popBottom tries to shrink the stealRange (and in this case succeeds). It is easily seen (and is proven in our analysis) that immediately after every successful CAS operation performed by popBottom the following inequalities hold:

2 * len(stealRange) <= len(deq) <= 4 * len(stealRange)

tryToSteal code

The tryToSteal method is the method that actually attempts to perform a steal. It is called by a local process p after the steal-initiation policy has determined that a steal attempt should be made and a victim process has been selected. The pseudo-code of tryToSteal is shown in Figure 7.
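Before turning to tryToSteal, the power-of-two update rules of pushBottom and popBottom can be checked with a small single-threaded model. Every CAS is assumed to succeed and concurrent steals are not modeled (and the class name is ours), so this verifies only the arithmetic behind the 2 * len(stealRange) <= len(deq) <= 4 * len(stealRange) invariant:

```python
class OwnerDeque:
    """Owner-side model of stealRange maintenance (illustrative names)."""

    def __init__(self):
        self.n = 0    # deque length, standing in for (bot - top) mod DEQ_SIZE
        self.sr = 0   # current length of the stealRange

    def push_bottom(self):
        self.n += 1
        if self.n & (self.n - 1) == 0:       # length reached 2**i:
            self.sr = max(1, self.n // 2)    # CAS stealRange to 2**(i-1) items

    def pop_bottom(self):
        if self.n > 0 and self.n & (self.n - 1) == 0:  # length is 2**i:
            self.sr = max(1, self.n // 4)    # shrink stealRange to 2**(i-2)
        self.n -= 1

d = OwnerDeque()
for _ in range(100):
    d.push_bottom()
    if d.n > 1:
        assert 2 * d.sr <= d.n <= 4 * d.sr   # invariant after every push
for _ in range(99):
    d.pop_bottom()
    if d.n > 1:
        assert 2 * d.sr <= d.n <= 4 * d.sr   # ...and after every pop
```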
(The inequalities above hold unless the number of items in the deque after the pop is 0 or 1. If the pop empties the deque, then the lengths of both the deque and the stealRange after the operation are 0; if a single item is left in the deque after the pop, both lengths are 1.)

The method receives two parameters: d, a pointer to the victim process's extended deque, and pLen, the number of items in p's deque; it returns the number of items actually stolen. It first reads d.stealRange and computes its length,

based on which the method can determine whether it is certain that the victim has more items than p has; if this is not the case, the method returns 0 without stealing any items. Otherwise, about half of the guaranteed difference between the sizes of the two deques is copied to p's deque, and then p tries to CAS-update the victim's stealRange. The length of the new stealRange is set to max(1, 2^(i-2)), where the length of the victim's deque before the theft was in the range [2^i ... 2^(i+1)). As we prove in our analysis, this guarantees that the new stealRange has a length that is at most half, and at least one eighth, of the remaining number of items. If the CAS fails, the steal attempt has failed as well, and the method returns 0; otherwise the steal attempt has succeeded, and the method proceeds to update p's bot and stealRange to reflect the new number of items in the deque. Note the following: if another process q succeeds in stealing items from p's deque concurrently with p's successful steal attempt, then p might fail to update its own stealRange, but it would still manage to update its bot and complete the steal successfully. Although tryToSteal can be performed within a signal handler, it is required that no local operations (namely pushBottom or popBottom) be performed concurrently with tryToSteal's execution.
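The size computation at the heart of tryToSteal can be sketched as pure arithmetic; the CAS and the actual copying are not modeled, and the function and parameter names are ours:

```python
def steal_plan(victim_range_len: int, victim_deq_len: int, thief_len: int):
    """Return (items to steal, victim's new stealRange length), following
    the tests described above: give up unless the victim is provably
    richer, steal about half the guaranteed difference, and shrink the
    victim's stealRange to max(1, 2**(i-2)), where
    2**i <= victim_deq_len < 2**(i+1)."""
    if thief_len > 2 * victim_range_len - 2:
        return 0, victim_range_len           # victim not provably richer
    num_to_steal = victim_range_len - thief_len // 2
    largest_pow2 = 1 << (victim_deq_len.bit_length() - 1)   # 2**i
    new_range_len = max(1, largest_pow2 // 4)               # max(1, 2**(i-2))
    return num_to_steal, new_range_len

# A thief holding 2 items steals from a victim holding 16 (stealRange 8):
n_stolen, new_sr = steal_plan(8, 16, 2)
assert (n_stolen, new_sr) == (7, 4)
# 9 items remain at the victim; 4 is between one eighth and half of them.
```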
unsigned int tryToSteal(ExtendedDeque *d, int pLen) {
    oldStealRange = d.stealRange
    rangeLen = length of oldStealRange
    oldLen = length of d.deq
    IF pLen > 2*rangeLen - 2
        return 0                (no need to steal)
    numToSteal = rangeLen - pLen/2
    Copy the first numToSteal items to the bottom of the local deq
    newSRLen = max(1, 2^(i-2))  [where 2^i <= oldLen < 2^(i+1)]
    CAS-update d.stealRange to contain newSRLen items
    IF CAS was performed successfully {
        update the local bot to insert the stolen items into the deque
        newDeqLen = new number of items in the local deque
        newSRLen = max(1, 2^i)  [where 2^(i+1) <= newDeqLen < 2^(i+2)]
        CAS-update the local stealRange to contain newSRLen items
        IF CAS was performed successfully
            prevStealRange = stealRange
        return numToSteal;
    } ELSE
        return 0;
}

Figure 7: tryToSteal method pseudo-code

3 Analysis

In our analysis, we investigate the properties of the StealHalf algorithm under the probabilistic balancing policy, aiming to show that under reasonable assumptions it keeps the system balanced. The probabilistic balancing policy we employ is very similar to the one described in [12], with the following main differences:

1. In [12], a process initiating load balancing can either steal items from or insert items to a randomly selected process, whereas in our scheme the initiating process can only steal items. We therefore call the former scheme symmetric probabilistic balancing and the latter asymmetric probabilistic balancing.

2. The model presented in [12] is synchronous, in the sense that it assumes that computation proceeds in time-steps and that all processes attempt load balancing at the beginning of every time-step, whereas in the asynchronous model we investigate this cannot be assumed.

[12] supplies a proof that symmetric probabilistic balancing keeps the system well balanced. It turns out, however, that this proof is incomplete.
In the analysis presented in this section, we therefore supply an alternative proof for the symmetric case, and then extend it to asymmetric probabilistic balancing, and specifically to the StealHalf algorithm. For lack of space, some of the proofs are omitted.

3.1 Notation

Wherever possible, we follow the notation of [12]. To simplify the analysis, we assume that an execution is composed of a series of time-steps. At the beginning of each time-step, all processes flip biased coins to determine whether or not they should attempt balancing (in other words, they perform the balancing initiation code), and according to the result they do or do not attempt balancing. Let Lp,t denote the number of items in p's work-pile at the beginning of step t; also, let At denote the average system workload at the beginning of step t, namely:

At = (Σ_{p∈P} Lp,t) / |P|

Process p decides to perform a balancing attempt at time t with probability μ/Lp,t, for some constant 0 < μ < 1. When Lp,t equals 0, we define μ/Lp,t to be 1. If p does decide to balance at time t, then it randomly selects another process q to balance with and tries to communicate with q to that effect. We denote this event by select(p, q, t), and therefore we have:

P(select(p, x, t)) = (μ/Lp,t) * (1/(n-1))

If at time t either one of the processes p or q selects the other in a balancing attempt, we denote this event by select*(p, q, t), namely:

select*(p, q, t) = select(p, q, t) ∨ select(q, p, t)

If at time t several processes initiate a balancing attempt with process p, then only one of them is selected at random. If process q is the one, we say that q approaches p at time t, and we denote this event by approaches(q, p, t). If process q is the only process initiating a balancing attempt with p at time t, then approaches(q, p, t) also holds. If at time t approaches(q, p, t) holds and p does not initiate a balancing attempt at time t, then p and q balance at time t.
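The selection probabilities defined above compose by independence, which is what Lemma 3.1 in the source computes. A small numeric sketch (the function names are ours):

```python
def p_select(mu: float, load: int, n: int) -> float:
    """P(select(p, x, t)): p attempts with probability mu / L_{p,t}
    (taken to be 1 when the load is 0) and then picks x uniformly
    among the other n - 1 processes."""
    attempt = 1.0 if load == 0 else min(1.0, mu / load)
    return attempt / (n - 1)

def p_select_either(mu: float, load_p: int, load_x: int, n: int) -> float:
    """Probability that at least one of p, x selects the other: the two
    selections are independent, so it is 1 minus the product of the
    complements of the two individual selection probabilities."""
    return 1.0 - (1.0 - p_select(mu, load_p, n)) * (1.0 - p_select(mu, load_x, n))

# A heavily loaded process initiates balancing less often:
assert p_select(0.5, 8, 4) < p_select(0.5, 2, 4)
```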
We denote a balancing event between p and q at time t by balance(p, q, t). If at time t both approaches(q, p, t) and approaches(p, q, t) hold, then p and q balance at time t. If at time t approaches(p, q, t) and approaches(q, r, t) hold, where p ≠ r, then q decides between p and r with equal probability. We denote this event by decide(q, x, t). Finally, if at time t both decide(p, q, t) and decide(q, p, t) hold, then balance(p, q, t) holds. The changes in the work-pile size of any process p at time-step t come from two sources: items which are added

7 - - Lp,---~ or consumed due to the code executed by p at that step, and items added as a result of balancing operations. Let contrib[p, t] denote the change in the size of p's work-pile at time-step t which is a result of a balancing operation. In Section 3.2 we analyze symmetric probabilistic balancing. In Section 3.3 we analyze asymmetric probabilistic balancing in general, and the StealHalf algorithm in particular. 3.2 Symmetric probabilistic balancing analysis In the following we investigate the relationship between p's work-pile size in the beginning of time-step t, and the expectance of contrib[p, t]. Lemma 3.1 For all p and t we have: P(select(p,x,t)) = 1 -- (1 # 1 1)(1 n - # 1 Lx,t n - 1 )" Proof select(p, q, t) gel = select(p, ~ q, t) V select(q, p, t). Consequently we have: P(select(p, q, t)) = 1 - P(-~select(p, q, t)) = (1 - P(-~se~ect(p, q, t) )P(-~se~ect(q,p, t) ) ). Finally note, that for all different pairs of processes x, y and for all t: tt 1 P(-~select(x, y, t)) = 1 -- ~ * n -- 1" Q.E.D. Lemma 3.2 Let hi, 1 < i < n be n positive real-numbers, then the following holds: Proof ~ n -~ > n-- " i=1 ai -- (2i=lai)/n We actually have to prove that16: ai i=1 j=l Note that for every two positive real-numbers a, b we have: By using inequality 1 we get: a aj b + - > 2. (1) a (~ hi) ~ =n+- aj 1 E (aj. -q- ~) > n+2* (~) = n 2. i=1 j=l l<i~j_<n Q.E.D. The following lemma states that P(select(p,q,t)) and P(balanee(p, q, t) differ only by a constant factor. 16This lemma is actually a rephrasing of the arithmetic/harmonic mean inequality. Since the proof is very short, we provide it for the sake of presentation-completeness. Lemma 3.3 p and q, For every step t, and for every two processes 2~P(select(p, q, t)) < P(balance(p, q, t)) < P(select(p, q, t)). Proof outline: p and q can balance at step t only if at least one of them selects the other at that step. Consequently P(balance(p, q, t)) < P(select(p, q, t)). 
As for the other direction, note that if all the following conditions hold, it is guaranteed that balance(p, q, t) holds: C1: select(p,q, t) holds; C2: No other process r selects p or q at time t, namely: Vr ~ p, q : -~select(r, p, t) A -~select(r, q, t). C3: Neither of p, q select another process r at time t, or the following holds: p[q] selects q[p], q[p] selects a different process r, but q[p] decides to balance with p[q] rather than with r. In other words: (Vr # p, q : (-~select(p, r, t) A select(q, r, t)) OR 3r : [(select(p, q, t) A select(q, r, t) A decide(q, p, t))] OR 3r : [select(q,p, t) A select(p, r, t) A decide(p, q, t)] Consequently we have: P(balance(p, q, t)) > P(select(p, q, t)) * P(C2) * P(C3). In the full proof we show that P(C2) ~ e~ and P(C3) > 1 and thus obtain the result. Theorem 3.4 Proof Let Lp,t = hat, a > 1, then: E[contrib(p, t)] = -fl(a). Clearly -- Lp,~ E[contrib(p,t)] = E P[balance(p,x,t)]L~'t 2 x6p,x~p (2) Let P~- denote the set of processes whose work-pile size is larger than hat at the beginning of time-step t, and let P~ = P - P~+. We get: E[eontrib(p,t)] = ~cp+ P[balance(p,x,t)]L~'~Le,' + ~zepz P[balance(p, x, t)] n*'*~ Le't (3) Note, that the summation over P~+ contains only positive summands, whereas the summation over P~- contains only non-positive summands. Consequently, we can use Lemma 3.3 (for each summation separately) to get: Lz,t--Lp,t E[contrib(p,t)] < ~&+P[seleet(p,x,t)] 2 + 2~ E~pj P[select(p, x, t)] L~,,~Lp,, (4) We now bound each sum separately from above. We start with the positive summands. By using Lemma 3.1 we get: ExeP~- P[seleet(p, x, t)] Lx"~ Lr't = 286

8 E~ep2.(l_(l_ t~ ~-~-y)(1- t~ ~_~_y))~_~_~_~2i = Lp,t L~,t By multiplying and re-arranging we get: L~,t = E 2(ntt--1) Lp,t x e P~ r xepa + p Lp,t 2(n- 1) L~,t # E 2(n:~)2(~.t Lp,t )" We now bound A, -B and C: A = ~ 2(~-1)L~,~ ~cp2-l='t < -- 2(n--1)Lp,t ExeP L~,t = t~ nat ---- ~n 2(n--1)aAt 2a(n-- 1) < 1. B is positive and so -B is bound by 0 from above. As for C: #2 #2 C< 2(n_1)2( ~,<_/ < n " IP2-1 Combining the upper bounds for A, -B and C we get: - Lp,t < 2. (5) E P[balance(p, x, t)] L~,t 2 - xe Pa + Now, we bound the negative summands. First, note that IPa+l _< n- a and so IP~I _> n(l - 2)" Again, by multiplying and re-arranging summands we get the same A, -B and C components, and we bound them from above. A and C can be bounded by 1 and ~- respeen tively, in exactly the same way it was done for the positive summands; as for B: Lp,t _ #sat 1 B-- E 2(n p--1) L=,t 2~---i) E L~,t (6) zep~- zep~- I Noting that the average work-pile length for processes in Pj is not more than At~ and using Lemma 3.2 we get: S _> n(1 - ~-) At By substituting this upper bound for S in Equation 6 we get: B_> (2(---(-(-(-(-(-(-(-(~_ "'#sat 1) ~(n(1 ~))_>_ 1 #(1~- ~)s = e(s). Combining the upper bounds on A, B and C, we get: Lx,t E P[balance(p,x, t)] 2 xepz Lp,t - a(s). (7) Finally, substituting the upper bounds of Equations 5 and 7 in Equation 4 concludes the proof of the theorem. Q.E.D. We now consider the effect of symmetric probabilistic balancing, when balancing attempts are performed at a certain frequency. We define the balancing quantum as the time duration of each time-step. If the balancing quantum is A, we say that the balancing frequency is ~. 1 We say that a load-balancing algorithm with balancing quantum A is locally-bounding, if there is a constant s, independent of the number of processes, such that for any execution the following is guaranteed: Vp, t : E[Lp,t] < sat. 
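As a numeric illustration of the upper bound of Equation 4 (the code and the example loads are ours, not from the paper), the sketch below evaluates P(select*) via Lemma 3.1 and then the weighted sum of Equation 4 for one overloaded process; the bound comes out negative, in line with Theorem 3.4:

```python
import math

def select_star_prob(mu, lp, lx, n):
    """Lemma 3.1: P(select*(p,x,t)) = 1 - (1 - mu/((n-1)Lp))(1 - mu/((n-1)Lx)).
    Work-piles are assumed non-empty here."""
    a = mu / ((n - 1) * lp)
    b = mu / ((n - 1) * lx)
    return 1 - (1 - a) * (1 - b)

def contrib_upper_bound(mu, loads, p):
    """Right-hand side of Equation 4 for process p: positive summands are
    weighted by P(select*), non-positive ones by P(select*)/(2e^2)."""
    n = len(loads)
    lp = loads[p]
    total = 0.0
    for x, lx in enumerate(loads):
        if x == p:
            continue
        term = select_star_prob(mu, lp, lx, n) * (lx - lp) / 2
        total += term if term > 0 else term / (2 * math.e ** 2)
    return total

loads = [100] + [10] * 9            # one overloaded process; A_t = 19
print(contrib_upper_bound(0.5, loads, 0))   # negative: expected drift downward
```

On a perfectly balanced system every summand vanishes, so the bound is exactly zero, while the overloaded process above gets a strictly negative expected contribution.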
We also say that the work-queues system is s-locally bounded at time t if the following inequalities hold at time t:

    ∀p : L_{p,t} ≤ s·A_t.

As noted earlier, the difference between L_{p,t} and L_{p,t+Δ} comes from two sources: from the effect of a balancing operation that may or may not take place at that time-step (whose contribution is denoted by contrib[p, t]), and from work-items that are generated and/or consumed by p in its application execution during that time-step. We denote by Δ_u a time-quantum small enough that the application execution (balancing operations notwithstanding) does not change the length of any work-pile by more than u items. The following theorem proves that symmetric probabilistic balancing with frequency 1/Δ_u, for any integer u, is a locally bounding scheme.

Theorem 3.5 Assume symmetric probabilistic balancing is employed with balancing frequency 1/Δ_u; then there is a constant s_u, not depending on the number of processes or the application, such that if the system starts s_u-locally bounded, then the following inequalities hold:

    ∀p, t : E[L_{p,t}] ≤ s_u·A_t.

Proof: According to Theorem 3.4, there is a constant c, not depending on the number of processes, such that:

    (β > 1) ∧ (L_{p,t} ≥ β·A_t) ⟹ E[contrib(p, t)] ≤ -c·β.    (8)

We show that s_u = 2u/c is the constant we are seeking. Note that if L_{p,t} ≥ (2u/c)·A_t at the beginning of an execution quantum, then E[contrib(p, t)] ≤ -2u. During a Δ_u time-quantum of execution, L_{p,t} can grow by at most u items. During that period, the system average can decrease by at most u items, so during this period (L_{p,t} - A_t) can grow by at most 2u items. Q.E.D.
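The synchronous time-step model can be exercised in simulation. The sketch below is ours and simplified: it folds the paper's decide() tie-breaking into the random admission of one initiator per target, and lets a balance equalize the two piles (one concrete symmetric policy). Starting from a maximally imbalanced system, the maximum load falls to a small multiple of the (constant) average:

```python
import random

def initiate_prob(mu, load):
    # Balancing-initiation probability: mu / L, taken as 1 on an empty pile.
    return 1.0 if load == 0 else min(1.0, mu / load)

def step(loads, mu, rng):
    """One synchronous time-step of a simplified symmetric protocol:
    coin flips, uniform selection, approaches, then pairwise balancing."""
    n = len(loads)
    # 1. Each process flips its biased coin; initiators pick a peer uniformly.
    targets = {}
    for p in range(n):
        if rng.random() < initiate_prob(mu, loads[p]):
            q = rng.randrange(n - 1)
            targets[p] = q if q < p else q + 1
    # 2. Each approached process admits one initiator, chosen at random.
    approaches = {}
    for q in range(n):
        initiators = [p for p, t in targets.items() if t == q]
        if initiators:
            approaches[q] = rng.choice(initiators)
    # 3. An admitted initiator p balances with q if q did not initiate itself,
    #    or if p and q approached each other; a balance equalizes the piles.
    done = set()
    for q, p in approaches.items():
        if p in done or q in done:
            continue
        if q not in targets or approaches.get(p) == q:
            total = loads[p] + loads[q]
            loads[p], loads[q] = total // 2, total - total // 2
            done.update((p, q))

rng = random.Random(1)
loads = [80] + [0] * 7              # all work on one process; average is 10
for _ in range(300):
    step(loads, 0.5, rng)
print(loads)                        # loads end up clustered around the average
```

Because a balance replaces two piles by their average, the maximum load never increases in this simulation, and empty processes (which initiate with probability 1) pull work off the loaded one within a few steps.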
Corollary 3.6 Assume symmetric probabilistic balancing is employed with balancing frequency 1/Δ_u; then there is a constant s_u, not depending on the number of processes or the application, such that if the system starts imbalanced and runs long enough, it eventually becomes s_u-locally bounded.

3.3 StealHalf probabilistic balancing analysis

The scheme described in [12] is symmetric in the sense that a balancing operation between two processes p, q can take place if either one of them initiates it. This is not the case for the StealHalf algorithm, since it only allows stealing items and does not support insertion of items. In other words, if at time t, L_{p,t} < L_{q,t}, then only p can initiate a balancing operation with q at that time, and not vice versa. In the following we prove that asymmetric probabilistic balancing also possesses the nice property of being locally bounding.

The following lemma states that P(select(p, q, t)) and P(balance(p, q, t)) differ only by a constant factor also for asymmetric probabilistic balancing.

Lemma 3.7 For every time step t, and for every two processes p and q such that L_{p,t} < L_{q,t}, it holds that:

    (1/(2e²)) · P(select(p, q, t)) ≤ P(balance(p, q, t)) ≤ P(select(p, q, t)).

The proof is almost identical to the proof of Lemma 3.3. Next, Theorem 3.8, corresponding to Theorem 3.4, is derived for the asymmetric case.

Theorem 3.8 Let L_{p,t} = αA_t, α > 1. Then it holds for asymmetric probabilistic balancing that:

    E[contrib(p, t)] = -Ω(α).

Proof outline: Note that Equation 4 holds also for asymmetric balancing; however, unlike the symmetric case, where we have to consider balancing initiation by all possible process pairs which include p, for asymmetric balancing, when we consider balancing operations at time t that affect p, we only have to consider the following events:

1. select(p, x, t), for x ∈ P_α⁺;
2. select(x, p, t), for x ∈ P_α⁻.

The proof's structure is similar to that of Theorem 3.4 (the corresponding theorem for the symmetric case). Based on Theorem 3.8, the following theorem, corresponding to Theorem 3.5, is proven for the asymmetric case. The proof is almost identical.

Theorem 3.9 Assume asymmetric probabilistic balancing is employed with balancing frequency 1/Δ_u; then there is a constant s_u, not depending on the number of processes or the application, such that if the system starts s_u-locally bounded, then the following inequalities hold:

    ∀p, t : E[L_{p,t}] ≤ s_u·A_t.
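A minimal sketch of the asymmetric transfer rule follows. The code is ours and idealized: it moves exactly half of the victim's items, whereas the real StealHalf algorithm only guarantees that at least one eighth of the victim's deque is stealable atomically.

```python
def may_initiate(my_load: int, victim_load: int) -> bool:
    """Asymmetric rule: only a process with the smaller work-pile may
    initiate a balancing (stealing) operation against the larger one."""
    return my_load < victim_load

def steal_half(loads, thief: int, victim: int) -> int:
    """One idealized StealHalf transfer: the thief takes half of the
    victim's items (rounded down).  Returns the number of items moved."""
    if not may_initiate(loads[thief], loads[victim]):
        return 0
    stolen = loads[victim] // 2
    loads[victim] -= stolen
    loads[thief] += stolen
    return stolen

loads = [0, 100]
print(steal_half(loads, 0, 1), loads)   # thief 0 takes 50 items -> [50, 50]
```

Note that after one transfer the two piles are equal, so a second attempt by either side is refused: balancing only ever flows from heavier to lighter piles, which is exactly the asymmetry the analysis above accounts for.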
The following theorem states that the StealHalf algorithm maintains the following invariant for all processes p at all times: at least one eighth of the number of items in p's deque can be stolen atomically.

Theorem 3.10 Under StealHalf load balancing (for all policies) the following holds:

    ∀p, t : Len(stealRange(p, t)) ≥ Len(deq(p, t)) / 8.

The proof proceeds by enumerating the statements which potentially modify either Len(deq) or Len(stealRange) and showing that the invariant is maintained after each of them is executed. It is rather technical and for lack of space is not provided here; still, let us explain an interesting scenario that is encountered in the course of the proof, that of consecutive successful steals from the same process.

[Figure 8: A scenario of 2 consecutive steals, showing p's deque before the first steal, after the first steal, and after the second steal. In the first steal operation, 14 items are stolen (out of the 32 items which can be stolen atomically). In the second steal, all of the 16 items in the stealRange are stolen. A √ sign in the prevStealRange box indicates that it is equal to the stealRange; an X sign indicates it is different from the stealRange.]

Figure 8 shows a scenario where two consecutive steal operations are performed on process p's deque, while in the meantime p is not performing any operation on its deque. Initially, p's stealRange contains 32 items, but the first thief only steals 14 of them (which is what it needs to balance with p). When the thief looks at bot, it equals 95, but before the steal's CAS operation completes, it may change within the range [ ].
To make sure that after the steal the stealRange does not contain more than half the remaining items, the thief sets the new length of stealRange to be one eighth of the maximal possible value of bot plus 1, namely 128/8 = 16. In the second steal shown in Figure 8, the thief steals all of the 16 items in the stealRange, and again sets the new length of stealRange to contain 128/8 = 16 items. This is correct, since this time prevStealRange differs from stealRange, and so the local process p cannot change bot without performing a CAS.

Recall that Theorem 3.9 above assumes synchrony, in the sense that processing proceeds in time-steps and all processes perform balancing initiation at the beginning of every time-step. Additionally, it assumes that up to half the items of any process can be stolen atomically. Contrary to this, the StealHalf algorithm's setting is entirely asynchronous, and though the balancing frequency of every process is guaranteed, processes perform their balancing-initiation operations at separate times; additionally, the StealHalf algorithm can only guarantee that at least one eighth of a process' items can be stolen atomically. The modifications required in the above proof to show that StealHalf is a locally bounding scheme are straightforward. They are not brought here for lack of space, but are to appear in the full paper.

4 Correctness

In the full paper we prove the following theorem, which shows that our algorithm has the same non-blocking property as that of Arora et al.: the collective progress of processes in accessing the extended-deque structures is guaranteed.

Theorem 4.1 The StealHalf algorithm on a collection of extended deques, with operations pushBottom, popBottom, and tryToSteal, is non-blocking.
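The arithmetic behind this scenario can be checked directly. In the sketch below, the bound of 128 on bot is our reading of the figure, so treat the constants as illustrative rather than as the algorithm's actual parameters.

```python
MAX_BOT = 128   # assumed upper bound on bot in the scenario of Figure 8

def new_steal_range_len(max_bot: int) -> int:
    """After a successful steal, the thief resets stealRange to one eighth
    of the maximal possible value of bot (plus one), so the range can never
    hold more than half of the items that may remain in the deque."""
    return max_bot // 8

def invariant_holds(steal_range_len: int, deque_len: int) -> bool:
    """Theorem 3.10: at least one eighth of the deque is stealable."""
    return steal_range_len >= deque_len / 8

# First steal: 14 of the 32 stealable items are taken, range reset to 16.
# Second steal: all 16 items in the range are taken, range reset to 16 again.
print(new_steal_range_len(MAX_BOT))

# The reset value keeps the invariant for every reachable deque length,
# whatever the local process does to bot in the meantime:
assert all(invariant_holds(new_steal_range_len(MAX_BOT), d)
           for d in range(MAX_BOT + 1))
```

The point of the reset rule is that the thief need not know the exact current value of bot: bounding it from above is enough to re-establish the one-eighth invariant with a single CAS.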

We note that both algorithms are not fault tolerant, in the sense that process failures, though non-blocking, can cause the loss of items.

In [13], Shavit and Touitou formally define the semantics of a pool data structure. A pool is an unordered queue: a concurrent data structure that allows each processor to perform sequences of push and pop operations with the usual semantics. In the full paper we provide the proofs of the key lemmata necessary to prove the following theorem:

Theorem 4.2 The StealHalf algorithm on a collection of extended deques, with operations pushBottom, popBottom, and tryToSteal, is a correct implementation of a pool data structure.

We define the complexity of our algorithm in terms of the total number of synchronization operations necessary, by analyzing it for monotonic sequences of pushBottom and popBottom operations. We do so since one cannot make claims in situations where the execution pattern of the underlying application causes thrashing back and forth on a single deque entry. For a given deque, a monotonic sequence is one in which all operations are either pushBottom or popBottom but not both, with a possible interleaving of tryToSteal operations. We prove the following:

Theorem 4.3 For any monotonic sequence of length k by process p during which m successful steal attempts are performed on p's extended deque, p performs at most O(log(k) + m) CAS operations on the extended deque.

5 Acknowledgments

We would like to thank Dave Detlefs, Maurice Herlihy, Victor Luchangco, and Mark Moir for their helpful comments on earlier drafts of our algorithm. Though he denies it and has managed to conveniently lose all evidence, we believe Dave Detlefs came up with a steal-half algorithm very similar to ours in the context of his work on Sun's Parallel Garbage Collection [5].

References

[1] ACAR, U. A., BLELLOCH, G. E., AND BLUMOFE, R. D. The data locality of work stealing. In ACM Symposium on Parallel Algorithms and Architectures (2000).

[2] ARORA, N. S., BLUMOFE, R. D., AND PLAXTON, C. G. Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems 34, 2 (2001).

[3] BERENBRINK, P., FRIEDETZKY, T., AND GOLDBERG, L. A. The natural work-stealing algorithm is stable. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science (FOCS) (2001).

[4] BLUMOFE, R., AND LEISERSON, C. Scheduling multithreaded computations by work stealing. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, Santa Fe, New Mexico (November 1994).

[5] FLOOD, C., DETLEFS, D., SHAVIT, N., AND ZHANG, C. Parallel garbage collection for shared memory multiprocessors. In Usenix Java Virtual Machine Research and Technology Symposium (JVM '01) (Monterey, CA, Apr. 2001).

[6] HERLIHY, M. Wait-free synchronization. ACM Transactions on Programming Languages and Systems 13, 1 (Jan. 1991).

[7] LEISERSON, C. E., AND PLAAT, A. Programming parallel applications in Cilk. SINEWS: SIAM News 31 (1998).

[8] LÜLING, R., AND MONIEN, B. A dynamic distributed load balancing algorithm with provable good performance. In ACM Symposium on Parallel Algorithms and Architectures (1993).

[9] MITZENMACHER, M. Analysis of load stealing models based on differential equations. In ACM Symposium on Parallel Algorithms and Architectures (1998).

[10] MITZENMACHER, M. The power of two choices in randomized load balancing. IEEE Transactions on Parallel and Distributed Systems 12, 10 (2001).

[11] MOIR, M. Practical implementations of non-blocking synchronization primitives. In Proceedings of the 16th Annual ACM Symposium on Principles of Distributed Computing (August 1997).

[12] RUDOLPH, L., SLIVKIN-ALLALOUF, M., AND UPFAL, E. A simple load balancing scheme for task allocation in parallel machines. In ACM Symposium on Parallel Algorithms and Architectures (1991).

[13] SHAVIT, N., AND TOUITOU, D. Elimination trees and the construction of pools and stacks. Theory of Computing Systems 30 (1997).


More information

The Relation between Two Present Value Formulae

The Relation between Two Present Value Formulae James Ciecka, Gary Skoog, and Gerald Martin. 009. The Relation between Two Present Value Formulae. Journal of Legal Economics 15(): pp. 61-74. The Relation between Two Present Value Formulae James E. Ciecka,

More information

Facing the Challenges for Real-Time Software Development on Multi-Cores

Facing the Challenges for Real-Time Software Development on Multi-Cores Facing the Challenges for Real-Time Software Development on Multi-Cores Dr. Fridtjof Siebert aicas GmbH Haid-und-Neu-Str. 18 76131 Karlsruhe, Germany siebert@aicas.com Abstract Multicore systems introduce

More information

CPU Scheduling Yi Shi Fall 2015 Xi an Jiaotong University

CPU Scheduling Yi Shi Fall 2015 Xi an Jiaotong University CPU Scheduling Yi Shi Fall 2015 Xi an Jiaotong University Goals for Today CPU Schedulers Scheduling Algorithms Algorithm Evaluation Metrics Algorithm details Thread Scheduling Multiple-Processor Scheduling

More information

Concepts of Concurrent Computation

Concepts of Concurrent Computation Chair of Software Engineering Concepts of Concurrent Computation Bertrand Meyer Sebastian Nanz Lecture 3: Synchronization Algorithms Today's lecture In this lecture you will learn about: the mutual exclusion

More information

Dr Markus Hagenbuchner markus@uow.edu.au CSCI319. Distributed Systems

Dr Markus Hagenbuchner markus@uow.edu.au CSCI319. Distributed Systems Dr Markus Hagenbuchner markus@uow.edu.au CSCI319 Distributed Systems CSCI319 Chapter 8 Page: 1 of 61 Fault Tolerance Study objectives: Understand the role of fault tolerance in Distributed Systems. Know

More information

Signature-Free Asynchronous Binary Byzantine Consensus with t < n/3, O(n 2 ) Messages, and O(1) Expected Time

Signature-Free Asynchronous Binary Byzantine Consensus with t < n/3, O(n 2 ) Messages, and O(1) Expected Time Signature-Free Asynchronous Binary Byzantine Consensus with t < n/3, O(n 2 ) Messages, and O(1) Expected Time Achour Mostéfaoui Hamouma Moumen Michel Raynal, LINA, Université de Nantes, 44322 Nantes Cedex,

More information

On Sorting and Load Balancing on GPUs

On Sorting and Load Balancing on GPUs On Sorting and Load Balancing on GPUs Daniel Cederman and Philippas Tsigas Distributed Computing and Systems Chalmers University of Technology SE-42 Göteborg, Sweden {cederman,tsigas}@chalmers.se Abstract

More information

Chapter 6 Concurrent Programming

Chapter 6 Concurrent Programming Chapter 6 Concurrent Programming Outline 6.1 Introduction 6.2 Monitors 6.2.1 Condition Variables 6.2.2 Simple Resource Allocation with Monitors 6.2.3 Monitor Example: Circular Buffer 6.2.4 Monitor Example:

More information

Lecture 6 Online and streaming algorithms for clustering

Lecture 6 Online and streaming algorithms for clustering CSE 291: Unsupervised learning Spring 2008 Lecture 6 Online and streaming algorithms for clustering 6.1 On-line k-clustering To the extent that clustering takes place in the brain, it happens in an on-line

More information

Network (Tree) Topology Inference Based on Prüfer Sequence

Network (Tree) Topology Inference Based on Prüfer Sequence Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,

More information

Multicore Systems Challenges for the Real-Time Software Developer

Multicore Systems Challenges for the Real-Time Software Developer Multicore Systems Challenges for the Real-Time Software Developer Dr. Fridtjof Siebert aicas GmbH Haid-und-Neu-Str. 18 76131 Karlsruhe, Germany siebert@aicas.com Abstract Multicore systems have become

More information

Keywords: Dynamic Load Balancing, Process Migration, Load Indices, Threshold Level, Response Time, Process Age.

Keywords: Dynamic Load Balancing, Process Migration, Load Indices, Threshold Level, Response Time, Process Age. Volume 3, Issue 10, October 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Load Measurement

More information

Distributed Systems: Concepts and Design

Distributed Systems: Concepts and Design Distributed Systems: Concepts and Design Edition 3 By George Coulouris, Jean Dollimore and Tim Kindberg Addison-Wesley, Pearson Education 2001. Chapter 2 Exercise Solutions 2.1 Describe and illustrate

More information

Load-Balancing for a Real-Time System Based on Asymmetric Multi-Processing

Load-Balancing for a Real-Time System Based on Asymmetric Multi-Processing LIFL Report # 2004-06 Load-Balancing for a Real-Time System Based on Asymmetric Multi-Processing Éric PIEL Eric.Piel@lifl.fr Philippe MARQUET Philippe.Marquet@lifl.fr Julien SOULA Julien.Soula@lifl.fr

More information

Real Time Programming: Concepts

Real Time Programming: Concepts Real Time Programming: Concepts Radek Pelánek Plan at first we will study basic concepts related to real time programming then we will have a look at specific programming languages and study how they realize

More information

CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma

CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma Please Note: The references at the end are given for extra reading if you are interested in exploring these ideas further. You are

More information

A New Nature-inspired Algorithm for Load Balancing

A New Nature-inspired Algorithm for Load Balancing A New Nature-inspired Algorithm for Load Balancing Xiang Feng East China University of Science and Technology Shanghai, China 200237 Email: xfeng{@ecusteducn, @cshkuhk} Francis CM Lau The University of

More information

The ConTract Model. Helmut Wächter, Andreas Reuter. November 9, 1999

The ConTract Model. Helmut Wächter, Andreas Reuter. November 9, 1999 The ConTract Model Helmut Wächter, Andreas Reuter November 9, 1999 Overview In Ahmed K. Elmagarmid: Database Transaction Models for Advanced Applications First in Andreas Reuter: ConTracts: A Means for

More information

SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (I)

SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (I) SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (I) Shan He School for Computational Science University of Birmingham Module 06-19321: SSC Outline Outline of Topics

More information

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge, The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge, cheapest first, we had to determine whether its two endpoints

More information

From Spontaneous Total Order to Uniform Total Order: different degrees of optimistic delivery

From Spontaneous Total Order to Uniform Total Order: different degrees of optimistic delivery From Spontaneous Total Order to Uniform Total Order: different degrees of optimistic delivery Luís Rodrigues Universidade de Lisboa ler@di.fc.ul.pt José Mocito Universidade de Lisboa jmocito@lasige.di.fc.ul.pt

More information

An example of a computable

An example of a computable An example of a computable absolutely normal number Verónica Becher Santiago Figueira Abstract The first example of an absolutely normal number was given by Sierpinski in 96, twenty years before the concept

More information

Concurrent Data Structures

Concurrent Data Structures 1 Concurrent Data Structures Mark Moir and Nir Shavit Sun Microsystems Laboratories 1.1 Designing Concurrent Data Structures............. 1-1 Performance Blocking Techniques Nonblocking Techniques Complexity

More information

Chapter 7 Application Protocol Reference Architecture

Chapter 7 Application Protocol Reference Architecture Application Protocol Reference Architecture Chapter 7 Application Protocol Reference Architecture This chapter proposes an alternative reference architecture for application protocols. The proposed reference

More information

Scheduling Sporadic Tasks with Shared Resources in Hard-Real-Time Systems

Scheduling Sporadic Tasks with Shared Resources in Hard-Real-Time Systems Scheduling Sporadic Tasks with Shared Resources in Hard-Real- Systems Kevin Jeffay * University of North Carolina at Chapel Hill Department of Computer Science Chapel Hill, NC 27599-3175 jeffay@cs.unc.edu

More information

Victor Shoup Avi Rubin. fshoup,rubing@bellcore.com. Abstract

Victor Shoup Avi Rubin. fshoup,rubing@bellcore.com. Abstract Session Key Distribution Using Smart Cards Victor Shoup Avi Rubin Bellcore, 445 South St., Morristown, NJ 07960 fshoup,rubing@bellcore.com Abstract In this paper, we investigate a method by which smart

More information

Algorithms and Methods for Distributed Storage Networks 9 Analysis of DHT Christian Schindelhauer

Algorithms and Methods for Distributed Storage Networks 9 Analysis of DHT Christian Schindelhauer Algorithms and Methods for 9 Analysis of DHT Institut für Informatik Wintersemester 2007/08 Distributed Hash-Table (DHT) Hash table does not work efficiently for inserting and deleting Distributed Hash-Table

More information

SCALABILITY AND AVAILABILITY

SCALABILITY AND AVAILABILITY SCALABILITY AND AVAILABILITY Real Systems must be Scalable fast enough to handle the expected load and grow easily when the load grows Available available enough of the time Scalable Scale-up increase

More information

Decentralized Utility-based Sensor Network Design

Decentralized Utility-based Sensor Network Design Decentralized Utility-based Sensor Network Design Narayanan Sadagopan and Bhaskar Krishnamachari University of Southern California, Los Angeles, CA 90089-0781, USA narayans@cs.usc.edu, bkrishna@usc.edu

More information

CYCLIC: A LOCALITY-PRESERVING LOAD-BALANCING ALGORITHM FOR PDES ON SHARED MEMORY MULTIPROCESSORS

CYCLIC: A LOCALITY-PRESERVING LOAD-BALANCING ALGORITHM FOR PDES ON SHARED MEMORY MULTIPROCESSORS Computing and Informatics, Vol. 31, 2012, 1255 1278 CYCLIC: A LOCALITY-PRESERVING LOAD-BALANCING ALGORITHM FOR PDES ON SHARED MEMORY MULTIPROCESSORS Antonio García-Dopico, Antonio Pérez Santiago Rodríguez,

More information

Distributed Load Balancing for Machines Fully Heterogeneous

Distributed Load Balancing for Machines Fully Heterogeneous Internship Report 2 nd of June - 22 th of August 2014 Distributed Load Balancing for Machines Fully Heterogeneous Nathanaël Cheriere nathanael.cheriere@ens-rennes.fr ENS Rennes Academic Year 2013-2014

More information

Compact Representations and Approximations for Compuation in Games

Compact Representations and Approximations for Compuation in Games Compact Representations and Approximations for Compuation in Games Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions

More information

Alok Gupta. Dmitry Zhdanov

Alok Gupta. Dmitry Zhdanov RESEARCH ARTICLE GROWTH AND SUSTAINABILITY OF MANAGED SECURITY SERVICES NETWORKS: AN ECONOMIC PERSPECTIVE Alok Gupta Department of Information and Decision Sciences, Carlson School of Management, University

More information

CPC/CPA Hybrid Bidding in a Second Price Auction

CPC/CPA Hybrid Bidding in a Second Price Auction CPC/CPA Hybrid Bidding in a Second Price Auction Benjamin Edelman Hoan Soo Lee Working Paper 09-074 Copyright 2008 by Benjamin Edelman and Hoan Soo Lee Working papers are in draft form. This working paper

More information

Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors

Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 18, 1037-1048 (2002) Short Paper Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors PANGFENG

More information

(Refer Slide Time: 01:11-01:27)

(Refer Slide Time: 01:11-01:27) Digital Signal Processing Prof. S. C. Dutta Roy Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 6 Digital systems (contd.); inverse systems, stability, FIR and IIR,

More information

Towards a compliance audit of SLAs for data replication in Cloud storage

Towards a compliance audit of SLAs for data replication in Cloud storage Towards a compliance audit of SLAs for data replication in Cloud storage J. Leneutre B. Djebaili, C. Kiennert, J. Leneutre, L. Chen, Data Integrity and Availability Verification Game in Untrusted Cloud

More information

Distribution One Server Requirements

Distribution One Server Requirements Distribution One Server Requirements Introduction Welcome to the Hardware Configuration Guide. The goal of this guide is to provide a practical approach to sizing your Distribution One application and

More information

Intelligent Log Analyzer. André Restivo

Intelligent Log Analyzer. André Restivo <andre.restivo@portugalmail.pt> Intelligent Log Analyzer André Restivo 9th January 2003 Abstract Server Administrators often have to analyze server logs to find if something is wrong with their machines.

More information

Integer Factorization using the Quadratic Sieve

Integer Factorization using the Quadratic Sieve Integer Factorization using the Quadratic Sieve Chad Seibert* Division of Science and Mathematics University of Minnesota, Morris Morris, MN 56567 seib0060@morris.umn.edu March 16, 2011 Abstract We give

More information

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Why? A central concept in Computer Science. Algorithms are ubiquitous. Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

More information

PPD: Scheduling and Load Balancing 2

PPD: Scheduling and Load Balancing 2 PPD: Scheduling and Load Balancing 2 Fernando Silva Computer Science Department Center for Research in Advanced Computing Systems (CRACS) University of Porto, FCUP http://www.dcc.fc.up.pt/~fds 2 (Some

More information

Scheduling Task Parallelism" on Multi-Socket Multicore Systems"

Scheduling Task Parallelism on Multi-Socket Multicore Systems Scheduling Task Parallelism" on Multi-Socket Multicore Systems" Stephen Olivier, UNC Chapel Hill Allan Porterfield, RENCI Kyle Wheeler, Sandia National Labs Jan Prins, UNC Chapel Hill Outline" Introduction

More information

IN THIS PAPER, we study the delay and capacity trade-offs

IN THIS PAPER, we study the delay and capacity trade-offs IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 15, NO. 5, OCTOBER 2007 981 Delay and Capacity Trade-Offs in Mobile Ad Hoc Networks: A Global Perspective Gaurav Sharma, Ravi Mazumdar, Fellow, IEEE, and Ness

More information

Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range

Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range THEORY OF COMPUTING, Volume 1 (2005), pp. 37 46 http://theoryofcomputing.org Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range Andris Ambainis

More information

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1. MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

More information

6.852: Distributed Algorithms Fall, 2009. Class 2

6.852: Distributed Algorithms Fall, 2009. Class 2 .8: Distributed Algorithms Fall, 009 Class Today s plan Leader election in a synchronous ring: Lower bound for comparison-based algorithms. Basic computation in general synchronous networks: Leader election

More information

Load Balancing Techniques

Load Balancing Techniques Load Balancing Techniques 1 Lecture Outline Following Topics will be discussed Static Load Balancing Dynamic Load Balancing Mapping for load balancing Minimizing Interaction 2 1 Load Balancing Techniques

More information

Job Reference Guide. SLAMD Distributed Load Generation Engine. Version 1.8.2

Job Reference Guide. SLAMD Distributed Load Generation Engine. Version 1.8.2 Job Reference Guide SLAMD Distributed Load Generation Engine Version 1.8.2 June 2004 Contents 1. Introduction...3 2. The Utility Jobs...4 3. The LDAP Search Jobs...11 4. The LDAP Authentication Jobs...22

More information

WHAT ARE MATHEMATICAL PROOFS AND WHY THEY ARE IMPORTANT?

WHAT ARE MATHEMATICAL PROOFS AND WHY THEY ARE IMPORTANT? WHAT ARE MATHEMATICAL PROOFS AND WHY THEY ARE IMPORTANT? introduction Many students seem to have trouble with the notion of a mathematical proof. People that come to a course like Math 216, who certainly

More information

Sequential Data Structures

Sequential Data Structures Sequential Data Structures In this lecture we introduce the basic data structures for storing sequences of objects. These data structures are based on arrays and linked lists, which you met in first year

More information