Load Balancing and Rebalancing on Web Based Environment

Yu Zhang

This report is submitted as partial fulfilment of the requirements for the Honours Programme of the School of Computer Science and Software Engineering, The University of Western Australia, 2004
Abstract

We investigate two variants of a load distribution problem associated with distributing loads of varying size in a multi-server web-based environment. Solving the classical Load Balancing Problem allows us to distribute static web components across multiple servers so that the loads on the servers are as equally distributed as possible. A typical objective is to minimize the makespan, the load on the most heavily loaded server. In reality, however, loads on servers are often dynamic. Because the loads of web components change over time, the Load Rebalancing Problem was introduced by S. Keshav of Ensim Corporation. To solve the Load Rebalancing Problem we redistribute the loads of web components in a fixed number of steps, since moving components across servers can be expensive, so that the loads on the servers are as equally distributed as possible. Solving these two problems successfully would allow us to utilize resources better and achieve better performance. However, both problems have been proven to be NP-hard, so generating exact solutions in a tractable amount of time becomes infeasible when the problems grow large. We therefore adopt four greedy approximation algorithms that solve these two problems in polynomial time, within a constant guaranteed error ratio. We give implementations of the algorithms in the Java programming environment. We carry out experiments to show that the error bounds hold for our implementations, and we performed further experiments to test the performance of the algorithms in practical situations. By analyzing our results carefully we identified weaknesses in some of the algorithms and proposed improvements. We conclude that these approximation algorithms do indeed run in polynomial time, that they generate approximate results within the stated error ratio on our test data sets, and that they are valid tools for balancing and rebalancing loads in a multi-server web-based environment.
Keywords: Approximation Algorithms, Load Balancing, Scheduling
CR Categories: F.2.2, G.2.1, C.2.4
Acknowledgements

I would like to thank Ensim Corporation for posing the Load Rebalancing Problem, and my supervisor Gordon Royle for all his help and support.
Contents

Abstract
Acknowledgements
1 Introduction
2 Literature Review
3 Load Balancing Problem
  3.1 The Load Balancing Problem
  3.2 Algorithm GBalance
  3.3 Algorithm SBalance
  3.4 Implementations
  3.5 Test Methodology
  3.6 Test Result
  3.7 Analysis and Comparison
4 Load Rebalancing Problem
  4.1 The Load Rebalancing Problem
  4.2 Algorithm GRebalance
  4.3 Algorithm SRebalance
  4.4 Implementations
  4.5 Test Methodology
  4.6 Test Result
  4.7 Analysis and Comparison
  4.8 Proposed Improvement
    4.8.1 Proposed Improvement for GRebalance
    4.8.2 Proposed Improvement for SRebalance
5 Conclusion
A Original Research Proposal
B Approximated Makespan Generated for the Load Balancing and Load Rebalancing Problem
  B.1 Result Generated Using GBalance and SBalance
  B.2 Result Generated Using GRebalance and SRebalance
C Java Code Used
  C.1 Model of Web Component in Load Balancing and Load Rebalancing Problem
  C.2 Model of Server in the Load Balancing Problem
  C.3 Implementation of GBalance Algorithm
  C.4 Implementation of the SBalance Algorithm
  C.5 Model of Server in the Load Rebalancing Problem
  C.6 Implementation of the GRebalance Algorithm
  C.7 Implementation of the SRebalance Algorithm
  C.8 Implementation of the Random Generators for Test Cases
CHAPTER 1

Introduction

Over the last four years, the number of Internet users increased by 125%, reaching a population of 812,931,592 [1]. The growth in Asian countries is tremendous; for instance in China, the number of Internet users reached a record high of 87 million, according to CNNIC [2]. As a result of this phenomenon, more people are relying on the Internet for educational and recreational purposes. Web sites are a prevalent means by which people receive information and interact with others on the net. The size of web sites has thus grown significantly to accommodate the increasing needs. As the number of surfers, the size of web pages and the number of web pages in web sites increase, using a single server to host all the web components is no longer sufficient, as a single server could very easily become overloaded and unable to serve all requests in time. Distributing web components over multiple servers to allow faster delivery of information thus becomes a beneficial solution. The problem we are trying to solve, then, is how to balance the web components across a number of servers as equally as possible. First we define the load of a web component, such as a static HTML document, a GIF image, an AVI clip, a Macromedia Flash file or in some cases an entire web site, to be the total number of bytes that a server has to send to all surfers: it is the size of the file in bytes multiplied by the number of hits. We wish to discover methods to distribute the web components to a number of servers as equally as possible. A good way to achieve that is to minimize the load on the most loaded server. We are interested in two specific problems in this context, namely the Load Balancing Problem and the Load Rebalancing Problem. The Load Balancing Problem states that given a number of servers and a list of loads, we seek to assign each load to a server such that the loads are distributed as equally as possible.
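As an illustrative aside (not part of this report's appendix code), the load definition above can be modelled directly; the class and member names below are our own invention:

```java
// Minimal sketch of a web component's load: the size of the file in bytes
// multiplied by the number of hits. Names here are illustrative only.
public class WebComponent {
    private final double load;

    public WebComponent(long sizeBytes, long hits) {
        this.load = (double) sizeBytes * hits; // load = size × hits
    }

    public double load() {
        return load;
    }
}
```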
When a web site is first published, the Load Balancing Problem can be used to model the initial distribution of web components: given the sizes of the web components and their estimated numbers of hits, we try to find a way to optimally distribute the web components over a fixed number of servers such that the load on
the most loaded server is minimized. The second problem we wish to investigate is associated with load rebalancing. After a certain amount of time, the loads of web components are very likely to vary. For instance, the number of hits of an HTML document might increase, or the size of a thread in a forum might grow. As a result, some servers might become much more heavily loaded than others. We would thus like to redistribute the web components by moving some of them from server to server. Because moving web components among servers can be expensive, we would like to minimize the number of moves. The Load Rebalancing Problem is thus: given the servers and their associated loads, how do we redistribute the loads such that the loads on the servers are as equal as possible? By solving these problems, we are able to distribute load evenly among servers when a web site is first published, and to readjust the loads when necessary. However, the Load Balancing Problem has been proved to be NP-hard [3], and Aggarwal et al. [4] have shown that the Load Rebalancing Problem is also NP-hard. Thus we are unable to deliver an exact solution to either problem in a reasonable amount of time when the problem gets large. For instance, if we have 5000 web components and 10 servers, there are 10^5000 possible assignments to enumerate and compare for the Load Balancing Problem. For the Load Rebalancing Problem, even if we restrict the number of moves to 12, the number of configurations to consider remains astronomically large. While solving these problems exactly is not feasible in such situations, approximation algorithms can achieve a reasonable estimate, within a fixed error ratio, in a reasonable amount of time. Motivated by the above, we adopted four greedy algorithms to address these problems: GBalance and SBalance [3] are adopted to approximate the Load Balancing Problem, and GRebalance and SRebalance [4] to approximate the Load Rebalancing Problem.
To gain a better understanding of the algorithms and to test the correctness of the error bounds, we implemented and tested them in the Java environment [9]. Our goal is to test how well the algorithms perform in practice, and whether their run times are realistic. For each problem, we first state it formally in mathematical form, present the listing of the algorithm, describe our implementation and testing procedure, and lastly present and compare the performance of the algorithms. In the process of analyzing the results, we also identified weaknesses in GRebalance and SRebalance, and proposed improved versions of these two algorithms.
CHAPTER 2

Literature Review

Currently there are many web sites with huge volumes of content. These web sites adopt different methods to distribute load over a number of servers to ensure faster server response time. Server response time is largely determined by the underlying hardware of the servers. The performance of many of these web sites, such as ebay.com, amazon.com and expedia.com, is heavily dependent on how fast their servers respond to requests, as the interaction between users and these web sites is real-time in nature. An overloaded server would jeopardize the performance of these web sites significantly. Many sources [10, 11, 12] give detailed introductions to the common techniques used in practice. Currently load balancing can be done through hardware- or software-based techniques. One technique, called DNS load balancing, involves maintaining identical copies of the site on physically separate servers. The DNS entry for the site is then set to return multiple IP addresses, each corresponding to a different copy of the site. The DNS server returns a different IP address for each request it receives, cycling through the multiple IP addresses. This method gives a very basic implementation of load balancing. However, since DNS entries are cached by clients and other DNS servers, a client continues to use the same copy during a session. This can be a serious drawback, as heavy website users may get the particular IP address that is cached on their client or DNS server, while less-frequent users get another. So heavy users could experience a performance slowdown, even though the server's resources may be available in abundance. Another load-balancing technique involves mapping the site name to a single IP address, which belongs to a machine that is set up to intercept HTTP requests and distribute them among multiple copies of the Web server. This can be done using both hardware and software.
Hardware solutions, even though expensive, are preferred for their stability. This method is preferred over the DNS approach, as better load balancing can be achieved. Also, these load balancers can see whether a particular machine is down, and accordingly divert the traffic to another address dynamically. This is in contrast to the DNS method, where a client is stuck with the address of the dead machine until it can request a new one.
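As a rough sketch of the round-robin cycling that DNS load balancing relies on (a hypothetical illustration, not code from this report or from any real DNS server):

```java
// Illustrative sketch: rotate through the IP addresses of identical site
// copies, as a DNS server doing round-robin load balancing would.
// The addresses used in the usage example are hypothetical.
public class RoundRobin {
    private final String[] addresses;
    private int next = 0;

    public RoundRobin(String[] addresses) {
        this.addresses = addresses;
    }

    public synchronized String nextAddress() {
        String a = addresses[next];
        next = (next + 1) % addresses.length; // cycle to the next copy
        return a;
    }
}
```

Note that this rotation alone cannot account for caching: once a client has cached one of the returned addresses, it keeps using that copy, which is exactly the drawback described above.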
Another technique, reverse proxying, involves setting up a reverse proxy that receives requests from the clients, proxies them to the Web server and caches the response on its way back to the client. This means that the proxy server can serve static content directly from its cache when a request is repeated. This in turn ensures that the server itself can focus its energies on delivering dynamic content. Dynamic content cannot generally be cached, as it is generated in real time. Reverse proxying can be used in conjunction with the simple load balancing techniques discussed earlier: static and dynamic content can be split across different servers, with reverse proxying used for the static content Web server only. All the above approaches require duplication of content. While under most circumstances this is not a huge problem for corporations, situations can arise in which one wishes to use an alternative approach. For instance, a web hosting company would not wish to duplicate multiple copies of the web sites it is hosting on all servers, as this introduces excessive cost. Instead of making duplicates, our approach balances and rebalances web components by treating every web component as unique: only one copy of a component exists across all servers. Besides the cost-saving benefit mentioned above, this approach also eliminates the need to implement extra hardware and software to keep copies consistent. We were able to find implementations of most of the algorithms we adopted; for instance, the GRebalance algorithm has been implemented by Linder and Shah [13] to rebalance loads in real life. We chose to construct our own implementations to gain a better understanding of them. This proved fruitful, as we identified weaknesses in the algorithms and were able to propose improved versions of the GRebalance and SRebalance algorithms.
CHAPTER 3

Load Balancing Problem

3.1 The Load Balancing Problem

The Load Balancing Problem is defined as follows. We are given a set of m servers M_1, ..., M_m and a set of n components; each component j has a load of t_j. We seek to assign each component to one of the servers so that the loads placed on all servers are as balanced as possible. Mathematically, in any assignment of components to servers, let A(i) denote the set of components assigned to server M_i; then server M_i needs to work for a total time of

    T_i = Σ_{j ∈ A(i)} t_j    (3.1)

We define this as the load on server M_i. In distributing the load evenly we wish to minimize a quantity known as the makespan, the maximum load on any server: T = max_i T_i. This classical problem does not only handle traditional load balancing in a multi-job, multi-machine situation, but can also help us distribute web components to various servers. Solving this problem allows us to distribute load across servers evenly, such that the load on the most heavily loaded server is minimized. This is useful when a web site is first published, when servers are upgraded, or when major updates take place such that all the web components need to be reassigned. Two algorithms will be adopted for this purpose, namely GBalance and SBalance. Both algorithms run in polynomial time, and generate approximations that are guaranteed to be within a constant factor of the optimal solution [3]. More specifically, GBalance achieves a guaranteed ratio of 2, meaning the makespan of the approximate solution is at most twice that of the optimal solution, while SBalance achieves a better guaranteed ratio of 3/2.
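Equation (3.1) and the makespan can be computed directly from an assignment. The following is a minimal sketch (our own names, not the code of Appendix C):

```java
import java.util.Arrays;

// Sketch: T_i is the sum of t_j over the components j in A(i), and the
// makespan is T = max_i T_i. Here assign[j] holds the server index of
// component j, and t[j] holds its load.
public class Makespan {
    public static double makespan(double[] t, int[] assign, int m) {
        double[] T = new double[m];
        for (int j = 0; j < t.length; j++) {
            T[assign[j]] += t[j]; // accumulate the load on each server
        }
        return Arrays.stream(T).max().orElse(0.0);
    }
}
```

For example, loads 2, 3, 4 with the first component on one server and the other two on a second server give per-server loads of 2 and 7, hence a makespan of 7.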
3.2 Algorithm GBalance

The first algorithm is called GBalance. It is a simple algorithm that passes through the entire web component list in an arbitrary order. The web component being processed is assigned to the currently least loaded server. This process is repeated until there are no web components left to be assigned.

Algorithm GBalance (input: component list, number of servers; output: servers with new assignment)
  Start with no web components assigned: T_i = 0, A(i) = ∅ for all servers
  FOR j = 1, ..., n
    Let M_i be a server that achieves the minimum min_k T_k
    Assign component j to server M_i
    A(i) = A(i) ∪ {j}
    T_i = T_i + t_j
  END FOR

The GBalance algorithm achieves a constant error bound of 2, with a run time of O(nm). The mathematical proof of the error bound and run time can be found in Kleinberg and Tardos [3].

3.3 Algorithm SBalance

An improvement over the previous algorithm is made through a simple sorting routine. The improved algorithm is called SBalance; it still runs in polynomial time, and it achieves a better error bound of 3/2 of the optimal solution. The algorithm first sorts the list of web components in decreasing load order, then goes through the sorted list and assigns each web component to the currently least loaded server. The algorithm is described below.

Algorithm SBalance (input: component list, number of servers; output: servers with new assignment)
  Start with no web components assigned. Set T_i = 0, A(i) = ∅ for all servers.
  Sort all components in descending order of load, so that t_1 ≥ t_2 ≥ t_3 ≥ ... ≥ t_n.
  FOR j = 1, ..., n
    Let M_i be a server that achieves the minimum min_k T_k
    Assign component j to server M_i
    Set A(i) = A(i) ∪ {j}
    Set T_i = T_i + t_j
  END FOR

The SBalance algorithm achieves a constant ratio of 3/2, meaning that the load on the most heavily loaded server is at most 150% of the optimal value. The algorithm's run time is dominated by the sorting procedure, which takes time of a higher order than the balancing procedure. A merge sort or quick sort algorithm would give SBalance a running time of O(n log n). The mathematical proof of the error bound and run time can be found in Kleinberg and Tardos [3].

3.4 Implementations

We would like to implement the algorithms in such a way that, given the size and (estimated) hits of each web component, we obtain an approximately optimal distribution of these web components across our web servers. We used an object-oriented programming approach to model the web components, the servers and the balancers, and chose the Java programming language [9] to implement both algorithms. Appendix C.1 shows the data structures we used to model web components. A component is modeled using a single double value, representing the load that it will contribute to a server. Appendix C.2 shows the construct of servers and the operations they can perform in our load balancing process. Appendix C.3 is the heart of the GBalance algorithm: it is the balancer and does the balancing process.
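As a compact sketch of how both balancers operate (method names are ours, not those of Appendices C.3 and C.4), SBalance is simply the GBalance rule applied to a list sorted in decreasing load order:

```java
import java.util.Arrays;

// GBalance assigns each component, in the given order, to the currently
// least loaded server. SBalance first sorts components in decreasing order
// of load and then applies the same rule. Both return the per-server loads.
public class Balancers {
    public static double[] gBalance(double[] t, int m) {
        double[] T = new double[m];
        for (double load : t) {
            int min = 0; // index of a least loaded server
            for (int i = 1; i < m; i++) {
                if (T[i] < T[min]) min = i;
            }
            T[min] += load;
        }
        return T;
    }

    public static double[] sBalance(double[] t, int m) {
        double[] sorted = t.clone();
        Arrays.sort(sorted); // ascending
        // reverse into descending order before balancing
        for (int i = 0; i < sorted.length / 2; i++) {
            double tmp = sorted[i];
            sorted[i] = sorted[sorted.length - 1 - i];
            sorted[sorted.length - 1 - i] = tmp;
        }
        return gBalance(sorted, m);
    }
}
```

On the loads 1, 1, 1, 3 with two servers, gBalance in that order yields a makespan of 4, while sBalance places the 3 first and achieves the optimal makespan of 3, illustrating why sorting helps.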
Appendix C.4 is the heart of the SBalance algorithm, with sorting and balancing. The sorting algorithm we used is the modified merge sort implemented by Java [8]. Appendix C.8 is our random test case generator. The inputs of both GBalance and SBalance are m, the number of servers to balance the load on, and S, the name of the input data file. The web components are represented by the data type double, and are line-separated in the data file. The output of the programs is a printed list of the web components assigned to each server. An ArrayList is used to store the web components read in from the data file. The web components in the data file are first read and then inserted into the ArrayList. For the SBalance algorithm, we then sort the web components using the modified merge sort. Each web component in the ArrayList is then distributed sequentially to the currently least loaded server.

3.5 Test Methodology

We would like to find out how well the algorithms perform in practical situations. Another aim is to see how variables such as the number of servers and the average number of web components per server affect the performance of the algorithms. We use the makespan to measure the performance of the algorithms; the smaller the makespan, the better the result. Ideally, we would like to have a large amount of real-life data with calculated optimal values against which to compare the results of our implementations. However, due to various limitations this was not practical, so we decided to construct our own test cases. Clearly, if we want to test the accuracy of the algorithms we first need to know the optimal values of the test cases we generate. This is difficult due to the NP-hardness of the problems we are trying to solve. Our first attempt was to enumerate all possible cases of assigning n web components to m servers, and pick the assignment in which the most heavily loaded server has the least load.
We started by assigning one web component to each server at random; given n ≥ m, every optimal assignment places at least one web component on each server. We partition the remaining web components into m parts. This is done by numbering the remaining n − m components in sequence, using a partition function to partition the integers 1 to n − m into m parts, and then looking up the corresponding component for each integer. For each partition assignment, we record the maximum load on any server. We then attempted to identify the partition with the least maximum load, and thus the optimal value. This attempt was not successful due
to the extremely long computation time owing to the number of partitions; the program simply crashed. Our second attempt was to first generate an optimal case and work backwards, loading all servers with loads that sum to this optimal value. We then scramble the arrangement and let our program assign the web components. Doing so guarantees that we know an optimal solution: the chosen load on every server. We work backwards to figure out which web components are on each server. This is achieved by generating a series of random numbers from 0 to the optimal load, and taking the differences between successive numbers. For instance, if we decide the optimal load is 10, and the random numbers representing slices are 2, 7, 3 and 4, we first add the numbers 0 and 10 and sort the series in descending order, giving 10, 7, 4, 3, 2, 0. We then take the differences of successive numbers to obtain the sizes of the web components: 10 − 7 = 3, 7 − 4 = 3, 4 − 3 = 1, 3 − 2 = 1 and 2 − 0 = 2. The sizes of the web components assigned to that server are then 3, 3, 1, 1 and 2. We then remove all web components from all servers and put them into the data file in an arbitrary order. The test file is then read by our programs, and the loads on the most heavily loaded servers are recorded and compared with the optimal solution that we pregenerated. We present the results obtained in the following section.

3.6 Test Result

We would first like to introduce the parameters used to obtain the results. For each test, we pregenerate O, the optimal value; for simplicity we set it to 100 for all test cases. m is the number of servers, and n_cap is the maximum number of web components a server can have during the generation of the optimal result.
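The backwards slicing construction described above can be sketched as follows (a hypothetical helper, not the generator of Appendix C.8); it reproduces the worked example with optimal load 10 and slices 2, 7, 3, 4:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: turn random cut points in [0, optimal] into component sizes by
// adding 0 and the optimal value, sorting in descending order, and taking
// differences of successive numbers.
public class SliceGenerator {
    public static List<Double> slice(double optimal, double[] cuts) {
        List<Double> pts = new ArrayList<>();
        pts.add(0.0);
        pts.add(optimal);
        for (double c : cuts) pts.add(c);
        pts.sort(Collections.reverseOrder()); // e.g. 10, 7, 4, 3, 2, 0
        List<Double> sizes = new ArrayList<>();
        for (int i = 0; i + 1 < pts.size(); i++) {
            sizes.add(pts.get(i) - pts.get(i + 1)); // successive differences
        }
        return sizes; // the sizes sum to the chosen optimal load
    }
}
```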
The reason we introduced n_cap is that we wish to deliberately place some large web components onto servers, to test the correctness of the algorithms and examine how they perform when loads are not well balanced. n_cap essentially controls how close the web component sizes are: when n_cap is small, the chance of having web components with large loads is high, and vice versa. For each test run, we run 200 tests using the same O, m and n_cap. We record T, the approximate solution generated by GBalance and SBalance. We then compute A_R, the average load on the most heavily loaded server, and M_R, the maximum load on the most heavily loaded server, obtained using the same m and n_cap over the 200 tests. Lastly we compute the average error ratio A_E and the maximum error ratio M_E, using separate sets of tests. We start our tests by setting m to 4 and n_cap to 4, then increase m and n_cap and repeat the experiment. The results are presented in
the tables in Appendix B.1.

[Figure 3.1: Average Output Obtained Using m = 64]

Figure 3.1 is a plot of the makespan generated by these two algorithms against log n_cap, obtained using m = 64.

3.7 Analysis and Comparison

Our implementations of both algorithms achieved the stated ratios on all our test cases: the error bound of 2 holds for GBalance, and the error bound of 3/2 holds for SBalance.
SBalance takes longer to run in practice; however, even with the largest data set in our test cases (256 servers, 256 slices), the practical run time difference between the two algorithms is not very significant, as SBalance on average takes only about 1 to 2 seconds longer to execute. Theoretically, GBalance could produce better results in certain cases, but such situations seem rare, as we did not observe this behavior in any of our tests. On our test cases, the results produced by SBalance were consistently superior to those produced by GBalance. Initially we thought that as the number of slices goes up, the web components become smaller and more even in size, helping GBalance obtain better results. However, this turned out not to be the case when the results were analyzed. Neither the number of slices nor the number of servers appears to affect the performance of SBalance significantly: SBalance consistently produced near-perfect results, while GBalance's performance was crippled by the presence of any large web components. Our results showed that as n_cap or m goes up, the total number of web components increases, and the chance of generating a large web component also increases. As the makespan is a single measurement of the load on the most heavily loaded server, we found that GBalance's performance deteriorates as m and n_cap increase. Figure 3.1 illustrates this claim. The results of both algorithms tend to be better when m ≪ n_cap.
CHAPTER 4

Load Rebalancing Problem

4.1 The Load Rebalancing Problem

Although the greedy algorithms introduced in the previous chapter help us balance static loads on multiple servers, the assumptions we made were too simple for realistic situations. There are several important issues not addressed by the previous model, and we would like to discuss two of them. First, the loads on servers are generally dynamic rather than static. There are several ways in which a load might change: the size of a web component might change over time (for example, a dynamic page in a web forum is likely to grow in size); a component might get more or fewer hits over time, contributing to a change in load; and lastly, a component could be created or destroyed, for instance when a user uploads or deletes a file stored on a server. As the loads on servers change over time, the load becomes unbalanced again, some servers might become overloaded, and we face the problem of not utilizing resources efficiently. Secondly, moving web components across servers can be an expensive procedure. For example, removing an HTML page and hosting it on another server could require downtime for server maintenance, changing HTML links on other pages that reference it, or changing the address of the component in the mapping. The Load Rebalancing Problem was introduced by S. Keshav of Ensim Corporation [5] precisely because corporations face the problem of server loads changing over time. We borrow the concept of makespan to help us visualize the problem; the makespan is again the load on the most heavily loaded server. The problem we try to solve is the following: given the loads on the different servers, what is the minimal makespan that can be achieved with at most k moves? A more formal definition is the following.
Given an assignment of the n web components to m servers, and a positive integer k, relocate no more than k web components so as to minimize the maximum load on a server.

Again, we define the load of a web component to be its size multiplied by its number of hits, and the load of a server to be the total load of the web components it hosts. If we are able to approximate this problem, then we can give a good move strategy to redistribute load onto the servers more evenly and achieve a faster overall delivery time. Two algorithms will be implemented for this purpose, namely GRebalance and SRebalance. Both algorithms run in polynomial time; GRebalance achieves a guaranteed ratio of 2 − 1/m, while SRebalance achieves a better guaranteed ratio of 3/2.

4.2 Algorithm GRebalance

This algorithm is very similar to the SBalance algorithm. It is a simple variant of Graham's greedy heuristic [7] and yields a 2-approximation, as described by Shmoys and Tardos [6].

Algorithm GRebalance (input: k, the number of moves; P, a list of servers with loads; output: BP, a list of servers with loads)
  FOR n = 1 to k
    Remove the largest single web component from the currently most loaded server. (Step 1)
  END FOR
  FOR n = 1 to k
    Place the removed web components one by one onto the currently least loaded server. (Step 2)
  END FOR
  Output BP

The sorting of web components in decreasing order of load takes O(n log n) time, and reinserting the removed web components takes O(k log m) time. Since we are interested in the nontrivial case in which m ≤ n, we have a total running time of O(n log n). The mathematical proof of the error bound and run time can be found in Aggarwal et al. [4].

4.3 Algorithm SRebalance

This algorithm was originally presented by Aggarwal et al. [4], and takes a more complicated approach than the GRebalance algorithm. To formalize
it better, we begin by presenting some definitions used by its original authors.

Definition 1. Web components of size strictly greater than (1/2)·OPT are called large; the rest are called small. Let L_T denote the total number of large web components, and m_L the number of servers with at least one large web component; then L_E = L_T − m_L denotes the number of extra large web components on this set of servers. A server is large-free if it does not currently have a large web component assigned to it.

We first look at an algorithm, called Partition, that does the rebalancing given the optimal value. It has an error bound of 1.5 and makes at most the same number of moves needed by the optimal algorithm. At this stage it does not enforce the restriction on the number of moves. Later on we describe a method that does away with having to input an optimal value, and that enforces the restriction on the number of moves.

Algorithm Partition
1. From each of the m_L servers that have a large web component, remove all large web components except the smallest large web component therein.
2. For each server i, calculate the following values with respect to its current configuration:
   a_i: the minimum number of small web components to be removed so that the total size of the remaining small web components is at most (1/2)·OPT
   b_i: the minimum number of web components (including any large web components) to be removed so that the total size of the remaining web components is at most OPT
   c_i = a_i − b_i
3. Select the L_T servers with the smallest values of c_i, breaking ties by giving preference to servers containing large web components. Remove the a_i small web components from the selected servers, thereby ensuring that the total size of the remaining small web components on these servers is at most (1/2)·OPT.
4. From the remaining m − L_T servers, remove the b_i web components.
Any large web components removed in this step need to be reassigned: assign each of them (arbitrarily) to a distinct large-free server created in step 3.
5. Arbitrarily assign the large web components removed in step 1 to the remaining large-free servers.
6. Assign the small web components removed in steps 3 and 4 one by one to the currently least loaded server.

To do away with the need for the optimal value, one key observation is that L_T, a_i and b_i change only when OPT crosses certain threshold values. For instance, only when the value of (1/2)·OPT crosses some web component's load p_j does the value of L_T change. Similarly, we can obtain the threshold values for a_i and b_i. The set of thresholds of a_i and b_i over all servers, combined with 2p_j for all web components, gives the threshold values of OPT. Given these, it is sufficient to implement SRebalance.

LEMMA 1. Enumerate all threshold values with respect to L_T, a_i and b_i for each server i in increasing order; then L_T, a_i and b_i remain unchanged as OPT varies between two consecutive threshold values.

Algorithm SRebalance
1. Use the average load as the starting guess for OPT.
2. Calculate the corresponding L_T, L_E, a_i, b_i and c_i values using Partition. Let k_b be the total number of moves needed by this algorithm.
3. WHILE k_b > k DO
     Increase the guessed value of OPT to the next threshold value
     Recalculate the corresponding values of L_T, L_E, a_i, b_i and c_i
   END WHILE
4. Return the result produced by the last execution of Partition.

The error bound for this algorithm is 3/2, and the run time is O(n log n). The mathematical proof of the error bound and run time can be found in Aggarwal et al. [4].

4.4 Implementations

We again chose to program in the Java programming language [9], as we would like to model web components and servers using an object-oriented approach. We would like to implement the algorithms in such a way that, given the list of servers with the current size and hits of the web components each is hosting, we would