Automatic Exploration of Datacenter Performance Regimes
Peter Bodík, Rean Griffith, Charles Sutton, Armando Fox, Michael I. Jordan, David A. Patterson
RAD Lab, EECS Department, UC Berkeley, Berkeley, CA

ABSTRACT
Horizontally scalable Internet services present an opportunity to use automatic resource allocation strategies for system management in the datacenter. In most of the previous work, a controller employs a performance model of the system to make decisions about the optimal allocation of resources. However, these models are usually trained offline or on a small-scale deployment and will not accurately capture the performance of the controlled application. To achieve accurate control of the web application, the models need to be trained directly on the production system and adapted to changes in the workload and performance of the application. In this paper we propose to train the performance model using an exploration policy that quickly collects data from different performance regimes of the application. The goal of our approach for managing the exploration process is to strike a balance between not violating the performance SLAs and the need to collect sufficient data to train an accurate performance model, which requires pushing the system close to its capacity. We show that by using our exploration policy, we can train a performance model of a Web 2.0 application in less than an hour and then immediately use the model in a resource allocation controller.

Categories and Subject Descriptors
C.4 [Computer Systems Organization]: Performance of Systems; C.5.5 [Computer System Implementation]: Servers

General Terms
Experimentation, Measurement, Performance

Keywords
Automatic control, resource allocation

1.
INTRODUCTION
Horizontally scalable Internet applications would appear to be a great fit for automatic control, and indeed a growing literature exists on applying models of application performance for resource allocation [5, 6, 13, 14, 7, 11, 3]. Much of this work falls under the framework of model-based control, in which first a performance model is trained to map from the workload of the system and the current number of servers to the expected application performance, and then, during production, the model is used to select the optimal number of servers for the current workload while meeting the desired application performance. However, in previous work on model-based control, performance models are trained offline, using data from the performance of the application on a small-scale benchmark test bed, from historical performance data, or from application performance under low workload. Offline training of performance models creates serious difficulties in applying model-based control in real-world settings, for two reasons. First, operators of data center applications report that experiments on small-scale test beds do not reflect the capacity of applications in production [1]. Second, the performance profile of Internet applications changes frequently, because of changes in how the application is used, changes in the datacenter environment, and changes in the application itself.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACDC'09, June 19, 2009, Barcelona, Spain. Copyright 2009 ACM /09/06...$5.00.
Both of these difficulties can be circumvented by training the model online, that is, continually retraining the model based on the latest workload and performance data from the production system. A difficulty of applying online training is the need to gather data from different regimes of workload and system configuration while avoiding violating the Service Level Agreements (SLAs) of the application. Building an accurate performance model can require data from several different regimes of application performance, for example performance under low workload with few machines, under high workload with many machines, and so on. In this paper, we propose a solution to this data-gathering problem, namely, the use of an exploration policy. An exploration policy is an automatic control strategy that is specifically designed to collect a broad range of training data for a more complex model-based control strategy, that is, to explore different performance regimes. To minimize the probability of violating the SLA, the exploration policy constantly monitors the performance of the application and immediately adds more servers if the latency exceeds the specified exploration safety threshold. This idea of performing experiments in the production environment is not as radical a departure from current practice as it may seem: it is in fact a common practice to measure system capacity in production, by increasing load on system components in a controlled way [1, 15]. An exploration policy must meet four requirements. First, it must quickly explore different regimes of the application. Second, the amount and type of data collected during exploration must be sufficient to train an accurate performance model. Third, even during the exploration phase, the policy must be careful not to violate performance requirements. And finally, the exploration policy must
include rules for automatically switching to model-based control once the performance model is accurate. We have found that an exploration policy that meets these requirements must satisfy the following four principles. First, the exploration policy must push the system close to its capacity, because otherwise we cannot reliably estimate that capacity. However, as a second principle, exploration must not push the system past its capacity, because the performance requirements for interactive datacenter applications are strict. Operators will not tolerate an automatic control system that violates performance requirements. The trade-off between collecting data close to the system capacity and violating the performance SLA is controlled by the exploration safety threshold. Third, during exploration, the actions themselves should run as quickly as possible. In our application, we achieve this using a pool of hot standby machines that are fully ready to serve requests, but are not yet provisioned. Adding these machines to the application is much faster than requesting and initializing new machines. This makes exploration both safer, because the standby servers can be added quickly in an emergency, and faster, because the exploration controller can see the effects of its actions without a long delay. Finally, the exploration policy needs to quickly estimate an approximate capacity model of the application to prevent entering configurations where the system cannot handle the incoming workload. Without such a model, the policy often removes too many servers and causes an SLA violation.

2. RELATED WORK
2.1 Offline exploration
Training of statistical performance models using exploration of the configuration space was previously described in [9, 8, 2]. The authors use active sampling to select system configurations to benchmark, which leads to fast exploration of most performance regimes of the system.
The number of system parameters and their values is often large, and exhaustive benchmarking of all possible combinations would take weeks or months. The goal of active sampling is to select a small set of experiments that will provide sufficient data to train an accurate performance model. The benchmarking of the system is performed offline, not in a production environment; selecting a configuration that results in high latency or low throughput of the system thus does not impact the experience of users of the system. In this paper we present exploration that is performed in a production environment, where using an incorrect configuration has potentially catastrophic consequences.

2.2 Exploration in Reinforcement Learning
Reinforcement learning algorithms are designed to automatically explore the environment, estimate a control model, and then slowly switch to the exploitation mode, where this control model is used to achieve a certain objective without further exploration [12]. However, the standard exploration algorithms simply perform random actions and do not consider their potentially high cost. Applying these algorithms to our problem would certainly cause too many SLA violations.

3. MODEL-BASED RESOURCE ALLOCATION
Our framework for model-based control of horizontally scalable Internet applications with frequently changing performance characteristics consists of three components. We first train a statistical performance model to capture the relationship between workload, number of servers, and request latency. We propose to train the model in the production environment using exploration and later frequently retrain it to adapt to gradual changes in performance. We use performance models based on smoothing splines or local regression [16], established techniques for nonlinear regression that do not require specifying the shape of the curve in advance.
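The paper fits these models with locfit in R; purely as an illustrative sketch (not the authors' implementation), a kernel-weighted local average in Python captures the same idea of a nonparametric map from (workload, number of servers) to the fraction of slow requests. The bandwidths and the synthetic training data below are hypothetical.

```python
import math

def kernel_smooth(train, w, n, bandwidth=(20.0, 1.0)):
    """Nadaraya-Watson kernel regression: predict the fraction of
    requests slower than the SLA threshold at (workload w, n servers)
    as a locally weighted average of observed fractions.
    `train` is a list of (workload, n_servers, frac_slow) tuples."""
    num = den = 0.0
    for (wi, ni, yi) in train:
        # Gaussian kernel weight; per-dimension bandwidths are illustrative
        d = ((w - wi) / bandwidth[0]) ** 2 + ((n - ni) / bandwidth[1]) ** 2
        k = math.exp(-0.5 * d)
        num += k * yi
        den += k
    return num / den if den > 0 else float("nan")

# Synthetic observations: the slow-request fraction grows with per-server load
train = [(w, n, min(1.0, max(0.0, w / n / 100.0 - 0.5)))
         for w in range(50, 400, 25) for n in (2, 4, 8)]
pred = kernel_smooth(train, 300.0, 4)
```

Like a spline or local-regression fit, this imposes no parametric shape on the curve; it simply interpolates smoothly between observed performance regimes.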
The model estimates the fraction of requests slower than the SLA threshold, given input of the form (workload, number of servers) observed over a period of twenty seconds. The performance model is then used in the optimal, model-based controller. The proposed controller first predicts the next five minutes of workload using a linear regression on the most recent 15 minutes. The predicted workload is then used as input to the performance model to find the smallest number of machines that can handle the workload. Finally, the controller employs a hysteresis strategy to avoid oscillations. To detect abrupt changes in the relationship between workload, number of machines, and performance, we use change point detection, a technique based on statistical hypothesis testing. The change point detection algorithm compares the current performance of the application to an accurate model of the performance from a few hours ago. After detecting a change point, the controller enters the exploration phase, quickly trains a new model, and eventually returns to the optimal, model-based control.

4. EXPLORATION CONTROL POLICY
The goal of the exploration control policy is to quickly collect enough data to train an accurate and stable performance model while avoiding SLA violations. The exploration policy has to cope with an inherent trade-off between the safety of the exploration and the quality of the data being collected. If the policy does not push the system close to its capacity, it will not be able to train an accurate performance model, while running the system above its capacity will likely result in SLA violations.

4.1 Baseline Exploration
To quickly explore a wide range of application behaviors, the exploration policy sets a random request latency target between 0 and M, the exploration safety threshold, and adds or removes machines to reach that target.
If the current latency is lower than the selected latency target, the policy removes a single machine from the application; otherwise it adds a machine. The policy sets a new random latency target after adding or removing two machines. The effect of such a policy is that it explores both the low-latency and high-latency regimes of the system. To respond quickly to possible future SLA violations, when latency increases above the safety threshold, the policy immediately requests another server for the application. The safety threshold thus serves two purposes: it ensures that the policy never selects a very high latency target, and it is also used to trigger a safety action in case of emergency. To collect enough workload and performance data for a given configuration, after removing a machine from the application, the policy waits for D minutes before executing another action. After adding a machine, the policy waits for D+ minutes, which includes the average delay of requesting and initializing a new machine.

4.2 Safety Mechanisms
While experimenting with this exploration policy, we found that it tends to violate the performance SLA, and we have thus implemented two safety mechanisms. We maintain a pool of hot standby machines that can be added to the application immediately, without waiting for new machines to boot up, and we quickly estimate an approximate capacity of a single machine, which prevents the policy from removing too many machines. Finally, we improve the stability of the performance model by discarding data collected during the transient behavior of the application after adding or removing a server.

Figure 1: A typical six hour experiment with parameters M = 5% and λ = 2%. In the top graph, the thin gray line represents the fraction of requests slower than 1 second measured every 20 seconds. The thick black line represents the fraction of slow requests over a 10-minute period that corresponds to our SLA; an SLA violation occurs when the black line crosses the dashed horizontal line. There were a total of two SLA violations in this experiment. The symbols in the center differentiate between exploration periods (circles on the top) and optimal control (triangles on the bottom). During periods with no symbol, the controller was waiting for the previous action to complete. In the bottom graph, the thin line shows the throughput of the application and the thick line represents the number of running application servers. Notice that during exploration the number of servers changes very rapidly, while during optimal control, the hysteresis smooths out oscillations of the controller.

Machines in hot standby
Booting up a new machine and adding it to the application takes several minutes, which makes the exploration policy very slow to respond to sudden increases in workload. We therefore maintain a small number of hot standby machines that are running the application but are not serving any requests. When the exploration policy requests a new machine, a hot standby machine is added to the configuration of the load balancer and starts serving application requests within seconds.
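The baseline exploration loop of Section 4.1, including the safety action triggered above the threshold M, might be sketched as follows; the latency model and all constants here are hypothetical stand-ins, not measurements from the paper.

```python
import random

M = 0.05  # exploration safety threshold (fraction of slow requests)

def observed_frac_slow(workload, n_servers):
    # Stand-in for real measurements: the slow-request fraction grows
    # with per-server load (purely illustrative).
    return max(0.0, min(1.0, workload / n_servers / 400.0 - 0.2))

def explore_step(workload, n_servers, target):
    """One action of the baseline policy: move toward a random latency
    target; trigger the safety action above the threshold M."""
    frac = observed_frac_slow(workload, n_servers)
    if frac > M:
        return n_servers + 1          # safety action: add a server immediately
    if frac < target:
        return max(1, n_servers - 1)  # below target: remove a machine
    return n_servers + 1              # above target: add a machine

random.seed(0)
n = 8
history = []
for step in range(12):
    if step % 2 == 0:                 # new random target every two actions
        target = random.uniform(0.0, M)
    n = explore_step(workload=1000.0, n_servers=n, target=target)
    history.append(n)
```

Because the target is resampled between 0 and M, the loop alternates between low-latency and high-latency regimes while the safety check keeps the observed fraction of slow requests from drifting far above M.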
After adding a machine to the application from the hot standby pool, the exploration policy requests a new hot standby machine to ensure a certain capacity of the pool.

Estimating server throughput using a linear model
The baseline exploration policy often removes a server even when the application is very close to its capacity, which causes a significant increase in request latency. To avoid this behavior, we estimate the maximum throughput of a single machine and then derive the minimum number of machines required for the current workload. Let w_t and n_t be the workload and number of servers at time t, respectively, and let S be the set of time intervals when the latency exceeded the threshold defined in the SLA. The set T = {w_t / n_t : t ∈ S} thus represents the workload per server during time intervals when the latency was too high. We approximate the maximum throughput of a single server, w_max, as the median of the values in T. At time t, the exploration policy will remove a server only if there are more than w_t / w_max servers running. We could obtain a more conservative estimate of w_max by computing a lower quantile of the values in T, such as the 10th or 25th. Using the minimum of T should be avoided because this statistic is very sensitive to outliers, while quantiles are more robust. This throughput estimate is also used to compute the number of machines to add as a safety action after detecting a potential SLA violation. Assuming that a machine serving w_max requests per second is at 100% utilization, the safety action aims to decrease the utilization to 75%; more formally, the safety action adds w_t / (0.75 · w_max) − n_t machines.

Discarding data from transients
During initial testing we observed significant spikes in latency when adding machines to the application. We believe that the increase in latency is due to the new machines being cold and thus responding more slowly than the remaining machines.
Because the performance model should capture the steady-state performance of the application, we discard data observed within one minute after adding or removing a machine. If used for training, these data points could negatively impact the accuracy of the performance model. The duration of these transients could be estimated either offline, by changing the number of machines under a steady workload and analyzing the spikes in latency, or online, by comparing the observed performance to the prediction of an accurate model.
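The capacity estimate and safety action of Section 4.2 follow directly from the formulas above: w_max is the median workload-per-server over SLA-violating intervals, and the safety action targets 75% utilization. The sketch below uses invented sample data for illustration.

```python
import math
from statistics import median

def estimate_wmax(samples, sla_frac=0.05):
    """Estimate per-server capacity w_max as the median workload-per-server
    over intervals whose slow-request fraction exceeded the SLA threshold."""
    violating = [w / n for (w, n, frac) in samples if frac > sla_frac]
    return median(violating) if violating else None

def safety_action(w_now, n_now, wmax):
    """Number of servers to add so utilization drops to ~75% of capacity:
    ceil(w_t / (0.75 * w_max)) - n_t, floored at zero."""
    needed = math.ceil(w_now / (0.75 * wmax))
    return max(0, needed - n_now)

# samples: (workload, n_servers, observed fraction of slow requests)
samples = [(900, 3, 0.12), (880, 3, 0.08), (400, 4, 0.01), (950, 3, 0.20)]
wmax = estimate_wmax(samples)            # median of {300.0, 293.3, 316.7} = 300.0
add = safety_action(w_now=1200, n_now=3, wmax=wmax)   # ceil(1200/225) - 3 = 3
```

Using the median rather than the minimum of T mirrors the robustness argument above: a single outlier interval cannot drag the capacity estimate down.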
5. SWITCHING TO OPTIMAL CONTROL
As the accuracy of the performance model increases during exploration, the controller has to decide when to switch from the exploration policy to the optimal, model-based control. The controller makes this decision at each time step by evaluating the stability of the performance model at the current workload. We first use bootstrapping [16] to estimate the variance of the performance model M at different values of workload w and number of servers n, and then use it to decide whether to continue exploring or switch to optimal control. Bootstrapping is a technique for estimating properties of a model (such as its variance) by fitting the same model to datasets obtained by resampling the original training dataset. The original dataset D is used to create k resampled datasets D_1, ..., D_k; each D_i is the same size as D and contains data points sampled from D with replacement (some data points are selected multiple times, while others are not selected at all). The variance of the performance model M for workload w and number of servers n is estimated as the variance of the predictions at point (w, n) of the k models obtained from the respective resampled datasets. M is considered stable at (w, n) if the standard deviation (the square root of the variance) is less than the model stability threshold λ. See Figure 2 for the evolution of model stability during exploration. We use the following rule to decide whether to explore at the current workload w_now. Let n_0 be the smallest n such that the model is stable at (w_now, n_0) and M(w_now, n_0) is less than the performance threshold specified in the SLA; in other words, we expect n_0 servers to handle the current workload with good latency. However, to ensure we can rely on the predictions of the model for n_0 servers, we also check the stability of the model at the points (w_now, n_0) through (w_max, n_0), where w_max is the maximum workload that n_0 servers can handle according to the current model.
If the model is not stable at these points, the controller remains in exploration. Otherwise it switches to optimal control and allocates n_0 servers to the application.

6. EVALUATION
The experiments in this section illustrate the four principles described in the introduction. First, they show that using both a hot standby pool of servers and the simple capacity model is necessary for an exploration policy to avoid SLA violations and quickly train an accurate performance model (see Section 6.1). Second, without pushing the application close to its capacity, the exploration policy does not observe enough data in the high-latency regime of the application and is unable to train an accurate performance model. This situation occurs when the exploration safety threshold M is set to a very low value. Also, setting the safety threshold too high causes a significant number of SLA violations and would thus make the exploration policy unacceptable for production environments (see Section 6.2). Finally, we demonstrate the effects of the model stability threshold λ on the accuracy of the model and the duration of the exploration (see Section 6.3).

Experimental Setup
We evaluate the exploration policy on Amazon EC2 using CloudStone [10], a recently proposed benchmark for Web 2.0 applications. We use a 36-hour workload compressed into six hours, obtained from Ebates.com [4]. We assume a cloud computing environment where running a single virtual machine (VM) for ten minutes costs $0.10. We use the local regression library locfit in the statistical package R to estimate the fraction of requests slower than one second given the current workload and number of servers. During bootstrapping, we found that using k = 10 bootstrap samples is sufficient to estimate the stability of the performance model. We used a hot standby pool of two machines. Before executing another action, the baseline exploration policy waits for D+ = 7 minutes after adding a machine and D = 3 minutes after removing a machine. Requesting and initializing a new machine takes approximately four minutes, which is included in D+. With hot-standby machines, both D+ and D are set to 3 minutes. We define two performance SLAs: at most 5% of requests should be slower than one second during periods of ten and two minutes. The two-minute SLA is a more stringent criterion, because it is affected even by brief spikes in latency. A good exploration policy would minimize the number of SLA violations over the ten and two minute periods (SLA10 and SLA2 in Tables 1, 2, and 3), the length of the exploration period in minutes (explor.), and the total cost of the VMs during the experiment in dollars (VM cost). We performed three runs for each setting of the parameters and report the average over these three runs.

Figure 2: The stability of the performance model after 27, 31, and 64 minutes and at the end of the six hour experiment. In each plot, a dot at workload w and number of servers n means that the performance model is stable at (w, n), i.e., the standard deviation is less than λ. The thick black line represents the number of machines used during optimal control. For workloads where the optimal number of machines is not available, the controller remains in exploration.

6.1 Effects of the safety mechanisms
We first evaluate the baseline exploration policy and compare the effects of the safety mechanisms. We performed four experiments: starting with the baseline exploration policy, then adding hot standby servers, adding the simple capacity model, and finally adding the discarding of data collected during transient behavior. We used exploration safety threshold M = 0.05 and model stability threshold λ = 0.05. The results are summarized in Table 1. The baseline exploration policy performed worst, regularly violating the SLA and taking too long to converge to an accurate model.
Adding hot standby servers decreased the number of SLA violations and the exploration time, while using the simple capacity model reduced the number of SLA violations. As expected, discarding data points from transients helped the model to train faster.
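The model stability check used throughout these experiments (the bootstrap test of Section 5) can be sketched as follows; the toy nearest-neighbor "model" stands in for the paper's local regression and is not the authors' implementation.

```python
import random
from statistics import pstdev, mean

def bootstrap_std(data, fit, predict_at, k=10, seed=1):
    """Bootstrap estimate of the model's prediction standard deviation
    at one (workload, n_servers) point: refit on k resampled datasets
    and measure the spread of their predictions there."""
    rng = random.Random(seed)
    preds = []
    for _ in range(k):
        resample = [rng.choice(data) for _ in data]  # sample with replacement
        model = fit(resample)
        preds.append(model(predict_at))
    return pstdev(preds)

# A toy "model": predict the mean slow-request fraction of the 5 nearest points.
def fit(dataset):
    def model(point):
        w, n = point
        nearest = sorted(dataset, key=lambda d: abs(d[0] - w) + abs(d[1] - n))[:5]
        return mean(d[2] for d in nearest)
    return model

# data: (workload, n_servers, observed fraction of slow requests)
data = [(w, 4, w / 2000.0) for w in range(100, 1100, 50)]
sd = bootstrap_std(data, fit, predict_at=(600, 4))
stable = sd < 0.05   # compare against the model stability threshold λ
```

As more data accumulate near a point, the resampled fits agree more closely, the standard deviation shrinks below λ, and the controller may switch to optimal control at that point.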
Table 1: Comparison of the baseline policy and the exploration policies created by adding the safety measures. The last policy uses hot standby machines, the simple capacity estimate, and discards data points during transients. The top table shows the mean over the three experiments, the bottom one the standard deviation. The columns, from left to right, represent the type of exploration algorithm, the number of SLA violations when averaging over ten and two minutes, the duration of the exploration phase in minutes, and the total cost of the VMs.

6.2 Effects of the exploration safety threshold
To understand the effect of the exploration safety threshold M on the behavior of the exploration policy, we performed a series of experiments with values of M ranging from very small values to 0.10 (see Table 2). We used λ = 0.05. Small values of M force the exploration policy to add too many servers to achieve the low latency specified by the performance target and thus increase the VM cost. More importantly, the exploration takes too long because the policy never observes the high-latency regime of the application. The policy with M = 0.01 performed much better; however, in one of the runs the simple capacity model was very inaccurate because of only a single instance of latency exceeding the SLA threshold. As a consequence, the policy twice added 15 machines as a safety action when a few machines would suffice. Increasing the value of M allows the policy to push the application closer to its capacity and train the performance model faster. The number of SLA10 violations is very small, while the number of SLA2 violations doesn't increase much even for higher values of M. Values of the safety threshold above the 5% SLA threshold do not automatically mean that the policy will constantly violate the SLA.
The SLA is usually evaluated on longer time periods, while the exploration policy can quickly respond to the spikes in latency that occur at higher values of M. We found that the SLA is significantly violated only at the largest value of M.

Table 2: Comparison of exploration policies with different values of the safety threshold M (λ = 0.05).

Table 3: Comparison of exploration policies with different values of the model stability threshold λ (M = 0.05).

6.3 Effects of the model stability threshold λ
The model stability threshold λ affects the length of the exploration and the accuracy of the performance model. We performed experiments with values of λ ranging from small values to 0.5; the results are summarized in Table 3. Smaller values of λ pose a strict requirement on model stability and require significantly more data points to train the model, and thus extend the exploration. Consequently, longer exploration causes more SLA violations and higher VM cost. As we loosen the restrictions on the stability of the performance model, the exploration period becomes much shorter while also decreasing the number of SLA violations. Somewhat surprisingly, even when λ, the threshold on the standard deviation of the predictions of the model, significantly exceeds the SLA threshold of 5%, the optimal control using this model is still accurate and doesn't incur any additional VM costs, nor does it cause more SLA violations.

6.4 Setting the parameter values
In practice, the values of the safety and model stability thresholds could be adjusted iteratively. One would start with low values of M and λ (for example, λ = 0.02) and then slowly increase both to shorten the duration of the exploration period. In the case of a much stricter SLA defined over a shorter time interval, the value of the safety threshold has to be set much lower.

6.5 Exploration at large scale
The exploration policy has two parameters that would have to be adjusted to make it work at the scale of thousands of machines.
First, the current policy adds or removes a single machine every three minutes to reach the specified performance target. On a large scale, such behavior would be too slow and the policy should instead add or remove a fraction of the current servers (such as 5%). Similarly, the size of the hot standby pool should be a fraction of the current number of servers, such as 10 or 20%.

7. CONCLUSION
In this paper we argue that training a performance model in a production environment is a necessary first step in automatic resource allocation. We present a safe exploration policy that quickly explores different performance regimes of the application and causes very few SLA violations. We show that there is a relatively wide range of values of the safety and stability thresholds for which the exploration policy performs well.

8. REFERENCES
[1] J. Allspaw. The Art of Capacity Planning: Scaling Web Resources. O'Reilly Media, Inc.
[2] S. Babu, N. Borisov, S. Duan, H. Herodotou, and V. Thummala. Automated experiment-driven management of (database) systems. In HotOS, 2009.
[3] M. N. Bennani and D. A. Menasce. Resource allocation for autonomic data centers using analytic performance models. In ICAC.
[4] P. Bodík, G. Friedman, L. Biewald, H. Levine, G. Candea, K. Patel, G. Tolle, J. Hui, A. Fox, M. I. Jordan, and D. Patterson. Combining visualization and statistical analysis to improve operator confidence and efficiency for failure detection and localization. In ICAC.
[5] J. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and R. P. Doyle. Managing energy and server resources in hosting centers. In Symposium on Operating Systems Principles (SOSP).
[6] D. Kusic, J. O. Kephart, J. E. Hanson, N. Kandasamy, and G. Jiang. Power and performance management of virtualized computing environments via lookahead control. In ICAC '08: Proceedings of the 2008 International Conference on Autonomic Computing, pages 3-12, Washington, DC, USA. IEEE Computer Society.
[7] X. Liu, J. Heo, L. Sha, and X. Zhu. Adaptive control of multi-tiered web applications using queueing predictor. In Network Operations and Management Symposium (NOMS), IEEE/IFIP, April.
[8] P. Shivam, S. Babu, and J. Chase. Active sampling for accelerated learning of performance models. In SysML.
[9] P. Shivam, V. Marupadi, J. Chase, T. Subramaniam, and S. Babu. Cutting corners: Workbench automation for server benchmarking. In USENIX.
[10] W. Sobel, S. Subramanyam, A. Sucharitakul, J. Nguyen, H. Wong, S. Patil, A. Fox, and D. Patterson. Cloudstone: Multi-platform, multi-language benchmark and measurement tools for Web 2.0.
[11] C. Stewart and K. Shen. Performance modeling and system management for multi-component online services. In NSDI.
[12] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning). The MIT Press, March.
[13] G. Tesauro, N. Jong, R. Das, and M. Bennani. A hybrid reinforcement learning approach to autonomic resource allocation. In International Conference on Autonomic Computing (ICAC).
[14] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal. Dynamic provisioning of multi-tier internet applications. In ICAC.
[15] P. Vosshall. Amazon. Personal communication.
[16] L. Wasserman. All of Nonparametric Statistics (Springer Texts in Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
Dynamic Resource management with VM layer and Resource prediction algorithms in Cloud Architecture 1 Shaik Fayaz, 2 Dr.V.N.Srinivasu, 3 Tata Venkateswarlu #1 M.Tech (CSE) from P.N.C & Vijai Institute of
Probabilistic Model Checking at Runtime for the Provisioning of Cloud Resources
Probabilistic Model Checking at Runtime for the Provisioning of Cloud Resources Athanasios Naskos, Emmanouela Stachtiari, Panagiotis Katsaros, and Anastasios Gounaris Aristotle University of Thessaloniki,
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
Fingerprinting the Datacenter: Automated Classification of Performance Crises
Fingerprinting the Datacenter: Automated Classification of Performance Crises Peter Bodík University of California, Berkeley Armando Fox University of California, Berkeley Moises Goldszmidt Microsoft Research
Supply Chain Platform as a Service: a Cloud Perspective on Business Collaboration
Supply Chain Platform as a Service: a Cloud Perspective on Business Collaboration Guopeng Zhao 1, 2 and Zhiqi Shen 1 1 Nanyang Technological University, Singapore 639798 2 HP Labs Singapore, Singapore
Figure 1. The cloud scales: Amazon EC2 growth [2].
- Chung-Cheng Li and Kuochen Wang Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 [email protected], [email protected] Abstract One of the most important issues
SLA-aware Resource Scheduling for Cloud Storage
SLA-aware Resource Scheduling for Cloud Storage Zhihao Yao Computer and Information Technology Purdue University West Lafayette, Indiana 47906 Email: [email protected] Ioannis Papapanagiotou Computer and
IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud
IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud February 25, 2014 1 Agenda v Mapping clients needs to cloud technologies v Addressing your pain
Efficient and Enhanced Load Balancing Algorithms in Cloud Computing
, pp.9-14 http://dx.doi.org/10.14257/ijgdc.2015.8.2.02 Efficient and Enhanced Load Balancing Algorithms in Cloud Computing Prabhjot Kaur and Dr. Pankaj Deep Kaur M. Tech, CSE P.H.D [email protected],
SCHEDULING IN CLOUD COMPUTING
SCHEDULING IN CLOUD COMPUTING Lipsa Tripathy, Rasmi Ranjan Patra CSA,CPGS,OUAT,Bhubaneswar,Odisha Abstract Cloud computing is an emerging technology. It process huge amount of data so scheduling mechanism
Bootstrapping Big Data
Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu
Solving I/O Bottlenecks to Enable Superior Cloud Efficiency
WHITE PAPER Solving I/O Bottlenecks to Enable Superior Cloud Efficiency Overview...1 Mellanox I/O Virtualization Features and Benefits...2 Summary...6 Overview We already have 8 or even 16 cores on one
Fingerprinting the Datacenter: Automated Classification of Performance Crises
Fingerprinting the Datacenter: Automated Classification of Performance Crises Peter Bodík EECS Department UC Berkeley Berkeley, CA, USA [email protected] Dawn B. Woodard Cornell University Ithaca,
ISE 820 All Flash Array. Performance Review. Performance Review. March 2015
ISE 820 All Flash Array Performance Review March 2015 Performance Review March 2015 Table of Contents Executive Summary... 3 SPC-1 Benchmark Re sults... 3 Virtual Desktop Benchmark Testing... 3 Synthetic
Real vs. Synthetic Web Performance Measurements, a Comparative Study
Real vs. Synthetic Web Performance Measurements, a Comparative Study By John Bartlett and Peter Sevcik December 2004 Enterprises use today s Internet to find customers, provide them information, engage
Sla Aware Load Balancing Algorithm Using Join-Idle Queue for Virtual Machines in Cloud Computing
Sla Aware Load Balancing Using Join-Idle Queue for Virtual Machines in Cloud Computing Mehak Choudhary M.Tech Student [CSE], Dept. of CSE, SKIET, Kurukshetra University, Haryana, India ABSTRACT: Cloud
EC2 Performance Analysis for Resource Provisioning of Service-Oriented Applications
EC2 Performance Analysis for Resource Provisioning of Service-Oriented Applications Jiang Dejun 1,2 Guillaume Pierre 1 Chi-Hung Chi 2 1 VU University Amsterdam 2 Tsinghua University Beijing Abstract. Cloud
Exploring Resource Provisioning Cost Models in Cloud Computing
Exploring Resource Provisioning Cost Models in Cloud Computing P.Aradhya #1, K.Shivaranjani *2 #1 M.Tech, CSE, SR Engineering College, Warangal, Andhra Pradesh, India # Assistant Professor, Department
Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load
Payment minimization and Error-tolerant Resource Allocation for Cloud System Using equally spread current execution load Pooja.B. Jewargi Prof. Jyoti.Patil Department of computer science and engineering,
Precise VM Placement Algorithm Supported by Data Analytic Service
Precise VM Placement Algorithm Supported by Data Analytic Service Dapeng Dong and John Herbert Mobile and Internet Systems Laboratory Department of Computer Science, University College Cork, Ireland {d.dong,
Dynamic Resource Allocation in Software Defined and Virtual Networks: A Comparative Analysis
Dynamic Resource Allocation in Software Defined and Virtual Networks: A Comparative Analysis Felipe Augusto Nunes de Oliveira - GRR20112021 João Victor Tozatti Risso - GRR20120726 Abstract. The increasing
Keywords Distributed Computing, On Demand Resources, Cloud Computing, Virtualization, Server Consolidation, Load Balancing
Volume 5, Issue 1, January 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Survey on Load
SQL Server Consolidation Using Cisco Unified Computing System and Microsoft Hyper-V
SQL Server Consolidation Using Cisco Unified Computing System and Microsoft Hyper-V White Paper July 2011 Contents Executive Summary... 3 Introduction... 3 Audience and Scope... 4 Today s Challenges...
Service allocation in Cloud Environment: A Migration Approach
Service allocation in Cloud Environment: A Migration Approach Pardeep Vashist 1, Arti Dhounchak 2 M.Tech Pursuing, Assistant Professor R.N.C.E.T. Panipat, B.I.T. Sonepat, Sonipat, Pin no.131001 1 [email protected],
Case Study I: A Database Service
Case Study I: A Database Service Prof. Daniel A. Menascé Department of Computer Science George Mason University www.cs.gmu.edu/faculty/menasce.html 1 Copyright Notice Most of the figures in this set of
Permanent Link: http://espace.library.curtin.edu.au/r?func=dbin-jump-full&local_base=gen01-era02&object_id=154091
Citation: Alhamad, Mohammed and Dillon, Tharam S. and Wu, Chen and Chang, Elizabeth. 2010. Response time for cloud computing providers, in Kotsis, G. and Taniar, D. and Pardede, E. and Saleh, I. and Khalil,
Establishing How Many VoIP Calls a Wireless LAN Can Support Without Performance Degradation
Establishing How Many VoIP Calls a Wireless LAN Can Support Without Performance Degradation ABSTRACT Ángel Cuevas Rumín Universidad Carlos III de Madrid Department of Telematic Engineering Ph.D Student
Comparison of Hybrid Flash Storage System Performance
Test Validation Comparison of Hybrid Flash Storage System Performance Author: Russ Fellows March 23, 2015 Enabling you to make the best technology decisions 2015 Evaluator Group, Inc. All rights reserved.
Comparison of Windows IaaS Environments
Comparison of Windows IaaS Environments Comparison of Amazon Web Services, Expedient, Microsoft, and Rackspace Public Clouds January 5, 215 TABLE OF CONTENTS Executive Summary 2 vcpu Performance Summary
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
ROCANA WHITEPAPER How to Investigate an Infrastructure Performance Problem
ROCANA WHITEPAPER How to Investigate an Infrastructure Performance Problem INTRODUCTION As IT infrastructure has grown more complex, IT administrators and operators have struggled to retain control. Gone
Energy Constrained Resource Scheduling for Cloud Environment
Energy Constrained Resource Scheduling for Cloud Environment 1 R.Selvi, 2 S.Russia, 3 V.K.Anitha 1 2 nd Year M.E.(Software Engineering), 2 Assistant Professor Department of IT KSR Institute for Engineering
Towards an understanding of oversubscription in cloud
IBM Research Towards an understanding of oversubscription in cloud Salman A. Baset, Long Wang, Chunqiang Tang [email protected] IBM T. J. Watson Research Center Hawthorne, NY Outline Oversubscription
Flexible Distributed Capacity Allocation and Load Redirect Algorithms for Cloud Systems
Flexible Distributed Capacity Allocation and Load Redirect Algorithms for Cloud Systems Danilo Ardagna 1, Sara Casolari 2, Barbara Panicucci 1 1 Politecnico di Milano,, Italy 2 Universita` di Modena e
Storage I/O Control: Proportional Allocation of Shared Storage Resources
Storage I/O Control: Proportional Allocation of Shared Storage Resources Chethan Kumar Sr. Member of Technical Staff, R&D VMware, Inc. Outline The Problem Storage IO Control (SIOC) overview Technical Details
Server Load Prediction
Server Load Prediction Suthee Chaidaroon ([email protected]) Joon Yeong Kim ([email protected]) Jonghan Seo ([email protected]) Abstract Estimating server load average is one of the methods that
Data Centers and Cloud Computing. Data Centers
Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet
Elastic VM for Rapid and Optimum Virtualized
Elastic VM for Rapid and Optimum Virtualized Resources Allocation Wesam Dawoud PhD. Student Hasso Plattner Institute Potsdam, Germany 5th International DMTF Academic Alliance Workshop on Systems and Virtualization
Last time. Data Center as a Computer. Today. Data Center Construction (and management)
Last time Data Center Construction (and management) Johan Tordsson Department of Computing Science 1. Common (Web) application architectures N-tier applications Load Balancers Application Servers Databases
Benchmarking the Performance of XenDesktop Virtual DeskTop Infrastructure (VDI) Platform
Benchmarking the Performance of XenDesktop Virtual DeskTop Infrastructure (VDI) Platform Shie-Yuan Wang Department of Computer Science National Chiao Tung University, Taiwan Email: [email protected]
DataStax Enterprise, powered by Apache Cassandra (TM)
PerfAccel (TM) Performance Benchmark on Amazon: DataStax Enterprise, powered by Apache Cassandra (TM) Disclaimer: All of the documentation provided in this document, is copyright Datagres Technologies
Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation
Top Ten Questions to Ask Your Primary Storage Provider About Their Data Efficiency May 2014 Copyright 2014 Permabit Technology Corporation Introduction The value of data efficiency technologies, namely
EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications
ECE6102 Dependable Distribute Systems, Fall2010 EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications Deepal Jayasinghe, Hyojun Kim, Mohammad M. Hossain, Ali Payani
Proactive Performance Management for Enterprise Databases
Proactive Performance Management for Enterprise Databases Abstract DBAs today need to do more than react to performance issues; they must be proactive in their database management activities. Proactive
Energy Conscious Virtual Machine Migration by Job Shop Scheduling Algorithm
Energy Conscious Virtual Machine Migration by Job Shop Scheduling Algorithm Shanthipriya.M 1, S.T.Munusamy 2 ProfSrinivasan. R 3 M.Tech (IT) Student, Department of IT, PSV College of Engg & Tech, Krishnagiri,
ADAPTIVE LOAD BALANCING ALGORITHM USING MODIFIED RESOURCE ALLOCATION STRATEGIES ON INFRASTRUCTURE AS A SERVICE CLOUD SYSTEMS
ADAPTIVE LOAD BALANCING ALGORITHM USING MODIFIED RESOURCE ALLOCATION STRATEGIES ON INFRASTRUCTURE AS A SERVICE CLOUD SYSTEMS Lavanya M., Sahana V., Swathi Rekha K. and Vaithiyanathan V. School of Computing,
Energy Efficient Resource Management in Virtualized Cloud Data Centers
2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing Energy Efficient Resource Management in Virtualized Cloud Data Centers Anton Beloglazov* and Rajkumar Buyya Cloud Computing
Maximizing Profit in Cloud Computing System via Resource Allocation
Maximizing Profit in Cloud Computing System via Resource Allocation Hadi Goudarzi and Massoud Pedram University of Southern California, Los Angeles, CA 90089 {hgoudarz,pedram}@usc.edu Abstract With increasing
INCREASING SERVER UTILIZATION AND ACHIEVING GREEN COMPUTING IN CLOUD
INCREASING SERVER UTILIZATION AND ACHIEVING GREEN COMPUTING IN CLOUD M.Rajeswari 1, M.Savuri Raja 2, M.Suganthy 3 1 Master of Technology, Department of Computer Science & Engineering, Dr. S.J.S Paul Memorial
Capacity planning with Microsoft System Center
Capacity planning with Microsoft System Center Mike Resseler Veeam Product Strategy Specialist, MVP, Microsoft Certified IT Professional, MCSA, MCTS, MCP Modern Data Protection Built for Virtualization
A Survey Paper: Cloud Computing and Virtual Machine Migration
577 A Survey Paper: Cloud Computing and Virtual Machine Migration 1 Yatendra Sahu, 2 Neha Agrawal 1 UIT, RGPV, Bhopal MP 462036, INDIA 2 MANIT, Bhopal MP 462051, INDIA Abstract - Cloud computing is one
Your Guide to VMware Lab Manager Replacement
Your Guide to VMware Lab Manager Replacement white paper BROUGHT TO YOU BY SKYTAP 2 Your Guide to VMware Lab Manager Replacement Contents Your Guide to VMware Lab Manager Replacement... 3 The Power and
Cloud Analytics for Capacity Planning and Instant VM Provisioning
Cloud Analytics for Capacity Planning and Instant VM Provisioning Yexi Jiang Florida International University Advisor: Dr. Tao Li Collaborator: Dr. Charles Perng, Dr. Rong Chang Presentation Outline Background
Proactive and Reactive Monitoring
Proactive and Reactive Monitoring Serg Mescheryakov, Doctor of Science, Professor Dmitry Shchemelinin, Philosophy Doctor RingCentral Inc., San Mateo, CA, USA RingCentral IP Telecommunication Company 2
Webpage: www.ijaret.org Volume 3, Issue XI, Nov. 2015 ISSN 2320-6802
An Effective VM scheduling using Hybrid Throttled algorithm for handling resource starvation in Heterogeneous Cloud Environment Er. Navdeep Kaur 1 Er. Pooja Nagpal 2 Dr.Vinay Guatum 3 1 M.Tech Student,
InfiniteGraph: The Distributed Graph Database
A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086
Data Center Performance Insurance
Data Center Performance Insurance How NFS Caching Guarantees Rapid Response Times During Peak Workloads November 2010 2 Saving Millions By Making It Easier And Faster Every year slow data centers and application
Architecting For Failure Why Cloud Architecture is Different! Michael Stiefel www.reliablesoftware.com development@reliablesoftware.
Architecting For Failure Why Cloud Architecture is Different! Michael Stiefel www.reliablesoftware.com [email protected] Outsource Infrastructure? Traditional Web Application Web Site Virtual
Disk Storage Shortfall
Understanding the root cause of the I/O bottleneck November 2010 2 Introduction Many data centers have performance bottlenecks that impact application performance and service delivery to users. These bottlenecks
New Features in PSP2 for SANsymphony -V10 Software-defined Storage Platform and DataCore Virtual SAN
New Features in PSP2 for SANsymphony -V10 Software-defined Storage Platform and DataCore Virtual SAN Updated: May 19, 2015 Contents Introduction... 1 Cloud Integration... 1 OpenStack Support... 1 Expanded
BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING
BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort [email protected] Session Number: TBR14 Insurance has always been a data business The industry has successfully
Resource Allocation for Autonomic Data Centers using Analytic Performance Models
Resource Allocation for Autonomic Data Centers using Analytic Performance Models Mohamed N. Bennani and Daniel A. Menascé Dept. of Computer Science, MS 4A5 George Mason University 44 University Dr. Fairfax,
DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION
DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION A DIABLO WHITE PAPER AUGUST 2014 Ricky Trigalo Director of Business Development Virtualization, Diablo Technologies
A SLA-Based Method for Big-Data Transfers with Multi-Criteria Optimization Constraints for IaaS
A SLA-Based Method for Big-Data Transfers with Multi-Criteria Optimization Constraints for IaaS Mihaela-Catalina Nita, Cristian Chilipirea Faculty of Automatic Control and Computers University Politehnica
1. Simulation of load balancing in a cloud computing environment using OMNET
Cloud Computing Cloud computing is a rapidly growing technology that allows users to share computer resources according to their need. It is expected that cloud computing will generate close to 13.8 million
Leveraging EMC Fully Automated Storage Tiering (FAST) and FAST Cache for SQL Server Enterprise Deployments
Leveraging EMC Fully Automated Storage Tiering (FAST) and FAST Cache for SQL Server Enterprise Deployments Applied Technology Abstract This white paper introduces EMC s latest groundbreaking technologies,
Understanding Data Locality in VMware Virtual SAN
Understanding Data Locality in VMware Virtual SAN July 2014 Edition T E C H N I C A L M A R K E T I N G D O C U M E N T A T I O N Table of Contents Introduction... 2 Virtual SAN Design Goals... 3 Data
Heterogeneous Workload Consolidation for Efficient Management of Data Centers in Cloud Computing
Heterogeneous Workload Consolidation for Efficient Management of Data Centers in Cloud Computing Deep Mann ME (Software Engineering) Computer Science and Engineering Department Thapar University Patiala-147004
How To Test A Web Server
Performance and Load Testing Part 1 Performance & Load Testing Basics Performance & Load Testing Basics Introduction to Performance Testing Difference between Performance, Load and Stress Testing Why Performance
Network Performance Monitoring at Small Time Scales
Network Performance Monitoring at Small Time Scales Konstantina Papagiannaki, Rene Cruz, Christophe Diot Sprint ATL Burlingame, CA [email protected] Electrical and Computer Engineering Department University
CA Automation Suite for Data Centers
PRODUCT SHEET CA Automation Suite for Data Centers agility made possible Technology has outpaced the ability to manage it manually in every large enterprise and many smaller ones. Failure to build and
Lab 6: Bifurcation diagram: stopping spiking neurons with a single pulse
Lab 6: Bifurcation diagram: stopping spiking neurons with a single pulse The qualitative behaviors of a dynamical system can change when parameters are changed. For example, a stable fixed-point can become
