Optimizing Shared Resource Contention in HPC Clusters
Sergey Blagodurov, Simon Fraser University
Alexandra Fedorova, Simon Fraser University

Abstract

Contention for shared resources in HPC clusters occurs when jobs execute concurrently on the same multicore node (contending for allocated CPU time, shared caches, the memory bus, memory controllers, etc.) and when jobs concurrently access the cluster interconnect as their processes exchange data with each other. In a virtualized environment, the cluster network must also be used by the cluster scheduler to migrate job virtual machines across the nodes. We argue that contention for shared cluster resources severely degrades workload performance and stability and hence must be addressed. We also found that state-of-the-art HPC cluster schedulers are not contention-aware. The goal of this work is the design, implementation and evaluation of a scheduling algorithm that optimizes shared resource contention in a virtualized HPC cluster environment. Depending on the particular cluster and workload needs, several optimization goals can be pursued.

1 Introduction

Assume the target environment is a High-Performance Computing (HPC) cluster comprising many (hundreds or even thousands of) computational nodes. The nodes in the HPC cluster are connected through a cluster network and are managed as a whole by a resource allocation and scheduling algorithm. The algorithm decides which applications to run on which nodes in the cluster and how many resources should be allocated to every process within every running job. An HPC cluster is a batch processing system: it executes each job at a time chosen by the cluster scheduler according to the requirements set upon job submission, the defined scheduling policy and the availability of resources.
That differs from, say, an interactive system, where commands are executed when entered via the terminal, or a transactional system, where jobs are executed as soon as they are initiated by a transaction request from outside the cluster. The exact methods of managing the workload by the resource allocation and scheduling algorithm depend on whether virtualization is supported within the cluster. If there is a virtualization framework on the cluster nodes, the algorithm schedules virtual appliances (VAs) of the applications rather than the applications themselves. In a non-virtualized environment, the job scheduler cannot migrate workload processes between cluster nodes. If it deems internode rescheduling necessary, it may only do so by killing a process and spawning it on the new desired node, or by waiting for the natural termination of the process and then respawning it. In a virtualized environment, dynamic migration of VAs between the nodes of a cluster is possible.

A job submitted to the HPC cluster is typically a shell script that contains a program invocation and a set of attributes allowing the cluster user to manage the job after submission and to request the resources necessary for its execution. The attributes specify the duration of the job, offer control over when the job is eligible to run, what happens to the output when it completes, and how the user is notified upon completion. One important attribute is the resource list, which specifies the amount and type of resources the job needs in order to execute. A cluster job can request a number of cluster nodes and processors, an amount of physical memory, and swap or disk space. The HPC cluster scheduler puts the job in a queue upon submission; the queue contains the jobs waiting for execution on the cluster.
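The submission attributes and resource list just described can be illustrated with a small sketch; the field names below are hypothetical and are not taken from any particular cluster scheduler.

```python
from dataclasses import dataclass

@dataclass
class JobRequest:
    """Simplified model of an HPC job submission (illustrative only;
    field names are hypothetical, not from any real scheduler)."""
    script: str              # shell script containing the program invocation
    walltime_hours: float    # requested duration of the job
    nodes: int = 1           # number of cluster nodes in the resource list
    procs_per_node: int = 1  # processors requested per node
    mem_gb: int = 0          # physical memory per node
    notify_email: str = ""   # how the user is notified on completion

    def total_processes(self) -> int:
        return self.nodes * self.procs_per_node

job = JobRequest(script="run_sim.sh", walltime_hours=12.0,
                 nodes=2, procs_per_node=4, mem_gb=16)
print(job.total_processes())  # 8 processes spread over 2 nodes
```

Note that none of these coarse-grained fields say anything about how contention-sensitive the job is, which is exactly the gap this paper addresses.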
Once the resources specified in the job submission script are available, and if the job is eligible to run according to the cluster policy, the scheduler starts the job and executes it for the duration specified in the submission script. If the job terminates before that time, the scheduler will try to use the resources freed by the job's termination to run other jobs. However, it may be that no jobs are eligible to run at that time, so, in general, the cluster user will be charged for the
time specified in the submission script. If the job needs more time to execute than is specified in the script, the scheduler might try to allocate additional resources to the job. It might not be able to do so, as other jobs might already be scheduled for execution immediately after; if that happens, the scheduler can terminate the job before its natural completion. In both cases, it is essential for the HPC cluster user to correctly predict the job execution time, so that the user is not charged for unnecessary resources if the job terminates early, and so that her job is not killed by the cluster scheduler due to extended execution time.

Although most cluster management algorithms address shared resources like CPU, disk and network interfaces, there are other shared resources that become increasingly important on modern multicore machines and that were not addressed by existing cluster management proposals. In particular, there is:

- Shared resource contention between applications in the memory hierarchy of each cluster node. We assume all nodes to be multicore systems. In a multicore system (Figure 1), cores share parts of the memory hierarchy, which we term memory domains, and compete for resources such as last-level caches (LLC), system request queues and memory controllers [3, 6].

Figure 1: A schematic view of a cluster node with four memory domains and four cores per domain. There are 16 cores in total, and a shared L3 cache per domain.

- Contention for and overhead of accessing cluster interconnects (the cluster network). It can occur when (a) the cluster uses a file server to store the data for cluster jobs, (b) several processes of the same job, spread among cluster nodes, need to communicate their data to each other (cluster jobs are usually created using MPI, the Message Passing Interface, or other APIs that allow their processes to exchange data even when the processes run on different machines), or (c) the cluster network is used by the job scheduler in a virtualized environment to migrate virtual machines across the nodes, if necessary.

Figure 3: Average time increase for the 8-process MPI jobs scheduled on 2 nodes (4 processes per node) relative to a schedule on one node.

2 Why is taking care of shared resource contention important?

Shared resource contention can substantially affect the performance of a cluster job. Figure 2 shows the results of experiments where two different sets of four MPI jobs (4 processes each) were run simultaneously on a cluster comprising 2 nodes with 8 cores each. The applications shown in this section are benchmarks from the High Energy Physics (HEP) SPEC, NAS Parallel Benchmarks (NPB), High Performance Computing Challenge (HPCC), Intel MPI and SPEC MPI2007 suites. We evaluated scientific applications for two reasons. First, they are CPU-intensive and often suffer from contention. Second, they are representative of the workloads typically run on HPC clusters. Among those four MPI jobs, two used the memory hierarchy of the node extensively and so, when put together on a node, can experience degradation due to contention for the memory resources of the machine. There are three unique ways to distribute the four MPI jobs (4 processes each) across the two 8-core nodes, with respect to the pairs of co-run MPI jobs sharing a node. We ran the workloads in each of these schedules, recorded the average completion time for all applications in each workload, and labeled the schedule with the lowest average completion time as the best (this is the schedule where the memory-intensive jobs are separated onto different nodes) and the one with the highest average completion time as
the worst (the two memory-intensive jobs are put together on one node). Figure 2 shows the performance degradation of the worst schedule relative to the best one. The best schedule delivers an 11% better average completion time than the worst one, and the performance of individual applications improves by as much as 33%. This data highlights that scheduling decisions within the cluster must be contention-aware in order to prevent performance degradation due to shared resource contention.

Figure 2: The performance degradation for a contention-unaware cluster schedule relative to a contention-aware schedule for 2 workloads comprised of scientific MPI jobs.

Figure 3 shows the degradation that an MPI job suffers when its processes are forced to communicate with each other over the cluster interconnect. The slowdown varies greatly from job to job, but it can be as high as 778% for some MPI applications. This stresses the importance of scheduling so as to reduce communication through cluster interconnects as much as possible.

3 Cluster schedulers are NOT contention-aware

The types of resources needed by a job to execute, specified upon job submission, vary with the system architecture, but none of them allow the user to specify a fine-grained description of the job's resource requirements (i.e., how sensitive the application is to memory resource contention or to the internode exchange of data). Because of that, the application may encounter a shortage of the actual computational resources allocated to it (e.g., cache space, memory controller bandwidth or internode interconnect bandwidth), even though the resource requirements specified during job submission (the number of nodes, cores per node, memory and so on) are perfectly met. This will in turn result in increased execution time for the contention-sensitive job and may lead to early termination of the job by the cluster scheduler if its execution time was incorrectly predicted in the submission script.
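This failure mode can be illustrated with a toy sketch (all numbers below are made up): the user predicts a runtime assuming uncontended resources, contention inflates the actual runtime past the requested walltime, and the scheduler terminates the job early.

```python
def job_outcome(requested_walltime, base_runtime, contention_slowdown):
    """Return what happens to a job whose actual runtime is its
    contention-free runtime inflated by a slowdown factor."""
    actual = base_runtime * contention_slowdown
    if actual > requested_walltime:
        # the scheduler kills the job at the requested walltime
        return "killed", requested_walltime
    return "finished", actual

# hypothetical numbers: the user predicts 10 hours assuming no contention,
# but a 33% contention degradation (as in Figure 2) pushes the job past it
status, t = job_outcome(requested_walltime=10.0,
                        base_runtime=9.0,
                        contention_slowdown=1.33)
# 9.0 * 1.33 ≈ 11.97 h > 10 h: the job is killed at the 10-hour mark
```

Without contention (`contention_slowdown=1.0`) the same job would finish in 9 hours, within its requested walltime.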
The probability of an incorrect prediction increases in large HPC clusters, as they are often used by many users, and each of them in general does not know which jobs will be executed concurrently on the cluster at a given time. Scheduling decisions that take cluster resource contention into account can significantly improve the effectiveness of an HPC cluster, resulting in more jobs being run and quicker job turnaround. It is the job of the scheduler to use whatever freedom is available to schedule jobs in a manner that maximizes cluster performance and minimizes the resources spent on it.

4 Our proposal: make cluster schedulers contention-aware

The goal of this work is the design, implementation and evaluation of a scheduling algorithm that optimizes shared resource contention in an HPC cluster. Depending on the particular cluster and workload needs, the following optimization goals can be pursued by the cluster scheduler:

- Stable performance of the overall system: fairness in the degradation from shared resource contention across all applications.

- A performance boost for chosen (prioritized) jobs, through a reduction in resource contention for them or complete isolation from resource contention.

- A reduction in system power consumption by packing applications onto as few nodes as possible, thus providing a better solution in terms of the power-performance trade-off. We intend to measure the improvement in terms of Energy Delay Product (EDP) for the cluster with contention-aware schedulers, in comparison with the default scheduler both with and without power savings enabled. Energy Delay Product is a common metric for energy/performance improvement [4].

- Scalability. It is expected that the number of cluster nodes as well as the number of processor cores
within a single cluster node will continue to increase [2]. Any scheduling and resource allocation algorithm in such an environment should be highly scalable, because a centralized solution would result in delayed scheduling decisions and an inability to respond to dynamic workloads. The efficiency of the scheduler is measured by the time it takes to make a complete scheduling decision for 10, 100, 1000, etc. jobs/processes. In a centralized algorithm this time grows rapidly with the number of nodes and cores (the number of potential scheduling entities), while a decentralized approach reduces it by breaking the scheduling task into several subtasks that execute in parallel.

We assume that each goal should be achieved under the following requirements:

- Maximizing overall workload performance (as long as it does not contradict the goal objective).

- Satisfying user resource constraints. User requirements are currently expressed via the number of desired dedicated nodes, cores or allocated memory. As we optimize shared resource contention in the cluster, we must make sure that we do not give a job fewer nodes, CPUs or less memory than it requested, unless the job effectively uses fewer resources than it had requested.

- Ensuring that the workload of each user is hurt by contention only within certain predefined limits.

5 Design challenges

In order to fulfill the optimization goals outlined above, we need to come up with solutions to the following problems:

1) In a cluster environment, the scheduler generally runs a job if it is next in the queue and all the resources it requested are available to assign to the job's processes.¹ This approach, however, assumes that the user knows what resources are necessary for the job to complete in the required amount of time (which must also be specified by the user).
The existing schedulers allow users to specify coarse-grained resource demands upon submission, like the number of execution cores and the maximum amount of main memory, disk or swap space that the job will use. None of this information, however, reflects how sensitive the job is to resource contention from the other jobs that will be executing simultaneously in the same cluster. As a result, neither users nor the cluster scheduler is able to predict the actual execution time that the job will have under a particular cluster setup and workload. This can lead either to overestimated execution times, which result in increased charges for users, or to underestimated times, which result in early termination of cluster jobs by the scheduler. To address this problem, a new set of contention-descriptive metrics, representing fine-grained information about each job's resource utilization and communication patterns, needs to be provided both to the scheduler, to help it make scheduling decisions, and to the cluster users, to properly describe the jobs they submit and to estimate the slowdown due to cluster sharing. Some of these metrics can be found in previous work (Section 6), while others have yet to be discovered.

¹ There could, of course, be exceptions to this general rule if, for instance, certain jobs are deemed high priority, in which case they can prevent non-prioritized jobs from starting before them. Another example is a backfill scheduling policy: if the scheduler sees that the next job in the queue cannot start due to a lack of necessary resources, it can instead start jobs located later in the queue, to avoid wasting resources.

2) The optimization goals outlined above can potentially be fulfilled together.
For example, the scheduling task specified by the system administrator for the whole cluster (or, possibly, by the user for her submitted tasks only) could be "boost the execution of a given subset of jobs while saving as much power as possible for the rest". How should we devise the algorithm so it can fulfill several optimization goals at the same time? Another interesting investigation would be the ability of the scheduler to dynamically detect which optimization goal is the most beneficial for the current cluster workload, and then switch between optimization goals as necessary.

3) To better optimize cluster contention, the scheduler needs to co-schedule jobs that do not compete for shared resources. Hence, there is an incentive to look ahead into the queue of submitted jobs: there could be, for instance, a job at the tail of the queue that would result in better contention properties, but at the expense of skipping the queue order. How should we trade off the goals of fairness and contention management in this case?

4) When we have a queue of jobs as well as many jobs already running on the cluster, what is the algorithm for creating assignments that satisfy the particular optimization goal(s) the scheduler is trying to accomplish: CPU and memory requirements, contention, power consumption, etc.? The combinations of jobs that we can create are many; how do we find a good one quickly?

5) In the model we are proposing, there is an incentive to give users fewer resources than they asked for if they do not use them effectively (for instance, if a user submitted a CPU-intensive job while requesting a whole dedicated node for it, the scheduler can still assign more jobs to the same node if the submitted job effectively uses only one core of the multicore machine). This
could increase resource utilization, but can cause conflicts between colocated jobs and, as a result, slowdowns due to shared resource contention. What incentives should we give users to accept this kind of liberty on the part of the cluster scheduler?

6 What has been done so far? How can it help?

In our previous work, we investigated ways of reducing resource contention within a multicore machine (a cluster node) [3, 6]. Our methodology allowed us to identify the last-level cache miss rate as one of the most accurate predictors of the degree to which applications will suffer when co-scheduled. We used it to design and implement an OS scheduling algorithm called Distributed Intensity (DI). We showed experimentally that DI performs better than the default Linux scheduler, delivers much more stable execution times, and performs within a few percentage points of the theoretical optimum. DI separates memory-intensive applications as far apart in the memory hierarchy of the machine as possible [3].

On many multicore systems, power consumption can be reduced if the workload is concentrated on a handful of chips, so that the remaining chips can be brought into a low-power state. In order to determine whether threads should be clustered (to save power) or spread across chips (to avoid excessive contention), the scheduler must be able to predict to what extent threads will hurt each other's performance if clustered. We found that DI, with a slight modification, is able to make this decision very effectively, which improved EDP by as much as 80% relative to plain DI [3].

Koukis and Koziris [5] present the design and implementation of a gang-like scheduling algorithm aimed at improving the throughput of multiprogrammed workloads on multicore systems. The algorithm selects the processes to be co-scheduled so as to neither saturate nor underutilize the memory bus or network link bandwidth.
Its input data are acquired dynamically using hardware monitoring counters and modified NIC firmware. The experimental setup in [5] assumed that all processes were spawned directly under the control of the Linux scheduler, using the mpirun command. The authors then compared the algorithm's performance with that of the default Linux scheduler (O(1) at the time [5] was written). While using an OS scheduler in an HPC cluster setup can be justified for a very small number of nodes, industry-scale clusters require state-of-the-art cluster schedulers (the cluster scheduler we experiment with is Maui [1]) to make scheduling decisions, since these schedulers support features like scalability, fulfillment of user-specified constraints, dynamic priorities, reservations, and fairshare capabilities, which are necessary for operating a large cluster and absent from OS schedulers. Because of that, within our work we mainly aim to compare the performance of our techniques with that of state-of-the-art cluster schedulers on industry-scale clusters.

7 Summary

In this paper, we experimentally showed that contention between jobs for shared cluster resources, both within the multicore nodes of an HPC cluster and when jobs access cluster interconnects, can severely degrade their execution time. This in turn can lead to the premature termination of a job by the cluster scheduler if the job execution time was incorrectly specified in the job submission script. We have described how this motivates our project on the design and implementation of a contention-aware cluster scheduler that can optimize HPC cluster contention in several ways: (1) fairness in the degradation caused by shared resource contention across all cluster jobs, (2) a performance boost for chosen (prioritized) jobs, (3) a reduction in system power consumption by packing cluster jobs onto as few nodes as possible, and (4) scalability of the contention-aware cluster algorithm for HPC clusters with a large number of nodes and cores per node.
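As a concrete reference point, the placement idea behind DI from Section 6 can be sketched in a few lines. This is an illustrative reimplementation of the idea only, not the published algorithm; the thread names and LLC miss rates below are invented.

```python
def distributed_intensity(threads, n_domains):
    """Spread memory-intensive threads across memory domains:
    rank threads by LLC miss rate (the contention predictor) and
    deal them out round-robin, so the most intensive threads land
    in different domains."""
    domains = [[] for _ in range(n_domains)]
    ranked = sorted(threads, key=lambda t: t["llc_miss_rate"], reverse=True)
    for i, t in enumerate(ranked):
        domains[i % n_domains].append(t["name"])
    return domains

# hypothetical threads: two miss-rate-heavy, two cache-friendly
threads = [{"name": "mcf",    "llc_miss_rate": 0.09},
           {"name": "milc",   "llc_miss_rate": 0.07},
           {"name": "povray", "llc_miss_rate": 0.001},
           {"name": "namd",   "llc_miss_rate": 0.002}]
placement = distributed_intensity(threads, n_domains=2)
# the two miss-rate-heavy threads ("mcf" and "milc") end up in
# different memory domains, each paired with a cache-friendly one
```

The same ranking could in principle be fed with per-job counter data gathered on the cluster nodes, which is the direction the metrics discussed below point to.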
To fulfill these scheduling objectives, a new set of metrics needs to be found that models shared resource contention and represents fine-grained information about each job's resource utilization and communication patterns. The last-level cache miss rate and the amount of traffic through the network interface on a cluster node, proposed in earlier work, are examples of such metrics. The necessary information can be obtained with the performance counters within cluster nodes and extensive cluster interconnect monitoring between them.

References

[1] Maui scheduler administrator's guide. [Online] Available:

[2] Teraflops research chip. [Online] Available:

[3] BLAGODUROV, S., ZHURAVLEV, S., AND FEDOROVA, A. Contention-aware scheduling on multicore systems. ACM Trans. Comput. Syst. 28 (December 2010), 8:1-8:45.

[4] GONZALEZ, R., AND HOROWITZ, M. Energy dissipation in general purpose microprocessors. IEEE Journal of Solid-State Circuits (1996).

[5] KOUKIS, E., AND KOZIRIS, N. Memory and network bandwidth aware scheduling of multiprogrammed workloads on clusters of SMPs. In Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1 (2006), ICPADS '06.

[6] ZHURAVLEV, S., BLAGODUROV, S., AND FEDOROVA, A. Addressing Shared Resource Contention in Multicore Processors via Scheduling. In ASPLOS (2010).
More informationWindows Server 2008 R2 Hyper-V Live Migration
Windows Server 2008 R2 Hyper-V Live Migration White Paper Published: August 09 This is a preliminary document and may be changed substantially prior to final commercial release of the software described
More informationAdaptive Resource Optimizer For Optimal High Performance Compute Resource Utilization
Technical Backgrounder Adaptive Resource Optimizer For Optimal High Performance Compute Resource Utilization July 2015 Introduction In a typical chip design environment, designers use thousands of CPU
More informationMaking Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association
Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?
More informationSymmetric Multiprocessing
Multicore Computing A multi-core processor is a processing system composed of two or more independent cores. One can describe it as an integrated circuit to which two or more individual processors (called
More informationWindows Server 2008 R2 Hyper-V Live Migration
Windows Server 2008 R2 Hyper-V Live Migration Table of Contents Overview of Windows Server 2008 R2 Hyper-V Features... 3 Dynamic VM storage... 3 Enhanced Processor Support... 3 Enhanced Networking Support...
More informationWHITE PAPER Guide to 50% Faster VMs No Hardware Required
WHITE PAPER Guide to 50% Faster VMs No Hardware Required Think Faster. Visit us at Condusiv.com GUIDE TO 50% FASTER VMS NO HARDWARE REQUIRED 2 Executive Summary As much as everyone has bought into the
More informationInfrastructure Matters: POWER8 vs. Xeon x86
Advisory Infrastructure Matters: POWER8 vs. Xeon x86 Executive Summary This report compares IBM s new POWER8-based scale-out Power System to Intel E5 v2 x86- based scale-out systems. A follow-on report
More informationBENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
More informationChapter 18: Database System Architectures. Centralized Systems
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
More informationRackspace Cloud Databases and Container-based Virtualization
Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many
More informationHyperThreading Support in VMware ESX Server 2.1
HyperThreading Support in VMware ESX Server 2.1 Summary VMware ESX Server 2.1 now fully supports Intel s new Hyper-Threading Technology (HT). This paper explains the changes that an administrator can expect
More information2. Research and Development on the Autonomic Operation. Control Infrastructure Technologies in the Cloud Computing Environment
R&D supporting future cloud computing infrastructure technologies Research and Development on Autonomic Operation Control Infrastructure Technologies in the Cloud Computing Environment DEMPO Hiroshi, KAMI
More informationMultilevel Load Balancing in NUMA Computers
FACULDADE DE INFORMÁTICA PUCRS - Brazil http://www.pucrs.br/inf/pos/ Multilevel Load Balancing in NUMA Computers M. Corrêa, R. Chanin, A. Sales, R. Scheer, A. Zorzo Technical Report Series Number 049 July,
More informationCS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015
CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015 1. Goals and Overview 1. In this MP you will design a Dynamic Load Balancer architecture for a Distributed System 2. You will
More informationIBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud
IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud February 25, 2014 1 Agenda v Mapping clients needs to cloud technologies v Addressing your pain
More informationAbstract: Motivation: Description of proposal:
Efficient power utilization of a cluster using scheduler queues Kalyana Chadalvada, Shivaraj Nidoni, Toby Sebastian HPCC, Global Solutions Engineering Bangalore Development Centre, DELL Inc. {kalyana_chadalavada;shivaraj_nidoni;toby_sebastian}@dell.com
More informationPerformance Test Results Report for the Sled player
Performance Test Results Report for the Sled player The Open University Created: 17 th April 2007 Author Simon Hutchinson The Open University Page 1 of 21 Cross References None
More informationScheduling. Scheduling. Scheduling levels. Decision to switch the running process can take place under the following circumstances:
Scheduling Scheduling Scheduling levels Long-term scheduling. Selects which jobs shall be allowed to enter the system. Only used in batch systems. Medium-term scheduling. Performs swapin-swapout operations
More informationDynamic Virtual Machine Scheduling in Clouds for Architectural Shared Resources
Dynamic Virtual Machine Scheduling in Clouds for Architectural Shared Resources JeongseobAhn,Changdae Kim, JaeungHan,Young-ri Choi,and JaehyukHuh KAIST UNIST {jeongseob, cdkim, juhan, and jhuh}@calab.kaist.ac.kr
More informationPerformance Isolation of a Misbehaving Virtual Machine with Xen, VMware and Solaris Containers
Performance Isolation of a Misbehaving Virtual Machine with Xen, VMware and Solaris Containers Todd Deshane, Demetrios Dimatos, Gary Hamilton, Madhujith Hapuarachchi, Wenjin Hu, Michael McCabe, Jeanna
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationPERFORMANCE TUNING ORACLE RAC ON LINUX
PERFORMANCE TUNING ORACLE RAC ON LINUX By: Edward Whalen Performance Tuning Corporation INTRODUCTION Performance tuning is an integral part of the maintenance and administration of the Oracle database
More informationThe Advantages of a Multi-Tenant workload System
Prepared by: George Crump, Senior Analyst Prepared on: 7/30/2009 http://www.storage-switzerland.com Copyright 2009 Storage Switzerland, Inc. - All rights reserved There is a dark cloud looming in storage.
More informationExploring RAID Configurations
Exploring RAID Configurations J. Ryan Fishel Florida State University August 6, 2008 Abstract To address the limits of today s slow mechanical disks, we explored a number of data layouts to improve RAID
More informationVirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5
Performance Study VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 VMware VirtualCenter uses a database to store metadata on the state of a VMware Infrastructure environment.
More informationTPCalc : a throughput calculator for computer architecture studies
TPCalc : a throughput calculator for computer architecture studies Pierre Michaud Stijn Eyerman Wouter Rogiest IRISA/INRIA Ghent University Ghent University pierre.michaud@inria.fr Stijn.Eyerman@elis.UGent.be
More informationLoad DynamiX Storage Performance Validation: Fundamental to your Change Management Process
Load DynamiX Storage Performance Validation: Fundamental to your Change Management Process By Claude Bouffard Director SSG-NOW Labs, Senior Analyst Deni Connor, Founding Analyst SSG-NOW February 2015 L
More informationEnergy Aware Consolidation for Cloud Computing
Abstract Energy Aware Consolidation for Cloud Computing Shekhar Srikantaiah Pennsylvania State University Consolidation of applications in cloud computing environments presents a significant opportunity
More informationResource Allocation Schemes for Gang Scheduling
Resource Allocation Schemes for Gang Scheduling B. B. Zhou School of Computing and Mathematics Deakin University Geelong, VIC 327, Australia D. Walsh R. P. Brent Department of Computer Science Australian
More informationResource usage monitoring for KVM based virtual machines
2012 18th International Conference on Adavanced Computing and Communications (ADCOM) Resource usage monitoring for KVM based virtual machines Ankit Anand, Mohit Dhingra, J. Lakshmi, S. K. Nandy CAD Lab,
More informationEnergy Constrained Resource Scheduling for Cloud Environment
Energy Constrained Resource Scheduling for Cloud Environment 1 R.Selvi, 2 S.Russia, 3 V.K.Anitha 1 2 nd Year M.E.(Software Engineering), 2 Assistant Professor Department of IT KSR Institute for Engineering
More informationAgenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.
Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance
More informationCapacity Estimation for Linux Workloads
Capacity Estimation for Linux Workloads Session L985 David Boyes Sine Nomine Associates 1 Agenda General Capacity Planning Issues Virtual Machine History and Value Unique Capacity Issues in Virtual Machines
More informationNew Issues and New Capabilities in HPC Scheduling with the Maui Scheduler
New Issues and New Capabilities in HPC Scheduling with the Maui Scheduler I.Introduction David B Jackson Center for High Performance Computing, University of Utah Much has changed in a few short years.
More informationSAN Conceptual and Design Basics
TECHNICAL NOTE VMware Infrastructure 3 SAN Conceptual and Design Basics VMware ESX Server can be used in conjunction with a SAN (storage area network), a specialized high speed network that connects computer
More informationWhite Paper Perceived Performance Tuning a system for what really matters
TMurgent Technologies White Paper Perceived Performance Tuning a system for what really matters September 18, 2003 White Paper: Perceived Performance 1/7 TMurgent Technologies Introduction The purpose
More informationProgram Grid and HPC5+ workshop
Program Grid and HPC5+ workshop 24-30, Bahman 1391 Tuesday Wednesday 9.00-9.45 9.45-10.30 Break 11.00-11.45 11.45-12.30 Lunch 14.00-17.00 Workshop Rouhani Karimi MosalmanTabar Karimi G+MMT+K Opening IPM_Grid
More informationChapter 5 Linux Load Balancing Mechanisms
Chapter 5 Linux Load Balancing Mechanisms Load balancing mechanisms in multiprocessor systems have two compatible objectives. One is to prevent processors from being idle while others processors still
More informationIntel DPDK Boosts Server Appliance Performance White Paper
Intel DPDK Boosts Server Appliance Performance Intel DPDK Boosts Server Appliance Performance Introduction As network speeds increase to 40G and above, both in the enterprise and data center, the bottlenecks
More informationThe Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage
The Shortcut Guide to Balancing Storage Costs and Performance with Hybrid Storage sponsored by Dan Sullivan Chapter 1: Advantages of Hybrid Storage... 1 Overview of Flash Deployment in Hybrid Storage Systems...
More informationBenchmarking Hadoop & HBase on Violin
Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages
More informationSystem Software for High Performance Computing. Joe Izraelevitz
System Software for High Performance Computing Joe Izraelevitz Agenda Overview of Supercomputers Blue Gene/Q System LoadLeveler Job Scheduler General Parallel File System HPC at UR What is a Supercomputer?
More informationAgile Performance Testing
Agile Performance Testing Cesario Ramos Independent Consultant AgiliX Agile Development Consulting Overview Why Agile performance testing? Nature of performance testing Agile performance testing Why Agile
More informationBSPCloud: A Hybrid Programming Library for Cloud Computing *
BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China liuxiaodongxht@qq.com,
More informationMicrosoft SQL Server OLTP Best Practice
Microsoft SQL Server OLTP Best Practice The document Introduction to Transactional (OLTP) Load Testing for all Databases provides a general overview on the HammerDB OLTP workload and the document Microsoft
More informationVirtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies
Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies Kurt Klemperer, Principal System Performance Engineer kklemperer@blackboard.com Agenda Session Length:
More informationImproved Hybrid Dynamic Load Balancing Algorithm for Distributed Environment
International Journal of Scientific and Research Publications, Volume 3, Issue 3, March 2013 1 Improved Hybrid Dynamic Load Balancing Algorithm for Distributed Environment UrjashreePatil*, RajashreeShedge**
More informationIntroduction 1 Performance on Hosted Server 1. Benchmarks 2. System Requirements 7 Load Balancing 7
Introduction 1 Performance on Hosted Server 1 Figure 1: Real World Performance 1 Benchmarks 2 System configuration used for benchmarks 2 Figure 2a: New tickets per minute on E5440 processors 3 Figure 2b:
More informationDell Virtualization Solution for Microsoft SQL Server 2012 using PowerEdge R820
Dell Virtualization Solution for Microsoft SQL Server 2012 using PowerEdge R820 This white paper discusses the SQL server workload consolidation capabilities of Dell PowerEdge R820 using Virtualization.
More informationChapter 2: Getting Started
Chapter 2: Getting Started Once Partek Flow is installed, Chapter 2 will take the user to the next stage and describes the user interface and, of note, defines a number of terms required to understand
More informationUtilization Driven Power-Aware Parallel Job Scheduling
Utilization Driven Power-Aware Parallel Job Scheduling Maja Etinski Julita Corbalan Jesus Labarta Mateo Valero {maja.etinski,julita.corbalan,jesus.labarta,mateo.valero}@bsc.es Motivation Performance increase
More informationGrid Scheduling Dictionary of Terms and Keywords
Grid Scheduling Dictionary Working Group M. Roehrig, Sandia National Laboratories W. Ziegler, Fraunhofer-Institute for Algorithms and Scientific Computing Document: Category: Informational June 2002 Status
More informationInternational Journal of Computer & Organization Trends Volume20 Number1 May 2015
Performance Analysis of Various Guest Operating Systems on Ubuntu 14.04 Prof. (Dr.) Viabhakar Pathak 1, Pramod Kumar Ram 2 1 Computer Science and Engineering, Arya College of Engineering, Jaipur, India.
More informationPerformance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009
Performance Study Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009 Introduction With more and more mission critical networking intensive workloads being virtualized
More informationA Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing
A Dynamic Resource Management with Energy Saving Mechanism for Supporting Cloud Computing Liang-Teh Lee, Kang-Yuan Liu, Hui-Yang Huang and Chia-Ying Tseng Department of Computer Science and Engineering,
More informationDeploying and Optimizing SQL Server for Virtual Machines
Deploying and Optimizing SQL Server for Virtual Machines Deploying and Optimizing SQL Server for Virtual Machines Much has been written over the years regarding best practices for deploying Microsoft SQL
More informationPetascale Software Challenges. Piyush Chaudhary piyushc@us.ibm.com High Performance Computing
Petascale Software Challenges Piyush Chaudhary piyushc@us.ibm.com High Performance Computing Fundamental Observations Applications are struggling to realize growth in sustained performance at scale Reasons
More informationHigh Availability Essentials
High Availability Essentials Introduction Ascent Capture s High Availability Support feature consists of a number of independent components that, when deployed in a highly available computer system, result
More informationPerformance Monitoring of Parallel Scientific Applications
Performance Monitoring of Parallel Scientific Applications Abstract. David Skinner National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory This paper introduces an infrastructure
More information