Simple Power-Aware Scheduler to limit power consumption by HPC system within a budget
2014 Energy Efficient Supercomputing Workshop

Deva Bodas, Justin Song, Murali Rajappa, Andy Hoffman
Intel Corporation
Deva.Bodas, Justin.J.Song, Muralidhar.Rajappa, [email protected]

Abstract: Future Exascale systems are projected to require tens of megawatts. While facilities must provision sufficient power to realize peak performance, limited power availability will require power capping. Current approaches to power capping limit the CPU power state and are agnostic to workload characteristics. Injudicious use of such mechanisms in an HPC system can have a devastating impact on performance. We propose integrating power limiting into the job scheduler. We describe a power-aware scheduler that monitors power consumption, distributes a power budget to each job, and implements a uniform frequency mechanism to limit power. We compare three implementations of uniform frequency, and show that power monitoring improves the probability of launching a job earlier, allows a job to run faster, and reduces stranded power. Our data shows that the auto mode for uniform frequency operates at up to 40% higher frequency than a fixed-frequency mode.

Keywords: resource manager, scheduler, energy efficiency, power manager, power limiting, HPC, Exascale, IPMI

I. INTRODUCTION

Future HPC systems are expected to deliver Exascale performance within limited power and energy budgets [8]. To address these new challenges, future systems will need to operate within a power budget while delivering maximum efficiency and performance. A system resource manager (SRM) that schedules hardware resources for a job will need to treat power as a resource and maximize the efficiency of all resources.

A power-aware scheduler will need to maintain system power within limits, maximize energy efficiency, and control the rate of change of system power consumption, while performing all the other functions a scheduler does today. The scheduler will need to forecast how much power a job will need and take corrective action when actual power differs from the estimate. Corrective actions cannot be so drastic that they degrade performance or cause wild swings in power consumption. This is a very complex problem to solve [2], especially today when there is little data about the power and energy characteristics of HPC workloads.

We expect HPC systems to start with a simple power-aware scheduler and evolve in sophistication over the years. We developed such a power-aware scheduler for SLURM¹, which allowed us to study the cost and benefit of various features. We addressed several questions that power-aware schedulers face and highlighted several issues that need further investigation.

A demand/response interface that is being defined will determine power availability for the entire system. We will need to use this interface to reserve power for jobs that have started and cannot be suspended. The scheduler must be able to manage jobs with and without power limits.

A power-aware scheduler needs a way to estimate the power required to run a job. Power-performance calibration of nodes helps develop such an estimate. The estimate is expected to be based upon power-performance data collected on sample workloads or on past runs of the job. Although the estimate may have a built-in guard band, the actual power consumption of the job is likely to differ. Job-level power monitoring assesses differences between the estimate and actual power consumption.
Such assessments create opportunities to fine-tune the power allocation to each job. We will show the importance of dynamic monitoring of power consumption on a job-by-job basis.

A power policy is a control mechanism used to ensure that the power consumed by a job stays within its allocation. Power monitoring influences the power policy. Without power monitoring, heavy guard bands must be built into power allocations so that a job never consumes more power than its allocation. Such a heavy allocation must be equal to or greater than the maximum power of a worst-case (power virus) workload.

Heavy allocations have multiple issues. In many cases, fewer jobs will be able to run, because actual power consumption is likely to be far lower than the power-virus allocation; much of the power allocation will therefore be stranded: unused and unavailable. The lack of dynamic power management also hurts performance. Without power monitoring, one has to assume power-virus consumption when selecting the operating frequency, so the selected frequency will be lower than a frequency based upon more realistic power consumption.

¹ Other names and brands may be claimed as the property of others. Copyright 2014 Intel Corporation.

Fig. 1. Impact of power monitoring on startup and operation of a job
The example shown in Fig. 1 illustrates these points. The solid line represents frequency capability when power monitoring is used; the dashed line indicates frequency capability when it is not. Zero frequency means that the job cannot run. With worst-case allocation and no monitoring, the job cannot start unless available power exceeds 1400 W. Power monitoring provides a more realistic estimate, especially dynamic monitoring, which allows corrective action; in this example the job may start when available power exceeds 1100 W. At 1800 W of available power, the no-monitoring case forces 1.5 GHz while monitoring allows 2.2 GHz. Both curves employ a uniform frequency across all the nodes; the solid curve enjoys the benefit of power monitoring while the dashed curve does not. Again, the lack of power monitoring imposes over-estimation. This over-estimation may delay the start of the job until all of the estimated power is available, and once the job starts, the conservative estimate forces a lower operating frequency. In most cases the actual power consumed by the job will be significantly lower, resulting in stranded power.

Fig. 2. Characteristics of various uniform frequency modes on scheduling and operation

A key function of a power-aware scheduler is to maintain system power within a limit. There can be any number of schemes to limit power consumption. As a starting point, we implemented a set of schemes we call "uniform frequency": all processors running the same job operate at the same frequency. Most HPC systems already operate this way, and the industry has characterized and optimized many workloads running at a uniform frequency [5].

Uniform-frequency power limiting can be exercised in different ways depending on how the frequency is selected and whether the same frequency is maintained throughout the job. We implemented three schemes in SLURM and an IPMI daemon for executing a job at a uniform frequency: a) the user selects a frequency of operation for the duration of a job ("fixed frequency" mode), b) the user specifies the minimum power level that must be allocated to a job ("min power" mode), and c) the best frequency is automatically selected based upon available power ("auto" mode). As Fig. 2 shows, auto mode allows a job to start with less available power. In the auto and min-power modes, the workload manager may adjust the uniform frequency based upon power headroom. The rest of this paper discusses the pros and cons of these three schemes (a sketch of the per-job options we assume appears below).

We believe power-aware scheduling to budget power for jobs will be essential, and dynamic power monitoring is necessary to minimize guard bands. Auto mode relieves users of the burden of choosing an appropriate frequency or power level. By dynamically adjusting frequency based upon power headroom, auto mode can deliver the best uniform-frequency performance and minimize stranded power.
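To make the later discussion concrete, the per-job choices described above can be captured as a small data structure. The following is a minimal sketch in Python; the names (PowerMode, JobRequest, and all field names) are ours for illustration, not part of the paper's SLURM implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PowerMode(Enum):
    """The three uniform-frequency power-limiting modes of Section IV."""
    FIXED_FREQUENCY = auto()  # user picks one frequency for the whole run
    MIN_POWER = auto()        # user picks a minimum power allocation
    AUTO = auto()             # scheduler picks the best frequency itself


@dataclass
class JobRequest:
    """User-visible power options attached to a job submission."""
    name: str
    num_nodes: int
    power_limited: bool = True   # "special" jobs may run with no power limit
    suspendable: bool = True     # jobs without checkpointing are not
    mode: PowerMode = PowerMode.AUTO
    fixed_frequency_ghz: float | None = None  # used in FIXED_FREQUENCY mode
    min_power_w: float | None = None          # used in MIN_POWER mode
```

Later sketches in this transcription reuse these names where convenient.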
II. RELATED WORK

Deployment of power-aware scheduling, or of methods for limiting system power, in HPC systems has been limited. In [10] we see a collection of research issues related to energy management that need to be addressed, along with simulation studies comparing a variety of energy-efficient scheduling algorithms to substantiate the benefit of including energy efficiency in scheduling [10].

In some systems, the user submitting a batch job to a work queue is given an option to select a frequency [13]; in others, user-specified performance policies based upon a published table of power and performance at each operational frequency are used for energy-aware scheduling [3]. These approaches do not directly address the problem of scheduling jobs within an overall system power limit. Alvarruiz et al. [4] discuss a framework that turns off idle resources independent of a resource manager. This approach provides a coarse level of power control; it does not address dynamically determining a run-time frequency. A dynamic DVFS mechanism such as Intel's² Enhanced Intel SpeedStep Technology [8] could be employed for the jobs that have been selected to run.

There have been several studies on the impact of Intel's Running Average Power Limit (RAPL) on HPC workloads. For example, Rountree et al. [12] document the performance degradation that arises when power-capping multiple processors, accounting for manufacturing variations. Additionally, specifying a processor power limit leaves the frequency decision in "the wrong hands": it can result in nodes running at different frequencies, incurring performance variation and hence degradation. A presentation by Matthieu Hautreux describes a power-capping exploration for SLURM [11]; his main focus is to shut down nodes, instead of leaving them idle, to create more power headroom.

² Intel, the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

III. SIMPLE POWER-AWARE SCHEDULER

The future resource manager will not only manage jobs, but also maintain a power budget and manage power-constrained energy efficiency in real time. The key functions of a power-aware resource manager are:

- Run jobs while maintaining the average power consumption of the system at or slightly below the provisioned power level;
- Maximize performance and energy efficiency of a job by using all of the allocated power;
- Reduce waste by operating unused resources in a sleep state;
- Manage power-consumption ramps to a facility specification.

We focused our study on the first aspect: operating jobs while maintaining average system power within a budget. In this section we describe the terms used in this paper and introduce the essential elements of a power-aware scheduler.

A. Description of terms used in this paper

Throughout this paper we use the following terms, which are expected to convey a uniform understanding.

1) Provisioned power to a system (P_SYS): P_SYS is the power allocation for the system, consisting of compute, IO, and OS nodes, network switches, and a storage system. The demand/response interface will eventually determine P_SYS; for this paper we control P_SYS manually or with a script.

2) Available power for a system: A power-aware scheduler is expected to distribute P_SYS among the various jobs. The power available for distribution depends upon pre-allocated power and upon monitoring:

Without monitoring: Available power = P_SYS - allocated power.
With monitoring: Available power = P_SYS - power consumed by the system - guard band.

3) Platform Max Power (PMP): When monitoring is not used, the scheduler is forced to allocate power based on the maximum power any job could use. This maximum job power is based on the node's Platform Max Power (PMP), which is measured by running a power virus.

4) Startup power for a job: Any job needs a minimum power allocation, or startup power, to start or to resume from suspension. The scheduler estimates the startup power. Without monitoring, the startup power has to be PMP. With monitoring, the startup power can be based upon calibration (see section III.C.3.a). When available power is less than startup power, the job cannot start.

5) Minimum required power (MRP) for a job: A scheduler may not be able to suspend or kill certain jobs due to inadequate power. There are two categories of such "special" jobs: jobs with no power limit and jobs that cannot be suspended. A scheduler has to reserve power for such jobs before distributing the remaining power to the rest of the jobs. The amount of power reserved for each "special" job is called the Minimum Required Power (MRP). For jobs with no power limit, MRP is either PMP or the workload maximum power. For a job that cannot be suspended, MRP is the power necessary to operate the job at the lowest frequency. MRP is zero for all other jobs. Jobs that run without a power limit are never affected by a reduction in P_SYS. Jobs that can be suspended may get suspended when P_SYS drops. Jobs that cannot be suspended may drop to the lowest frequency. P_SYS may even drop to such a low level that the system cannot continue to run the special jobs; this can happen when the utility reduces its power allocation or when a failure occurs in the power-delivery or cooling infrastructure. These situations can be avoided by using the demand/response interface to communicate the MRP for the system, while ensuring high reliability and availability of the infrastructure.

6) Allocated power for a job: The resource manager allocates a power budget to a job. This allocation does not enforce any hardware power limit. It is used strictly for two purposes: a) to determine available power for the system, and b) to take action when consumed power differs significantly from allocated power.

7) Stranded power: Ideally, a job uses its entire power budget for computation. In reality, consumption may be less. When power is allocated to a job, it is unavailable to other jobs; the difference between allocated power and actual consumption is stranded power. Stranded power is unused, unavailable, and wasted. The scheduler must minimize stranded power. (A small sketch of this bookkeeping follows.)
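The two available-power formulas and the stranded-power definition above are simple bookkeeping. A minimal sketch, assuming per-job allocation and consumption maps keyed by job name (the structure is ours, not the paper's implementation):

```python
def available_power(p_sys_w: float,
                    allocations_w: dict[str, float],
                    consumed_w: float | None = None,
                    guard_band_w: float = 0.0) -> float:
    """Power available for distribution (Section III.A.2).

    Without monitoring (consumed_w is None), everything already allocated
    is subtracted; with monitoring, measured system consumption plus a
    guard band is subtracted instead.
    """
    if consumed_w is None:
        return p_sys_w - sum(allocations_w.values())
    return p_sys_w - consumed_w - guard_band_w


def stranded_power(allocations_w: dict[str, float],
                   consumption_w: dict[str, float]) -> float:
    """Allocated-but-unconsumed power, summed over jobs (Section III.A.7)."""
    return sum(alloc - consumption_w.get(job, 0.0)
               for job, alloc in allocations_w.items())
```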
B. Assumptions

Our goal was to explore issues, implementation challenges, and factors that influence various scheduler decisions. For simplicity, we made two assumptions: a) P_SYS is provided manually or with a script, and b) compute nodes are single-tenant, i.e., all processors in a node run only one job at a time. In a production system, P_SYS will need to account for: a) demand/response negotiations, b) losses incurred from inefficiencies, and c) networking and storage. Although a fully featured production scheduler must consider the energy used by all system elements, we limit our scope to compute nodes.

Although multiple jobs run simultaneously, they may have different expectations in terms of importance, priority, run time, etc. Comprehending these constraints increases complexity, so we chose to limit the variables. We focused on managing jobs with and without a power limit. We explored the value of job-level dynamic power monitoring, and compared and analyzed the implications of static and dynamic allocations. In addition, we explored the effects of jobs that must run to completion. We deferred all other improvements and capabilities to future analysis.

C. Essential elements to operate jobs while maintaining a power budget

A power-aware scheduler needs to support a few basic functions, described in the sections below and illustrated in Fig. 3.

1) Power scheduling policy: A power-aware scheduler must be guided by an overarching policy that defines its operation. We crafted the following "maximum resource utilization" policies for our development; a system administrator could develop other policies for use at various times.
a) Maximize utilization of all hardware and software resources. Instead of running fewer jobs at high power and leaving resources unused, we chose to run as many jobs as possible and use all the resources.
b) A job with no power limit has the highest priority among all running jobs.
c) Suspended jobs have higher priority for resumption.

2) Determining power for a system (P_SYS): In a real implementation, P_SYS can be derived from the power allocation to the facility, distribution losses within the facility, voltage-conversion losses outside of the server, and the power needed to cool the system. For our experiments, the value of P_SYS is provided as a scheduler parameter.

3) Estimation of power needed to run a job: Before a scheduler can start a job, it needs to estimate how much power the job requires. The estimate is governed by: a) power and performance calibration of the nodes, b) the ability to monitor job-level power, and c) the user-selected power policy that limits power for the job.

a) Node calibration: Although an HPC system may use thousands of identical nodes, power and performance characteristics may vary widely between nodes [12]. Variations in hardware and environment can cause otherwise identical nodes running the same job at the same frequency to consume different amounts of power. Conversely, when hardware power-limiting mechanisms force the consumption of each node to be the same, the performance of those nodes may differ. Node-level power and performance calibration enables the scheduler to generate less conservative power estimates and make better decisions. We performed two types of characterization in our experiments. First, we ran a power virus and measured the PMP of each node at each operating frequency. Second, we varied the processor frequency across five representative mini-applications; for each frequency we logged average power, maximum power, power deviation, and time to completion into a database. Node calibration was performed manually. In the future, we expect an application to perform node calibration.

b) Job power estimate: The node calibration database is used to estimate job power. Without dynamic power monitoring, the scheduler has to presume that the job requires PMP. Monitoring provides closed-loop control and a power estimate below PMP: even an inflexible policy with limited control knobs can base its estimate on workload maximum power, while flexible controls enable dynamic job power management, with startup power based on workload average power. The scheduler also needs to estimate the minimum required power (MRP) for jobs that cannot be suspended. The scheduler simply sums the per-node estimates to generate the job estimate (see the sketch below). This process can be refined in the future by considering differences between sample and actual workloads.

4) User preference for job power allocation: As users specify job priorities, they also need to specify power and energy policies. In this paper we are concerned only with power policies. The choices must be self-explanatory and should not require extensive consultation of specifications, manuals, or coding. These are the choices we considered: a) whether or not a job should be subject to a power limit; b) whether or not the job can be suspended; and c) for a job with a power limit, which of three modes enforces the limit. We discuss the modes in section IV.
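The per-node summation described in section III.C.3.b is straightforward. A minimal sketch, assuming a calibration table keyed by (node, frequency) that holds the logged statistics (the names and schema are ours, not the paper's database):

```python
from dataclasses import dataclass


@dataclass
class CalibrationEntry:
    """Per-node, per-frequency statistics logged during calibration."""
    avg_power_w: float
    max_power_w: float
    pmp_w: float  # platform max power (power virus) at this frequency


def estimate_job_power(nodes: list[str],
                       freq_ghz: float,
                       calib: dict[tuple[str, float], CalibrationEntry],
                       monitoring: bool) -> float:
    """Sum per-node estimates to produce the job power estimate.

    Without monitoring the scheduler must assume PMP; with monitoring it
    can use the (lower) workload maximum observed during calibration.
    """
    total = 0.0
    for node in nodes:
        entry = calib[(node, freq_ghz)]
        total += entry.max_power_w if monitoring else entry.pmp_w
    return total
```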
5) Methods to maintain job power within a limit: The user indicates whether job power can be limited. There can be any number of policies and mechanisms for doing so. Today many systems run all nodes at an "identical" frequency; we evolved this "identical", or uniform, frequency method into a power-limiting method. The three modes we developed for uniform frequency are explained in section IV. We expect significant enhancement in algorithms and methods over time as the industry gathers and analyzes data from actual runs.

D. How it works

Fig. 3. Overview of the power-aware scheduler

Based on user input, the scheduler develops an estimate of the power required to start the job (startup power). This estimate is based on node calibration and on whether the job is allowed to be suspended. The scheduler then checks for available power, and starts the job if available power is equal to or greater than startup power. When dynamic monitoring is available, the uniform frequency used by all nodes may be changed periodically based on power headroom; we used a policy in which a job that started earlier in time has higher priority in using additional power headroom. Power budgets and uniform frequencies are re-evaluated periodically during runtime. Available power may drop so much that not all jobs can continue running; in that case the scheduler picks the lowest-priority job from the list of jobs that can be suspended (details in section III.E.2). When available power later increases, we first try to resume a suspended job, consistent with our power scheduling policy described in section III.C.1. (A sketch of this loop follows.)
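Section III.D amounts to a periodic control loop. One possible shape of it is sketched below, reusing available_power from the earlier sketch and assuming job objects that carry name, suspendable, startup_power_w, and start_time attributes (extending the earlier JobRequest sketch). This is illustrative, not the paper's SLURM code.

```python
import time


def scheduler_tick(p_sys_w, queue, running, suspended, allocations_w,
                   consumed_w=None, guard_band_w=0.0):
    """One control-period evaluation of the power-aware scheduler."""
    headroom = available_power(p_sys_w, allocations_w, consumed_w, guard_band_w)

    # 1. If power fell too far, suspend suspendable jobs, lowest priority
    #    first (the job that started last has the lowest priority).
    while headroom < 0 and any(j.suspendable for j in running):
        victim = max((j for j in running if j.suspendable),
                     key=lambda j: j.start_time)
        running.remove(victim)
        suspended.append(victim)
        headroom += allocations_w.pop(victim.name)

    # 2. Resumption has priority over starting new jobs (policy III.C.1.c).
    for job in list(suspended):
        if job.startup_power_w <= headroom:
            suspended.remove(job)
            running.append(job)
            allocations_w[job.name] = job.startup_power_w
            headroom -= job.startup_power_w

    # 3. Start queued jobs whose startup power fits the remaining headroom.
    for job in list(queue):
        if job.startup_power_w <= headroom:
            queue.remove(job)
            job.start_time = time.time()
            running.append(job)
            allocations_w[job.name] = job.startup_power_w
            headroom -= job.startup_power_w
```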
E. Impact of key options in a design

Certain capabilities and features of HPC systems influence scheduling sophistication and accuracy, as well as the subsequent management of job power.

1) Power monitoring for a job: Most HPC clusters today have limited power monitoring, and few system resource managers or workload managers monitor job power [3][13]. In the future we expect wide-scale deployment of power-monitoring technologies to track and profile job power; the value of power monitoring will influence the rate of adoption. To understand that value, we operated our power-limiting methods with and without monitoring. Without monitoring, the integrity of the power and cooling system can only be ensured by allocating enough power to a job that actual consumption will never exceed a hard limit; in that scenario, estimated job power must be based upon PMP. Such a high allocation exposes a number of efficiency issues. Actual node power consumption will in most cases be lower than PMP, so the unused power is stranded. Stranded power is also a huge barrier to effectively controlling the rate of change of power consumption [9].

2) Ability to pause, suspend, and resume a job: The user may specify that a job must run to completion uninterrupted: it must not be suspended. Jobs that do not implement checkpointing generally fall into this category. The scheduler needs an additional technique to manage such a job: it must estimate the MRP for continuous operation, and available power must account for that MRP. The aggregate required power in the system must be tracked and communicated via the demand/response interface so that P_SYS never falls below the aggregated MRP. (A sketch of this aggregation follows.)
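Aggregating MRP as described above is a simple reduction over the "special" jobs, following the definitions of section III.A.5. A minimal sketch; the field names are ours:

```python
def aggregate_mrp(jobs) -> float:
    """System-wide minimum required power to report via demand/response.

    Per Section III.A.5: jobs with no power limit reserve PMP (or the
    workload maximum, if monitored); jobs that cannot be suspended reserve
    the power needed at the lowest frequency; all other jobs contribute
    nothing.
    """
    total = 0.0
    for job in jobs:
        if not job.power_limited:
            total += job.pmp_w             # or workload max, if monitored
        elif not job.suspendable:
            total += job.min_freq_power_w  # power at the lowest frequency
    return total
```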
F. Features not considered at this time

Power allocation to jobs will, in the future, be influenced by several other features, discussed below as areas for future enhancement.

1) Job priority: When power is scarce, allocation will need to follow some sort of job priority. Here, a job that was started or scheduled first is assumed to have higher priority. This clearly needs to be improved in the future to match the job priority specified by the user.

2) Controlling the rate of change of power consumption: Many facilities that host HPC systems require that the rate of change of power consumption stay within bounds. Facility managers prefer that all of the allocation be consumed and that stranded power be kept to a minimum. It is also desirable to keep fluctuations in power consumption within specified bounds; a scheduler and workload manager can play an important role in controlling such fluctuations.

3) Improving energy efficiency of the system: The scheduler should reduce the power used by idle resources to improve energy efficiency and increase the power available for computation.

4) Deep examination of job queues: The scheduler should examine the job queue deeply to identify and schedule the best candidates to fit within P_SYS. This bolsters our principle of scheduling as many jobs as possible.

IV. THE UNIFORM FREQUENCY POWER LIMIT

When a scheduler starts a job on a fixed set of compute nodes, the job may be subject to minimum and maximum power limits, and a workload manager needs to ensure that the job's power consumption stays within those limits. While power limiting degrades performance, the technical challenge is to keep this degradation minimal or hidden. One can develop an optimal algorithm for managing power and performance for a specific workload, usage model, and set of constraints; the challenge is to develop an algorithm that applies to a wide range of workloads, usages, and constraints.

Today, most HPC jobs run without any power bound. Furthermore, many workloads are tuned for best performance when all compute nodes operate at a uniform frequency. We therefore focused on an evolutionary approach that enforces a power limit while maintaining a uniform frequency for all processors in a job. There are a number of ways to determine and select the uniform frequency; further, the frequency may be static throughout the job or changed dynamically. Based upon these options we developed three methods for comparison, representing three control mechanisms. Control systems work best when there is feedback, such as power monitoring. Precise control requires sensor accuracy and control resolution: precise, high-resolution sensors allow a reduction in guard bands, and guard bands used in power limiting result in stranded power and lower energy efficiency. Any scheduler or workload manager will need to manage a mix of jobs subject to various (including "no") power limits, and may use different approaches to enforce them. In this section we detail the approaches we used for comparison.

A. Jobs not subjected to a power limit

The user may designate "special" jobs that are not power limited. The scheduler needs to estimate the maximum power such a job could consume, and only start the job when that power is available. The workload manager may redistribute power among "normal" jobs in order to reduce stranded power and maximize efficiency, but even if P_SYS falls, it must ensure that these "special" jobs' power allocations remain intact.

B. Fixed-frequency mode

The user may specify the frequency. The selection may be based upon a table that indicates the degradation in performance and reduction in power at each frequency; this approach is used by some systems to find the frequency that delivers the best energy efficiency, and a system may use an ACPI P-state table [1] for this purpose. Once a job starts in this mode, the frequency is never altered. The job does not incur the overhead associated with frequency shifts and is therefore scalable. However, this approach has a number of drawbacks:
1. If the user selects a frequency based upon a predefined table purportedly applicable to all nodes, power variations due to manufacturing variance, environmental conditions, and built-in guard bands render the table inaccurate.
2. The user may select a frequency based upon the power available when the job is submitted, but available power at launch is likely to be very different. In a computing environment subject to shifting power availability and other variable conditions, the user lacks the information necessary to select the best frequency.
3. If available power is lower than the estimated power for the selected frequency, the job cannot start, even though in some cases it could run at a lower frequency. When the job cannot start and there are no other jobs in the queue, resources are underutilized. Conversely, if available power is greater than the estimated power at the user-selected frequency, the system may strand power, as the excess cannot be used to run the job at a higher frequency than the one selected.

C. Min-power mode

The user may specify the minimum amount of power a job must be allocated, calculated, for example, from a power-performance table and the requested number of nodes. The scheduler can start the job only if available power equals or exceeds this minimum. Based upon available power, the scheduler selects the best frequency. When dynamic power monitoring is used, the workload manager may raise or lower the frequency while the job is running as available power increases or decreases. If available power falls below the specified minimum power threshold, the job is suspended or terminated. The main benefit of min-power mode is that it reduces the burden on the user to guess the right frequency. Second, with dynamic power monitoring, the workload manager can improve performance by raising the frequency opportunistically, based upon the power the workload actually consumes while running. However, to start a job the scheduler still relies on calibration and estimation of power requirements. A key drawback of min-power mode is that the user has to calculate the minimum power needed for a job using inaccurate tables, and a min-power job gets suspended if available power falls below the min-power level.

D. Auto mode

Auto mode eliminates the need for users to estimate the power or frequency their job should use: uniform frequency selection is automated based upon available power. With dynamic power monitoring, the workload manager adjusts this uniform frequency periodically based upon power headroom. Auto mode allows a job to operate at any available frequency. Since there is no user-defined minimum job power requirement, the job can start and continue as long as there is enough power to run it at the lowest frequency. Auto mode therefore reduces the probability of a job waiting for power or being suspended when power availability drops, and it increases resource usage and throughput: one can reduce the power limit and run more jobs to use all hardware resources. We will demonstrate that uniform-frequency auto mode delivers the best performance and energy efficiency among the three modes we considered. (A sketch of the frequency selection follows.)
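The core of auto mode is picking the highest uniform frequency whose estimated job power fits within available power. A minimal sketch, reusing estimate_job_power from the earlier sketch (per Tables I and II below, the per-node estimate is PMP without monitoring and the calibrated workload figure with it):

```python
def select_uniform_frequency(nodes, freqs_ghz, headroom_w, calib, monitoring):
    """Return the highest frequency whose estimated job power fits, or None.

    None means the job cannot start (or must be suspended) at this headroom.
    """
    for f in sorted(freqs_ghz, reverse=True):  # try the fastest first
        if estimate_job_power(nodes, f, calib, monitoring) <= headroom_w:
            return f
    return None
```

In auto mode the workload manager would re-run this selection every control period against fresh headroom; min-power mode can use the same selection, except that the job is additionally suspended when headroom falls below the user's P_MIN.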
Besides the fixed-frequency and min-power modes, there could be variations using minimum and/or maximum frequency and minimum and/or maximum power, and there could be modes that combine settings for frequency and power. All these variations require user calculation and experimentation; auto mode removes this burden and delivers the best performance in a large number of scenarios.

E. Implementation

These policies were implemented using SLURM, an IPMI daemon, and access to processor model-specific registers (MSRs). We defined a period over which average power is maintained for a job, T_AVERAGE, and a control period, T_CONTROL. In each control period we re-evaluate the power budget, power allocation, and frequency selection. We set T_CONTROL to 1/10 of T_AVERAGE. T_AVERAGE is programmable; assuming a facility has to maintain an average over 15 minutes, T_AVERAGE for jobs ends up being 9 seconds, so for min-power and auto mode the control system evaluates every 900 milliseconds whether the uniform frequency should be changed. The tables below describe the power levels used by the algorithms that implement these modes.
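Tables I and II below tabulate the startup and allocated power per mode. The with-monitoring startup rules (Table II) could be encoded roughly as follows; this is a sketch under our earlier naming, where F_S is the user's fixed frequency, P_MIN the user's minimum power, Pn the lowest frequency, and job.workload_max_w an assumed calibrated field:

```python
PN_LOWEST_FREQ_GHZ = 1.2  # Pn, the lowest operating frequency (assumed value)


def startup_power_with_monitoring(job, workload_power_at, guard_band_w=0.0):
    """Startup power per Table II (with monitoring); a sketch, not the
    paper's code.

    workload_power_at(freq_ghz) is assumed to return the calibrated
    workload power at a given frequency; job follows the JobRequest sketch.
    """
    if not job.power_limited:                     # job with no power limit
        return job.workload_max_w + guard_band_w
    if job.mode is PowerMode.FIXED_FREQUENCY:
        return workload_power_at(job.fixed_frequency_ghz)
    if job.mode is PowerMode.MIN_POWER:
        return job.min_power_w
    return workload_power_at(PN_LOWEST_FREQ_GHZ)  # auto mode: start at Pn
```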
TABLE I. POWER/FREQUENCY SELECTION WITHOUT USE OF POWER MONITORING
(Available power: P_AVAILABLE = P_SYS - allocated power)

                        | Job without power limit | Fixed-frequency mode (user selects F_S) | Min-power mode (user selects P_MIN) | Auto mode
Startup power           | PMP                     | PMP at F_S                              | P_MIN                               | PMP at Pn (b)
Frequency               | Maximum                 | F_S                                     | F_O: max frequency where PMP <= P_AVAILABLE | F_O: max frequency where PMP <= P_AVAILABLE
Min. required power (c) | PMP                     | = 0 or = PMP at F_S                     | = 0 or = P_MIN                      | = 0 or = PMP at Pn
Allocated power         | PMP                     | PMP at F_S                              | max(min. required power, PMP at F_O) | max(min. required power, PMP at F_O)
Dynamic adjustments     | No                      | No                                      | Yes                                 | Yes

TABLE II. POWER/FREQUENCY SELECTION WITH USE OF POWER MONITORING
(Available power: P_AVAILABLE = P_SYS - P_CONSUMED - guard band)

                        | Job without power limit | Fixed-frequency mode (user selects F_S) | Min-power mode (user selects P_MIN) | Auto mode
Startup power           | Workload max + GB (a)   | Workload power at F_S                   | P_MIN                               | Workload power at Pn (b)
Frequency               | Maximum                 | F_S                                     | F_O: max frequency where workload average power <= P_AVAILABLE | F_O: max frequency where workload average power <= P_AVAILABLE
Min. required power (c) | Workload max + GB       | = 0 or = workload max at F_S            | = 0 or = P_MIN                      | = 0 or = workload max at Pn
Allocated power         | Workload max + GB       | Workload power at F_S                   | max(min. required power, workload power at F_O) | max(min. required power, workload power at F_O)
Dynamic adjustments     | No                      | No                                      | Yes                                 | Yes

Notes: (a) GB: guard band. (b) Pn is the lowest operating frequency for processors and nodes. (c) For jobs that can be suspended the min. required power is zero; for others it is as defined in the table.

V. EXPERIMENTAL SETUP

A. Hardware configuration

We used a cluster of 8 compute nodes, shown in Fig. 4. The compute nodes use dual-socket Intel Xeon processors (Sandy Bridge-EP, with a thermal design power (TDP) of 135 W). The head node serves as the cluster controller hosting the SLURM resource/workload scheduler; it also serves as the front-end (login) node used to start jobs. InfiniBand was used to interconnect the nodes.

Fig. 4. Hardware configuration for the experiment (head/login node, InfiniBand switch, data network, compute nodes)

B. Software configuration

We used the SLURM resource manager [13] for our experiment (see Fig. 5). We enhanced the node-level daemon (slurmd) to gather power information using IPMI; it also gets and sets processor registers related to CPU frequency via the OS MSR module. The scontrol command was modified to implement the algorithm described in section IV.E; the modified scontrol logs the power and other job-related information needed for our study.

Fig. 5. Software configuration for the experiment

C. Workloads used for analysis

We used the set of HPC benchmarks listed in Table III, selected from a list of DOE workloads [5]. These workloads were chosen as representative of real applications used in traditional HPC environments.

TABLE III. LIST OF WORKLOADS

Workload | Short description
MCB      | Monte Carlo simulation of particle transport
miniFE   | Unstructured implicit finite element method
Qbox     | Molecular dynamics
LULESH   | Unstructured shock hydrodynamics
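Our slurmd enhancement gathered node power over IPMI. One plausible way to read a node's power on a system with a DCMI-capable BMC is via ipmitool, as sketched below; this is a generic illustration, not the paper's daemon code, and it assumes ipmitool is installed and the BMC reports an instantaneous power reading.

```python
import re
import subprocess


def read_node_power_w() -> float:
    """Read the node's instantaneous power via ipmitool's DCMI interface."""
    out = subprocess.run(["ipmitool", "dcmi", "power", "reading"],
                         capture_output=True, text=True, check=True).stdout
    match = re.search(r"Instantaneous power reading:\s*(\d+)\s*Watts", out)
    if match is None:
        raise RuntimeError("BMC did not report an instantaneous power reading")
    return float(match.group(1))
```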
VI. RESULTS

We now examine the performance data collected by running jobs through the power-aware scheduler under several scenarios, per the modes described earlier, with and without power monitoring.

A. Benefit of power monitoring

For this analysis we focus on the results for the MCB workload for illustrative purposes; similar results for other workloads are discussed in subsequent sections.

Fig. 6. Comparison of performance of the MCB workload with and without power monitoring (time to complete in seconds and CPU frequency in GHz versus available system power P_SYS in W; curves for fixed-frequency, min-power, and auto modes, each with and without monitoring)

In Fig. 6 we plot the performance of MCB against available system power (P_SYS) with and without power monitoring.⁴ Since performance is measured as the wall-clock time to complete the job, lower is better. The solid lines show performance with monitoring and the dotted lines show performance without it. Since the solid lines are better (lower) than the dotted lines, the resource manager gets better performance with monitoring at all power limits in all modes. The benefit can be up to 40%, except in fixed-frequency mode, where the job, if it runs at all, has to run at the fixed frequency; hence those two lines almost overlap. Additionally, the solid lines in all three cases start closer to the Y axis than the corresponding dotted lines, indicating that monitoring enables the scheduler to start jobs at lower system power limits. Given this ample evidence that power monitoring delivers much higher performance, the rest of the paper shows comparisons with power monitoring only.

⁴ Results have been estimated based upon internal Intel analysis and are provided for information purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

B. Comparison of uniform frequency modes

We compare the benefits of auto mode over fixed-frequency mode for the set of workloads listed in Table III. Fig. 7 shows the ratio of the frequency in auto mode to the frequency used in fixed-frequency mode for the same workload at various values of P_SYS. The range of P_SYS was chosen to be 50% to 100% of the unconstrained workload power; we expect that users will not choose to run jobs at power constraints more severe than 50%. Auto mode yields up to a 40% increase in frequency. There are no data points between 1200 W and 1700 W because in that range the jobs can start only in auto mode. Auto mode outperformed fixed-frequency mode everywhere for all these workloads.

Fig. 7. Performance benefit of auto mode over fixed frequency for various workloads (frequency benefit in percent versus available system power; workloads MINIFE, LULESH, QBOX, UMT)

C. Mixed mode operation

We ran three jobs at the same time in different modes, with P_SYS = 1700 W. Job 1 was a small job that finished first; Job 2 and Job 3 ran longer. We compared two cases (see Table IV).

TABLE IV. MCB MIXED MODE WORKLOADS

       | Job 1: LULESH, 2 nodes | Job 2: Qbox, 2 nodes      | Job 3: MCB, 4 nodes
Case 1 | No power limit         | Fixed frequency (2.0 GHz) | Fixed frequency (2.1 GHz)
Case 2 | No power limit         | Fixed frequency (2.0 GHz) | Auto mode

The results are shown in Fig. 8. Job 3 runs in fixed-frequency mode in case 1 and in auto mode in case 2. When Job 1 (LULESH) completes at 38 seconds, the power freed by Job 1 is allocated to Job 3 (MCB-auto), and auto mode uses the additional power headroom to raise the frequency to 2.9 GHz. As a result, Job 3 (MCB-auto) finishes sooner than the same job (MCB-fixed-freq) running in fixed-frequency mode in case 1.
The middle chart, showing power consumption over the entire run, follows a pattern similar to the CPU frequency. The bottom chart shows total power consumption and the resulting stranded power for the two cases. Note that in case 1 the system does not consume all available power, with up to 620 W stranded, and the job finishes later. In case 2, by contrast, system power consumption is steady and closer to the P_SYS value of 1870 W (with only around 340 W stranded), and the job finishes sooner.

Fig. 8. CPU frequency and power with multiple simultaneous workloads in mixed modes
D. Managing time-varying P_SYS

As mentioned in section III.A.1, the total power available to an HPC system from the utility company varies dynamically. To study the behavior of the scheduler in such scenarios, the system power available to the cluster was changed dynamically.

1) Use of increased available power: We compared the completion times of two jobs running simultaneously, one (MCB) in fixed-frequency mode at 2.0 GHz and another (MCB) in auto mode, when subjected to a time-varying P_SYS that starts at 1550 W and changes between 1550 W and 2000 W in several steps.

Fig. 9. Faster completion in auto mode with varying P_SYS

Fig. 9 shows that the frequency of the nodes running MCB in auto mode increases with additional system power. As a result, MCB in auto mode finishes 20% sooner than MCB in fixed-frequency mode, with almost the same energy consumption.

2) Benefit of not suspending when P_SYS decreases: We ran the two cases shown in Table V. Each case has two jobs running simultaneously; the difference between the cases is that Job 2 ran at a fixed frequency of 2.0 GHz in case 1 and in auto mode in case 2.

TABLE V. CASES STUDIED FOR TIME-VARYING P_SYS

       | Job 1: MCB, 4 nodes | Job 2: MCB, 4 nodes
Case 1 | No power limit      | Fixed frequency (2.0 GHz)
Case 2 | No power limit      | Auto mode

The results are shown in Fig. 10 and Fig. 11; the upper charts illustrate case 1 and the lower charts case 2. When the system power is reduced, Fig. 10 shows that the scheduler has the flexibility to decide the power allocation and frequency of the job only in case 2 (Job 2 running in auto mode). The benefit is that Job 2 in case 2 continues running (at a lower frequency) and completes sooner, whereas Job 2 in case 1 is suspended and takes longer to complete.

Fig. 10. Effect of time-varying P_SYS on frequency

Fig. 11 shows P_SYS, the actual power consumed, and the stranded power. Note that in case 2 the actual power consumption tracks the time-varying P_SYS closely and the stranded power stays close to zero. The stranded power occasionally goes negative for short durations: P_SYS is an average maintained over a relatively long duration, while we monitor power over a much shorter duration. P_SYS is expected to be maintained only over the longer time period, and small excursions above it are tolerated so long as the longer time-average stays below the specified system power limit.

Fig. 11. Effect of time-varying P_SYS on system power
We conclude that in auto mode the scheduler was able to adapt to varying P_SYS, providing improved performance compared to fixed-frequency mode. Based on the above results, we have demonstrated that auto mode with power monitoring provides several superior power-aware scheduling capabilities.

VII. CONCLUSION AND FUTURE IMPROVEMENTS

We demonstrated that job-level power monitoring is beneficial for achieving higher performance. Auto mode removes the burden of estimating power consumption or frequency, and it enables a job to start at lower available power than fixed-frequency and min-power modes. Auto mode's automatic uniform-frequency adjustment maximizes use of available power. Finally, auto mode gives the fastest time to completion with optimal use of available power. The tests run on our small cluster need to be expanded to ensure that the dynamic monitoring and controls scale to large HPC environments. Control methods can be refined by developing schemes that optimize the tuning periods for monitoring and control to minimize guard bands.

VIII. ACKNOWLEDGMENT

We would like to thank the many people who contributed to this paper. We especially acknowledge Jimbo Alexander, who gathered the data for several of the experiments. We also thank Corey Gough, Sunil Mahawar, David Scott, Tom Spelce, and Matt Tolentino for reviewing this paper and providing critical feedback.

IX. REFERENCES

[1] Alvarruiz, F., de Alfonso, C., Caballer, M. and Hernández, V. An Energy Manager for High Performance Computer Clusters. ISPA '12: Proceedings of the 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications.
[2] Bhattacharya, A. Constraints and Techniques for Software Power Management in Production Clusters. Technical Report UCB/EECS, Electrical Engineering and Computer Sciences, University of California at Berkeley.
[3] Brehm, M. Energy Aware Scheduling for SuperMUC@LRZ. Application Support Group, Leibniz Supercomputing Centre.
[4] Cai, C., Wang, L., Khan, S. and Tao, J. Energy-aware High Performance Computing: A Taxonomy Study. Parallel and Distributed Systems (ICPADS), 2011 IEEE 17th International Conference on (Tainan, Taiwan).
[5] Department of Energy. CORAL procurement benchmarks. LLNL-PRE (May 31, 2013).
[6] Etinski, M., Corbalan, J. and Labarta, J. Power-Aware Parallel Job Scheduling. Barcelona Supercomputing Center.
[7] HP, Intel, Microsoft, Phoenix, Toshiba. Advanced Configuration and Power Interface Specification.
[8] Intel Corp. Intel 64 and IA-32 Architectures Software Developer Manuals.
[9] Lefurgy, C., Allen-Ware, M., Carter, J., El-Essawy, W., Felter, W., Ferreira, A., Huang, W., Hylick, A., Keller, T., Rajamani, K., Rawson, F. and Rubio, J. Energy-Efficient Data Centers and Systems. 2011 IEEE International Symposium on Workload Characterization (Austin, Texas, November 6, 2011).
[10] Mämmelä, O., Majanen, M., Basmadjian, R., De Meer, H., Giesler, A. and Homberg, W. Energy-aware job scheduler for high-performance computing. Computer Science - Research and Development 27, no. 4 (2012).
[11] Hautreux, M. Power capping in SLURM (November 2013).
[12] Rountree, B., Ahn, D., de Supinski, B., Lowenthal, D. and Schulz, M. Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power Bound. 8th Workshop on High-Performance, Power-Aware Computing (HPPAC), May 2012.
[13] Slurm Workload Manager (November 2013).
[14] Yoo, A., Jette, M. and Grondona, M. SLURM: Simple Linux Utility for Resource Management. In Feitelson, D., Rudolph, L. and Schwiegelshohn, U., editors, Job Scheduling Strategies for Parallel Processing, 9th Springer-Verlag International Workshop, JSSPP 2003 (Seattle, June 2003), Lect. Notes Comput. Sci. vol. 2862.
[15] Zhou, Z., Lan, Z., Tang, W. and Desai, N. Reducing Energy Costs for IBM Blue Gene/P via Power-Aware Job Scheduling. Department of Computer Science, Illinois Institute of Technology; Mathematics and Computer Science Division, Argonne National Laboratory.
SPC BENCHMARK 1C FULL DISCLOSURE REPORT SEAGATE TECHNOLOGY LLC IBM 600GB 10K 6GBPS SAS 2.5" G2HS HYBRID SPC-1C V1.5 Submitted for Review: June 14, 2013 Submission Identifier: C00016 ii First Edition June
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
Power and Energy aware job scheduling techniques
Power and Energy aware job scheduling techniques Yiannis Georgiou R&D Software Architect 02-07-2015 Top500 HPC supercomputers 2 From Top500 November 2014 list IT Energy Consumption 3 http://www.greenpeace.org/international/global/international/publications/climate/2012/
Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database
WHITE PAPER Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive
SUBHASRI DUTTAGUPTA et al: PERFORMANCE EXTRAPOLATION USING LOAD TESTING RESULTS
Performance Extrapolation using Load Testing Results Subhasri Duttagupta PERC, TCS Innovation Labs Tata Consultancy Services Mumbai, India. [email protected] Manoj Nambiar PERC, TCS Innovation
A Middleware Strategy to Survive Compute Peak Loads in Cloud
A Middleware Strategy to Survive Compute Peak Loads in Cloud Sasko Ristov Ss. Cyril and Methodius University Faculty of Information Sciences and Computer Engineering Skopje, Macedonia Email: [email protected]
Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints
Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Michael Bauer, Srinivasan Ravichandran University of Wisconsin-Madison Department of Computer Sciences {bauer, srini}@cs.wisc.edu
IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads
89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads A Competitive Test and Evaluation Report
Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms
Intel Cloud Builders Guide Intel Xeon Processor-based Servers RES Virtual Desktop Extender Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms Client Aware Cloud with RES Virtual
International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 6, June 2015 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Figure 1. The cloud scales: Amazon EC2 growth [2].
- Chung-Cheng Li and Kuochen Wang Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 [email protected], [email protected] Abstract One of the most important issues
Muse Server Sizing. 18 June 2012. Document Version 0.0.1.9 Muse 2.7.0.0
Muse Server Sizing 18 June 2012 Document Version 0.0.1.9 Muse 2.7.0.0 Notice No part of this publication may be reproduced stored in a retrieval system, or transmitted, in any form or by any means, without
Profit Maximization and Power Management of Green Data Centers Supporting Multiple SLAs
Profit Maximization and Power Management of Green Data Centers Supporting Multiple SLAs Mahdi Ghamkhari and Hamed Mohsenian-Rad Department of Electrical Engineering University of California at Riverside,
Managing Data Center Power and Cooling
White PAPER Managing Data Center Power and Cooling Introduction: Crisis in Power and Cooling As server microprocessors become more powerful in accordance with Moore s Law, they also consume more power
An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
Quantum StorNext. Product Brief: Distributed LAN Client
Quantum StorNext Product Brief: Distributed LAN Client NOTICE This product brief may contain proprietary information protected by copyright. Information in this product brief is subject to change without
JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra
JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra January 2014 Legal Notices Apache Cassandra, Spark and Solr and their respective logos are trademarks or registered trademarks
CHAPTER 5 WLDMA: A NEW LOAD BALANCING STRATEGY FOR WAN ENVIRONMENT
81 CHAPTER 5 WLDMA: A NEW LOAD BALANCING STRATEGY FOR WAN ENVIRONMENT 5.1 INTRODUCTION Distributed Web servers on the Internet require high scalability and availability to provide efficient services to
Dynamic resource management for energy saving in the cloud computing environment
Dynamic resource management for energy saving in the cloud computing environment Liang-Teh Lee, Kang-Yuan Liu, and Hui-Yang Huang Department of Computer Science and Engineering, Tatung University, Taiwan
Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis
White Paper Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis White Paper March 2014 2014 Cisco and/or its affiliates. All rights reserved. This document
Analysis and Modeling of MapReduce s Performance on Hadoop YARN
Analysis and Modeling of MapReduce s Performance on Hadoop YARN Qiuyi Tang Dept. of Mathematics and Computer Science Denison University [email protected] Dr. Thomas C. Bressoud Dept. of Mathematics and
An Oracle White Paper February 2010. Rapid Bottleneck Identification - A Better Way to do Load Testing
An Oracle White Paper February 2010 Rapid Bottleneck Identification - A Better Way to do Load Testing Introduction You re ready to launch a critical Web application. Ensuring good application performance
Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure
White Paper Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure White Paper March 2014 2014 Cisco and/or its affiliates. All rights reserved. This
PEPPERDATA IN MULTI-TENANT ENVIRONMENTS
..................................... PEPPERDATA IN MULTI-TENANT ENVIRONMENTS technical whitepaper June 2015 SUMMARY OF WHAT S WRITTEN IN THIS DOCUMENT If you are short on time and don t want to read the
Performance White Paper
Sitecore Experience Platform 8.1 Performance White Paper Rev: March 11, 2016 Sitecore Experience Platform 8.1 Performance White Paper Sitecore Experience Platform 8.1 Table of contents Table of contents...
Colgate-Palmolive selects SAP HANA to improve the speed of business analytics with IBM and SAP
selects SAP HANA to improve the speed of business analytics with IBM and SAP Founded in 1806, is a global consumer products company which sells nearly $17 billion annually in personal care, home care,
ECLIPSE Best Practices Performance, Productivity, Efficiency. March 2009
ECLIPSE Best Practices Performance, Productivity, Efficiency March 29 ECLIPSE Performance, Productivity, Efficiency The following research was performed under the HPC Advisory Council activities HPC Advisory
Table of Contents. P a g e 2
Solution Guide Balancing Graphics Performance, User Density & Concurrency with NVIDIA GRID Virtual GPU Technology (vgpu ) for Autodesk AutoCAD Power Users V1.0 P a g e 2 Table of Contents The GRID vgpu
A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster
, pp.11-20 http://dx.doi.org/10.14257/ ijgdc.2014.7.2.02 A Load Balancing Algorithm based on the Variation Trend of Entropy in Homogeneous Cluster Kehe Wu 1, Long Chen 2, Shichao Ye 2 and Yi Li 2 1 Beijing
How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (
TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx
