National Facility Job Management System
1. Summary

This document describes the job management system used by the NCI National Facility (NF) on their current systems. The system is based on a modified version of OpenPBS; in the following, ANUPBS will refer to this version. Depending on the platform, ANUPBS may have to interact with, and use features of, a native job management system. This interaction with other resource management systems, and the necessary features of such a system, are also described.

The critical feature of the job management system is the use of job suspension/resumption to schedule the bulk of the work presented to the system (not only for handling high priority work). In particular, a large fraction of the parallel jobs on the system have suspended smaller, longer-running jobs in order to run. The use of suspend/resume allows very high utilization (>95%) to be maintained even with an extremely diverse workload mix, while still respecting any political share allocations. This high utilization is achieved without bias towards any class of jobs, such as those that can fill the scheduling holes created in systems based on backfill and reservations. Equally importantly, care is taken to ensure suspend/resume scheduling does not compromise the performance of any jobs.

The use of suspend/resume has a number of implications that will be discussed in the following sections:
- since jobs will share nodes, there must be careful job process management, resource monitoring and limiting to ensure jobs don't impact each other;
- there are additional requirements in the areas of NUMA- and network topology-awareness, lightweight scalable operation etc. to ensure that jobs are always given the opportunity to perform optimally;
- job paging/swapping must be carefully managed; and
- to meet policy and fairness scheduling goals as well as high utilization, the usual divide between scheduler and resource manager must be eroded or removed.

2. Background

The National Facility provides high performance computing services to all Australian academics and government research agencies requiring large high performance compute resources. This broad charter has a number of practical implications in terms of the workload mix and how it can be serviced:
- the term "large compute resource requirements" is not limited to the number of cpus per job: it includes the number of small jobs, the amount of single node memory or disk, the cost of licensed software etc. Jobs currently range from 600hr single processor Gaussian jobs requiring 800GB of node-local scratch disk to tightly coupled but highly scalable 8000-cpu combustion simulations. A large fraction of the resources are consumed by climate simulations utilizing between 128 and 512 cpus.
- in terms of total cpu-hours, less than 10% is consumed by single node jobs (although they do constitute a large number of jobs), and usually more than 50% of the system is running jobs of greater than 64 cpus. The trend to larger parallel jobs at the National Facility has been relentless over the last 10 years and is expected to continue.
- the most difficult jobs to schedule are typified by VASP jobs using a substantial number of cpus but requiring a runtime of 100 hours or more because of checkpointing difficulties. These jobs fragment cpu space for long periods of time, making capability job starts difficult.
- experience has shown that partitioning the system along the lines of job types or resource requirements invariably leads to idle partitions at the same time as there are jobs queued for other partitions;
- the number of users (and projects) is in the several hundreds, with user skill levels varying considerably;
- the frequency of jobs running amok (by trying to use greater resources than requested) can be quite high;
- to optimize support for the varied workload, NCI NF systems are heterogeneous: the amount of memory, swap and local disk, and possibly even the number of cpus, varies across nodes, which has implications for system scheduling;
- the various NCI Partner organizations and access schemes have pre-determined shares of the system; the scheduler must deliver those shares, and deliver to priority projects, regardless of the characteristics of their jobs;
- requests for allocations within each share heavily over-subscribe those shares;
- there is an expectation that the system deliver close to 100% of available cpu-hours.

Over more than 15 years, NCI-NF staff have developed a management system that comes very close to overcoming these difficulties and meeting all these goals. Motivated by frustrations in managing a closed and inflexible vector parallel system, development of ANUPBS began in 1997, prior to the existence of PBSPro and Torque. The system has been ported to a large variety of HPC systems:
- initial development occurred on a cluster of large (24 and 64 cpu) Solaris SMP nodes;
- maturation on, and integration with the Quadrics components of, the Compaq/HP AlphaServer SC;
- easy deployment on a number of Linux clusters at the National Facility and around Australia;
- sophisticated NUMA-awareness development and integration with SGI Array Services on a cluster of 64-way SGI Altix systems;
- further scalability enhancements and network topology awareness on the core Sun/Oracle Constellation cluster.

3. Resource Allocation

The traditional, simple model of job management has involved a scheduler to decide which job to run next and an independent resource manager to allocate cpus to that job. As discussed in section 6, for sophisticated suspend/resume based scheduling, these two roles cannot be disentangled. Here we note that, independent of suspend/resume, complete job management also necessitates combining these roles. With a sufficiently diverse job mix and user base, all resources need to be managed carefully at the scheduling level. On NCI-NF systems, a resource currently means one of:
- cpus or nodes: users request a number of cpus although, for distributed jobs, this number must match whole nodes. There is at most one running job on each cpu at any time; scheduling is never based on load.
- memory: users request virtual memory but are allocated physical memory, i.e. the total address space of all processes must fit in the free physical memory. The expectation is that HPC jobs are using most of their address space. ANUPBS has an option to (more expensively) evaluate job physical page use (including swap) and limit on that measure.
- swap space: not a resource users request, but one monitored and managed by ANUPBS.
- local disks: see the discussion of jobfs in section 8.
- software licenses: a system-wide resource not monitored at the node level but still strictly monitored and managed by ANUPBS.
- local IO bandwidth: crudely allocated; users nominate if their job is IO bound and the scheduler only allocates one such job per node. Not currently monitored.
- walltime: not a physically limited resource and hence not a hard a priori constraint on scheduling. However, walltime is very significant in deciding if and when suspension occurs; see section 6.
- gpus: basic allocation functionality only, hindered by the lack of access control on GPUs.

User job requests must specify all the resources required; default resource requests are deliberately limiting. The requests are also required to be reasonably accurate so that appropriate physical resources can be reserved for the job's real needs without undue waste. By necessity, the ANUPBS scheduler has the responsibility of allocating all these resources. (Of course it does not physically allocate any resources; the allocation is theoretical in the sense that it assumes jobs never exceed their resource requests. See the next section for the physical allocation.) The scheduler is aware of all the resources available on all nodes, both what is unallocated to jobs and what is actually unused by jobs, and constrains scheduling decisions in light of this availability and the resource requests of candidate queued jobs. This strict allocation process is essential because of:
1. the reasonably large number of subnode-size jobs sharing nodes, and hence node resources, while running,
2. the heterogeneity of the nodes of the system (the number of cpus and amounts of available memory, disk and swap space vary amongst the nodes) and
3. job suspension/resumption (see section 6) causing additional node resource sharing.

At a minimum, sufficient swap space and node-local disk must be available to support all suspended and running jobs co-resident on each node. This scheduling constraint is imposed based on an appropriate mixture of requested and actually used job resources. For example, if all jobs on a node are to be suspended to run a single job requiring all the cpus of the node, then the sum of the current jobs' actual usage and the new job's requested usage must be within the node's capacity. Whenever there is a possibility of jobs running simultaneously on a node (each using a subset of cpus) in the future, resource requests (as opposed to current usage) must be used to determine total usage.

4. ANUPBS on the Compute Nodes

A PBS job execution/management and node monitoring daemon called a MOM is run on every compute node. This daemon:
- initiates and cleans up all jobs on that node,
- monitors jobs' resource usage to enforce scheduling decisions, and
- monitors node resources and activity in detail.

Job initiation/completion: Under PBS, executing jobs exist as a shell, either running the batch script or with a tty connection to a user terminal session. PBS has no knowledge of the commands in the batch script, and jobs are never simply a command (unlike under LSF).
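To make the allocation constraint of section 3 concrete, the node-capacity check for co-resident jobs can be sketched as below. This is an illustrative sketch only, not the ANUPBS implementation; the names, and the restriction to memory and jobfs disk, are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Usage:
    mem_gb: float     # memory footprint
    jobfs_gb: float   # node-local scratch disk

def fits_on_node(capacity: Usage, incoming: Usage, resident: list[Usage]) -> bool:
    """Can `incoming` start on this node alongside the `resident` jobs?

    The incoming job contributes its *requested* usage.  Each resident job
    contributes its *actual* usage if it will be suspended under the incoming
    job, but its *requested* usage if it may run concurrently later -- the
    caller chooses which figure to pass for each resident job.
    """
    mem = incoming.mem_gb + sum(j.mem_gb for j in resident)
    disk = incoming.jobfs_gb + sum(j.jobfs_gb for j in resident)
    return mem <= capacity.mem_gb and disk <= capacity.jobfs_gb
```

For example, on a 128GB node already holding a job actually using 20GB, a whole-node job requesting 100GB fits, while one requesting 120GB does not.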
All jobs on the system are initiated by a MOM on a compute node allocated to the job, with appropriate environment and limits set. When requested by the job, the MOM also initiates a directory on a node-local filesystem; see section 8. By default, job stdout and stderr are managed by the MOM on the node and returned to a global filesystem on job completion. At job completion, the MOM also stages files out of, and then cleans up, jobfs directories and removes leftover shared memory segments and /tmp files. Starting jobs on their allocated nodes means a node failure affects only those jobs actually running on that node. It also means that users not running MPI jobs do not need to use remote execution utilities in their scripts, and their scripts have direct access to jobfs (see below).

Job monitoring: While the scheduler does virtual allocation of resources, the MOM is responsible for ensuring those allocations are actually available by limiting all jobs to their requested resources. To keep the overhead of monitoring low, users' processes are sampled approximately once per minute, but more often during the initial phase of the job when resource usage is likely to be changing most rapidly. Total per-node job resource usage is determined (taking into account threads, shared memory segments etc.) and, if it exceeds the request, the job is terminated. Swap space on every node allows the node to absorb overuse of memory until the next job monitoring cycle of the MOM. Hence, given the scheduling constraints, jobs sharing nodes are guaranteed not to catastrophically impact one another.

Node monitoring: The MOM provides detailed node status information to PBS, such as physically available and unused resources (like memory and disk) as well as actual cpu usage and paging rates. In addition to providing basic scheduling information for the scheduler, monitors can send alerts of exceptional states like unexpected memory or disk usage or excessive load or paging.

5. Suspension/resumption

The primary mechanism used to run large parallel jobs is the suspension of smaller jobs. Since suspending a job is effectively just sending a SIGSTOP to the job's processes, suspended jobs remain resident on their execution nodes and these nodes are temporarily reallocated to the larger parallel job. On completion of the suspending job, the processes of the suspended jobs are simply sent a SIGCONT.
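At its core, the mechanism is just signal delivery. A minimal sketch, with process enumeration, job containers and resource-manager coordination omitted:

```python
import os
import signal

def suspend_job(pids):
    """Stop every process of a job in place.  The processes stay resident on
    the node; the kernel pages them out only if a new job needs the memory."""
    for pid in pids:
        os.kill(pid, signal.SIGSTOP)

def resume_job(pids):
    """Continue a previously suspended job on its original nodes."""
    for pid in pids:
        os.kill(pid, signal.SIGCONT)
```

A real MOM must first enumerate every process belonging to the job (children, threads, shared memory holders); signalling a process group or job container is the robust way to do that.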
The obvious concern about suspension/resumption is the possibility of excessive paging when a suspended job is replaced in memory by a job just starting. In reality, this is rarely a serious issue. Many suspensions never lead to any paging because the combined memory use of all jobs concerned is sufficiently small. Even when there is memory overcommitment, it can be constrained to an acceptable level by scheduling decisions (and MOM job limit enforcement). The other mitigating factor is the relatively high performance of paging under Linux (which is likely to improve further with large page swapping). In the worst cases, only a couple of minutes of paging result.

This lazy approach to claiming memory for an incoming job is much more efficient than the alternative of preemptively swapping suspended jobs before starting suspending jobs. Given the frequency of failures of job startups (due to user error) or over-requests of memory, preemptive swapping would induce an unnecessarily large amount of swapping.

The astute reader will be aware of the importance of NUMA page placement on application performance and may detect a possible issue with suspend/resume in this context. Ideal MPI performance is achieved when MPI tasks are each confined to a single core and their page allocations are all to the local NUMA memory of that core. In some circumstances, suspend/resume may lead to more off-node page allocations because particular NUMA nodes are full of suspended jobs. On the last two NF systems, the primary MPI library has been customized to provide memory binding by default for MPI jobs so that page allocations never go off node. In reality, suspend/resume is a secondary reason for introducing memory binding. Even without suspend/resume, memory binding has been shown to greatly improve the consistency of performance of large-scale MPI applications.

6. Scheduling

Suspend/resume is integral to supporting capability and capacity usage as well as maintaining high overall system efficiency and utilization. Unlike in a number of other systems, it is not simply a brute force approach to running high priority parallel jobs. It is an essential component of virtually every job start decision, including selecting between nominally equal priority jobs. In essence, the mechanism is a form of time-slicing or gang-scheduling at the job-length timescale. In terms of improving overall system utilization, it is conceptually the equivalent of cheating at Tetris by chopping the blocks (the cpu-walltime size of jobs) up to make them easier to pack.

As discussed in section 3, the decision process first involves satisfying all physical resource constraints to ensure no overcommitment. Hard restrictions (like ensuring co-located jobs will never exhaust node swap space or local disk space) are supplemented with heuristics to avoid excessive paging when a new job starts on the same node as suspended jobs. Since ANUPBS is starting multiple jobs in a scheduling cycle, it is important that ANUPBS knows exactly which nodes each job runs on. Relying on some indirect control, like setting job priorities and leaving the node selection decision up to an external resource management system, can lead to undesirable job placement because the system state seen by the second scheduler may be different to that seen by ANUPBS (jobs are constantly completing). Hence the requirement of allowing ANUPBS to fully specify the nodes allocated to a job.

The real complexity of the scheduling process comes in trying to achieve some form of fairness (or adhering to share policy goals) in choosing which jobs, and when, to suspend. The NCI scheduler tries to ensure no jobs are starved (suspended or queued indefinitely) and to give roughly equitable access and turnaround to all users/projects.
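As an illustration of what such a choice might weigh, here is a hypothetical pairwise suspender/suspendee score. The factors mirror the dynamic factors described in this section, but the field names and weights are invented for the sketch and bear no relation to the actual ANUPBS scoring.

```python
from dataclasses import dataclass

@dataclass
class Job:
    ncpus: int
    walltime_h: float       # requested walltime
    hours_run: float        # walltime already consumed
    hours_suspended: float  # time already spent suspended
    share_debt: float       # positive if the owning project is under its share

def suspendability(suspender: Job, suspendee: Job) -> float:
    """Higher score = better candidate to suspend under `suspender`."""
    score = suspender.ncpus / max(suspendee.ncpus, 1)         # prefer smaller jobs
    score -= suspendee.hours_suspended                        # don't starve long-suspended jobs
    score -= 10 * suspendee.hours_run / suspendee.walltime_h  # spare nearly-finished jobs
    score += suspender.share_debt - suspendee.share_debt      # respect fairshare
    return score
```

A search over all suspendable jobs for the best-scoring set (subject to topology and nesting constraints) then yields the nodes to allocate.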
It is important to note that scheduling decisions (particularly suspension) are not based on static job priorities: virtually all jobs are in the one queue with the same static priorities. The decisions are based on a number of dynamic factors, including:
- relative job walltime and ncpus requests,
- how long jobs have already been suspended,
- the relative number of cpus already in use by the respective users and projects,
- how close to completion the prospective suspendee jobs are, and
- the fairshare status of the respective users and projects.

A pairwise (suspender/suspendee) job comparison is made and given a numerical value based on these factors. Then a search is made over all suspendable jobs to select the best set of jobs (and hence nodes) based on this job suspendability score. Issues like network topology locality are reflected in this search procedure. To maintain efficient system use, an extra constraint of proper nesting is imposed: a suspended job must be entirely within the footprint of the suspending job.

Of course, these are site-specific scheduling goals and implementation details; other sites may explicitly favour particular job types over others. Indeed, the NCI scheduler does have the flexibility of preferring jobs of specific users or projects over others, but even this is not implemented as some absolute suspension priority. The critical point is that site policy impacts how jobs are selected for suspension and hence how nodes are allocated to jobs.

There are a number of obvious scheduling advantages to using suspension/resumption:
- there is no need to hold nodes or cpus idle to run parallel jobs; they can be run at basically any time, given no other resource conflicts;
- the scheduling algorithm does not introduce a bias toward or against particular job classes. Compare this with, for example, backfilling, which favours short jobs of few cpus, or the PSC model of draining the whole system once per day to schedule large jobs. The latter approach is designed to overcome the system fragmentation that always results from production use, but it a) wastes a lot of cputime and b) cannot support jobs that are unable to checkpoint in that interval;
- short debugging and testing jobs can be supported without reserving nodes.

There are, of course, disadvantages:
- possible excessive paging; see the previous section;
- having too many suspended jobs and too few queued jobs can lead to situations where there are idle nodes despite there being plenty of jobs on the system. Limiting job suspension when queues are short and providing a user-assisted job migration mechanism can often avoid this scenario;
- some users seem to prefer that jobs reside in the queued state rather than the suspended state. A little education is sometimes required to convince them that it is only the sum of the time spent in either of these two states that needs to be minimized, and that preemptive scheduling is, on average, reducing that time.

7. Interaction with external resource management systems

A number of proprietary high performance interconnects and message passing systems include some form of resource manager that is intimately tied to the MPI system. The resource manager typically spawns and manages MPI tasks across all nodes allocated to the job, as well as providing any necessary privileged access to devices or mappings. Often the resource manager includes some sort of basic scheduling and node allocation functionality and will respond to requests from any user. To work with ANUPBS, the resource manager should a) support a mode of operation where only privileged processes can cause resource manager actions and b) within that mode, provide an API that allows a privileged process to:
1. provide an unprivileged user's job with access to the interconnect and message passing system,
2. specify the cpus and/or nodes allocated to a job,
3. suspend all processes in a job and make the cpus allocated to that job available to another,
4. reattach a suspended job to its allocated cpus and resume the job, and
5. send a specified signal to all processes of a job.

The MPI library also needs to support suspend/resume actions by the resource manager; e.g. timeouts should be appropriately guarded. A quality resource manager will create a job container of the cpus/nodes allocated that a) all job processes are confined to and b) persists between multiple invocations of mpirun within a job. The resource manager or the associated MPI library should also support both process-to-cpu binding and, on NUMA nodes, process-to-NUMA-memory binding for the tasks of MPI jobs. Two resource management systems providing this functionality that have been successfully integrated with ANUPBS are the Quadrics Resource Management System (RMS) and SGI Array Services and Message Passing Toolkit.
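The five privileged operations map naturally onto an abstract interface. The sketch below names them for illustration only; the method names are invented, and real integrations (Quadrics RMS, SGI Array Services) each expose their own calls.

```python
from abc import ABC, abstractmethod

class ResourceManager(ABC):
    """Privileged operations a batch system like ANUPBS requires of a
    native resource manager.  Method names are illustrative."""

    @abstractmethod
    def grant_access(self, jobid: str, user: str) -> None:
        """Give an unprivileged user's job access to the interconnect
        and message passing system."""

    @abstractmethod
    def bind(self, jobid: str, nodes: list, cpus: list) -> None:
        """Specify exactly which cpus/nodes the job is allocated."""

    @abstractmethod
    def suspend(self, jobid: str) -> None:
        """Stop all job processes and free their cpus for another job."""

    @abstractmethod
    def resume(self, jobid: str) -> None:
        """Reattach a suspended job to its allocated cpus and continue it."""

    @abstractmethod
    def signal(self, jobid: str, signum: int) -> None:
        """Deliver a signal to every process of the job."""
```

A per-platform integration then subclasses this interface, leaving the scheduler's suspend/resume logic platform-independent.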
Many open source MPI systems, such as MPICH, LAM and Open MPI, either are, or can be made, PBS-aware in the sense that they use PBS directly as their native resource manager and job launcher.
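For a sense of what PBS-aware means in practice: inside a PBS job, the environment variable PBS_NODEFILE points at a file listing one hostname per allocated cpu, from which a launcher can build its host list (launchers may also use PBS's TM interface directly). A minimal sketch of reading it:

```python
import os
from collections import Counter

def pbs_hosts(path=None):
    """Return {hostname: slot_count} from a PBS node file.

    PBS writes one line per allocated cpu; $PBS_NODEFILE names the file
    inside a running job.
    """
    path = path or os.environ["PBS_NODEFILE"]
    with open(path) as f:
        hosts = [line.strip() for line in f if line.strip()]
    return dict(Counter(hosts))
```

So a job allocated two cpus on n01 and one on n02 yields {"n01": 2, "n02": 1}, which maps directly onto an MPI machinefile or slot list.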
8. /jobfs scratch disk

This section is included as an illustration of a feature not typically found in a job management system but which infiltrates all levels of the job management process. One of the largest impediments to efficient utilization of the NCI system is poor IO practices by users accessing global filesystems: in many cases, frequent metadata-dominated IO requests lead to greatly diminished IO performance for all jobs. Users are requested to utilize node-local disks as much as possible, and a large majority of the jobs running on the system now do so. Clearly, node-local disk space must be carefully (and strictly) managed if it is to be a reliable job resource. On the NCI NF system:
- virtually all nodes are configured with a large /jobfs partition dedicated to job use only during job lifetime;
- users must request the amount of this disk space required in their job submission;
- the ANUPBS scheduler carefully allocates jobfs resources at the per-node level;
- the ANUPBS node daemon:
  - creates a writable /jobfs subdirectory for the job at job startup and adds environment variables to the job environment for access,
  - monitors the size of the /jobfs subdirectory while jobs are running, terminating jobs that exceed requested usage,
  - cleans up the directory on job termination;
- utilities are provided to transfer files to and from /jobfs and to monitor and access the filesystem interactively while a job is running.

A persistent node allocation for the lifetime of the job is essential for moving data to and from /jobfs subdirectories outside MPI program execution in distributed jobs.

9. System management

A job management system must, of course, interact closely with all other aspects of system management.
ANUPBS offers a number of features that specifically enhance this interaction, including:
- draining specific nodes for some specified future time for system maintenance such as hardware or software updates;
- suspending all (or a subset of) jobs to perform tasks such as updating and rebooting Lustre servers, correcting a component of the system interconnect, or running diagnostic or verification tests;
- controlling jobs when external resources such as mass storage or database servers are scheduled for downtime or having problems.

In addition, ANUPBS provides generic interfaces to external usage accounting systems and, in particular, supports the functionality of ANU's very sophisticated project- and shares-based accounting system, RASH. Indeed, the political context of NCI requires a sophisticated hierarchical shares-based model. ANUPBS has evolved to provide shareholders the ability to select their own scheduling policies and to compose those policies within a hierarchical share model. In this context, suspend/resume is more an instrument for achieving these share and policy goals than a special-case mechanism.

10. Summary

The following list of main features summarizes the model used:
- job suspension/resumption is critical to running the workload mix we are presented with;
- to fully utilize the system, jobs share nodes constantly, either running side-by-side, each on a subset of node cpus and memory, or with one job running on all cpus of the node while other jobs are suspended but still resident on the node;
- to ensure jobs get reliable performance, all resource usage is carefully monitored and limited to the amounts requested by the job;
- the scheduler is responsible for placing jobs on nodes such that resources should not be overcommitted in a scheduling/allocation sense, and node execution daemons (MOMs) are responsible for ensuring resources are not overcommitted in an actual usage sense (i.e. the scheduler ensures resources should not be over-allocated while the MOMs ensure resources are not over-used);
- the selection of nodes to be allocated to a job involves site policy and, hence, is the responsibility of the site scheduler.
More informationRunning applications on the Cray XC30 4/12/2015
Running applications on the Cray XC30 4/12/2015 1 Running on compute nodes By default, users do not log in and run applications on the compute nodes directly. Instead they launch jobs on compute nodes
More informationThe Importance of Software License Server Monitoring
The Importance of Software License Server Monitoring NetworkComputer Meeting The Job Scheduling Challenges of Organizations of All Sizes White Paper Introduction Every semiconductor design group uses a
More informationManaging Capacity Using VMware vcenter CapacityIQ TECHNICAL WHITE PAPER
Managing Capacity Using VMware vcenter CapacityIQ TECHNICAL WHITE PAPER Table of Contents Capacity Management Overview.... 3 CapacityIQ Information Collection.... 3 CapacityIQ Performance Metrics.... 4
More informationLSKA 2010 Survey Report Job Scheduler
LSKA 2010 Survey Report Job Scheduler Graduate Institute of Communication Engineering {r98942067, r98942112}@ntu.edu.tw March 31, 2010 1. Motivation Recently, the computing becomes much more complex. However,
More informationRecommended hardware system configurations for ANSYS users
Recommended hardware system configurations for ANSYS users The purpose of this document is to recommend system configurations that will deliver high performance for ANSYS users across the entire range
More informationEMC XTREMIO EXECUTIVE OVERVIEW
EMC XTREMIO EXECUTIVE OVERVIEW COMPANY BACKGROUND XtremIO develops enterprise data storage systems based completely on random access media such as flash solid-state drives (SSDs). By leveraging the underlying
More informationCapacity Estimation for Linux Workloads
Capacity Estimation for Linux Workloads Session L985 David Boyes Sine Nomine Associates 1 Agenda General Capacity Planning Issues Virtual Machine History and Value Unique Capacity Issues in Virtual Machines
More informationMultilevel Load Balancing in NUMA Computers
FACULDADE DE INFORMÁTICA PUCRS - Brazil http://www.pucrs.br/inf/pos/ Multilevel Load Balancing in NUMA Computers M. Corrêa, R. Chanin, A. Sales, R. Scheer, A. Zorzo Technical Report Series Number 049 July,
More informationComparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &
More informationThe Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland
The Lattice Project: A Multi-Model Grid Computing System Center for Bioinformatics and Computational Biology University of Maryland Parallel Computing PARALLEL COMPUTING a form of computation in which
More informationBatch Scheduling and Resource Management
Batch Scheduling and Resource Management Luke Tierney Department of Statistics & Actuarial Science University of Iowa October 18, 2007 Luke Tierney (U. of Iowa) Batch Scheduling and Resource Management
More informationIsolating Cluster Jobs for Performance and Predictability
Isolating Cluster Jobs for Performance and Predictability Brooks Davis Enterprise Information Systems The Aerospace Corporation BSDCan 2009 Ottawa, Canada May 8-9, 2009 The Aerospace
More informationBigdata High Availability (HA) Architecture
Bigdata High Availability (HA) Architecture Introduction This whitepaper describes an HA architecture based on a shared nothing design. Each node uses commodity hardware and has its own local resources
More informationSolution Guide Parallels Virtualization for Linux
Solution Guide Parallels Virtualization for Linux Overview Created in 1991, Linux was designed to be UNIX-compatible software that was composed entirely of open source or free software components. Linux
More informationThe Application Level Placement Scheduler
The Application Level Placement Scheduler Michael Karo 1, Richard Lagerstrom 1, Marlys Kohnke 1, Carl Albing 1 Cray User Group May 8, 2006 Abstract Cray platforms present unique resource and workload management
More informationTechnical Paper. Moving SAS Applications from a Physical to a Virtual VMware Environment
Technical Paper Moving SAS Applications from a Physical to a Virtual VMware Environment Release Information Content Version: April 2015. Trademarks and Patents SAS Institute Inc., SAS Campus Drive, Cary,
More informationMitglied der Helmholtz-Gemeinschaft. System monitoring with LLview and the Parallel Tools Platform
Mitglied der Helmholtz-Gemeinschaft System monitoring with LLview and the Parallel Tools Platform November 25, 2014 Carsten Karbach Content 1 LLview 2 Parallel Tools Platform (PTP) 3 Latest features 4
More informationBatch Scheduling on the Cray XT3
Batch Scheduling on the Cray XT3 Chad Vizino, Nathan Stone, John Kochmar, J. Ray Scott {vizino,nstone,kochmar,scott}@psc.edu Pittsburgh Supercomputing Center ABSTRACT: The Pittsburgh Supercomputing Center
More informationVirtualization 101: Technologies, Benefits, and Challenges. A White Paper by Andi Mann, EMA Senior Analyst August 2006
Virtualization 101: Technologies, Benefits, and Challenges A White Paper by Andi Mann, EMA Senior Analyst August 2006 Table of Contents Introduction...1 What is Virtualization?...1 The Different Types
More informationBest Practices for VMware ESX Server 2
Best Practices for VMware ESX Server 2 2 Summary VMware ESX Server can be deployed in many ways. In this document, we recommend specific deployment guidelines. Following these guidelines will maximize
More informationVirtual Private Systems for FreeBSD
Virtual Private Systems for FreeBSD Klaus P. Ohrhallinger 06. June 2010 Abstract Virtual Private Systems for FreeBSD (VPS) is a novel virtualization implementation which is based on the operating system
More informationSurvey on Job Schedulers in Hadoop Cluster
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 15, Issue 1 (Sep. - Oct. 2013), PP 46-50 Bincy P Andrews 1, Binu A 2 1 (Rajagiri School of Engineering and Technology,
More information3 Red Hat Enterprise Linux 6 Consolidation
Whitepaper Consolidation EXECUTIVE SUMMARY At this time of massive and disruptive technological changes where applications must be nimbly deployed on physical, virtual, and cloud infrastructure, Red Hat
More informationFeature Comparison. Windows Server 2008 R2 Hyper-V and Windows Server 2012 Hyper-V
Comparison and Contents Introduction... 4 More Secure Multitenancy... 5 Flexible Infrastructure... 9 Scale, Performance, and Density... 13 High Availability... 18 Processor and Memory Support... 24 Network...
More informationDeploying and Optimizing SQL Server for Virtual Machines
Deploying and Optimizing SQL Server for Virtual Machines Deploying and Optimizing SQL Server for Virtual Machines Much has been written over the years regarding best practices for deploying Microsoft SQL
More informationCloud Computing through Virtualization and HPC technologies
Cloud Computing through Virtualization and HPC technologies William Lu, Ph.D. 1 Agenda Cloud Computing & HPC A Case of HPC Implementation Application Performance in VM Summary 2 Cloud Computing & HPC HPC
More informationGeneral Overview. Slurm Training15. Alfred Gil & Jordi Blasco (HPCNow!)
Slurm Training15 Agenda 1 2 3 About Slurm Key Features of Slurm Extending Slurm Resource Management Daemons Job/step allocation 4 5 SMP MPI Parametric Job monitoring Accounting Scheduling Administration
More informationPerformance Characteristics of VMFS and RDM VMware ESX Server 3.0.1
Performance Study Performance Characteristics of and RDM VMware ESX Server 3.0.1 VMware ESX Server offers three choices for managing disk access in a virtual machine VMware Virtual Machine File System
More informationMPI / ClusterTools Update and Plans
HPC Technical Training Seminar July 7, 2008 October 26, 2007 2 nd HLRS Parallel Tools Workshop Sun HPC ClusterTools 7+: A Binary Distribution of Open MPI MPI / ClusterTools Update and Plans Len Wisniewski
More informationWhite Paper. Real-time Capabilities for Linux SGI REACT Real-Time for Linux
White Paper Real-time Capabilities for Linux SGI REACT Real-Time for Linux Abstract This white paper describes the real-time capabilities provided by SGI REACT Real-Time for Linux. software. REACT enables
More informationAn Oracle White Paper August 2011. Oracle VM 3: Server Pool Deployment Planning Considerations for Scalability and Availability
An Oracle White Paper August 2011 Oracle VM 3: Server Pool Deployment Planning Considerations for Scalability and Availability Note This whitepaper discusses a number of considerations to be made when
More informationAutomatic Software Updates on Heterogeneous Clusters with STACI
Automatic Software Updates on Heterogeneous Clusters with STACI Michael Shuey Linux Developer and Administrator LCI: The HPC Revolution May 19, 2004 Outline Introduction Common maintenance problems STACI
More informationProvisioning and Resource Management at Large Scale (Kadeploy and OAR)
Provisioning and Resource Management at Large Scale (Kadeploy and OAR) Olivier Richard Laboratoire d Informatique de Grenoble (LIG) Projet INRIA Mescal 31 octobre 2007 Olivier Richard ( Laboratoire d Informatique
More informationParFUM: A Parallel Framework for Unstructured Meshes. Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008
ParFUM: A Parallel Framework for Unstructured Meshes Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008 What is ParFUM? A framework for writing parallel finite element
More informationPage 1 of 5. IS 335: Information Technology in Business Lecture Outline Operating Systems
Lecture Outline Operating Systems Objectives Describe the functions and layers of an operating system List the resources allocated by the operating system and describe the allocation process Explain how
More informationPEPPERDATA IN MULTI-TENANT ENVIRONMENTS
..................................... PEPPERDATA IN MULTI-TENANT ENVIRONMENTS technical whitepaper June 2015 SUMMARY OF WHAT S WRITTEN IN THIS DOCUMENT If you are short on time and don t want to read the
More informationAn Oracle White Paper August 2010. Beginner's Guide to Oracle Grid Engine 6.2
An Oracle White Paper August 2010 Beginner's Guide to Oracle Grid Engine 6.2 Executive Overview...1 Introduction...1 Chapter 1: Introduction to Oracle Grid Engine...3 Oracle Grid Engine Jobs...3 Oracle
More information:Introducing Star-P. The Open Platform for Parallel Application Development. Yoel Jacobsen E&M Computing LTD yoel@emet.co.il
:Introducing Star-P The Open Platform for Parallel Application Development Yoel Jacobsen E&M Computing LTD yoel@emet.co.il The case for VHLLs Functional / applicative / very high-level languages allow
More informationTen Reasons to Switch from Maui Cluster Scheduler to Moab HPC Suite Comparison Brief
TM Ten Reasons to Switch from Maui Cluster Scheduler to Moab HPC Suite Comparison Brief Many Maui users make the switch to Moab each year for key scalability, capability and support advantages that help
More informationSAS Grid: Grid Scheduling Policy and Resource Allocation Adam H. Diaz, IBM Platform Computing, Research Triangle Park, NC
Paper BI222012 SAS Grid: Grid Scheduling Policy and Resource Allocation Adam H. Diaz, IBM Platform Computing, Research Triangle Park, NC ABSTRACT This paper will discuss at a high level some of the options
More informationUnderstanding the Benefits of IBM SPSS Statistics Server
IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster
More informationThe Moab Scheduler. Dan Mazur, McGill HPC daniel.mazur@mcgill.ca Aug 23, 2013
The Moab Scheduler Dan Mazur, McGill HPC daniel.mazur@mcgill.ca Aug 23, 2013 1 Outline Fair Resource Sharing Fairness Priority Maximizing resource usage MAXPS fairness policy Minimizing queue times Should
More informationAn Oracle White Paper November 2010. Deploying SAP NetWeaver Master Data Management on Oracle Solaris Containers
An Oracle White Paper November 2010 Deploying SAP NetWeaver Master Data Management on Oracle Solaris Containers Executive Overview...1 Application overview: Oracle Solaris Containers Overview...2 Oracle
More informationAgenda. HPC Software Stack. HPC Post-Processing Visualization. Case Study National Scientific Center. European HPC Benchmark Center Montpellier PSSC
HPC Architecture End to End Alexandre Chauvin Agenda HPC Software Stack Visualization National Scientific Center 2 Agenda HPC Software Stack Alexandre Chauvin Typical HPC Software Stack Externes LAN Typical
More informationMOSIX: High performance Linux farm
MOSIX: High performance Linux farm Paolo Mastroserio [mastroserio@na.infn.it] Francesco Maria Taurino [taurino@na.infn.it] Gennaro Tortone [tortone@na.infn.it] Napoli Index overview on Linux farm farm
More informationA Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing
A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing N.F. Huysamen and A.E. Krzesinski Department of Mathematical Sciences University of Stellenbosch 7600 Stellenbosch, South
More informationParallel Computing using MATLAB Distributed Compute Server ZORRO HPC
Parallel Computing using MATLAB Distributed Compute Server ZORRO HPC Goals of the session Overview of parallel MATLAB Why parallel MATLAB? Multiprocessing in MATLAB Parallel MATLAB using the Parallel Computing
More informationMonitoring Microsoft Exchange to Improve Performance and Availability
Focus on Value Monitoring Microsoft Exchange to Improve Performance and Availability With increasing growth in email traffic, the number and size of attachments, spam, and other factors, organizations
More informationSAS deployment on IBM Power servers with IBM PowerVM dedicated-donating LPARs
SAS deployment on IBM Power servers with IBM PowerVM dedicated-donating LPARs Narayana Pattipati IBM Systems and Technology Group ISV Enablement January 2013 Table of contents Abstract... 1 IBM PowerVM
More information2. is the number of processes that are completed per time unit. A) CPU utilization B) Response time C) Turnaround time D) Throughput
Import Settings: Base Settings: Brownstone Default Highest Answer Letter: D Multiple Keywords in Same Paragraph: No Chapter: Chapter 5 Multiple Choice 1. Which of the following is true of cooperative scheduling?
More informationJoramMQ, a distributed MQTT broker for the Internet of Things
JoramMQ, a distributed broker for the Internet of Things White paper and performance evaluation v1.2 September 214 mqtt.jorammq.com www.scalagent.com 1 1 Overview Message Queue Telemetry Transport () is
More informationOpenMP Programming on ScaleMP
OpenMP Programming on ScaleMP Dirk Schmidl schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign
More informationCPU Scheduling Outline
CPU Scheduling Outline What is scheduling in the OS? What are common scheduling criteria? How to evaluate scheduling algorithms? What are common scheduling algorithms? How is thread scheduling different
More informationPBS Tutorial. Fangrui Ma Universit of Nebraska-Lincoln. October 26th, 2007
PBS Tutorial Fangrui Ma Universit of Nebraska-Lincoln October 26th, 2007 Abstract In this tutorial we gave a brief introduction to using PBS Pro. We gave examples on how to write control script, and submit
More informationCSE 120 Principles of Operating Systems. Modules, Interfaces, Structure
CSE 120 Principles of Operating Systems Fall 2000 Lecture 3: Operating System Modules, Interfaces, and Structure Geoffrey M. Voelker Modules, Interfaces, Structure We roughly defined an OS as the layer
More informationComputing in High- Energy-Physics: How Virtualization meets the Grid
Computing in High- Energy-Physics: How Virtualization meets the Grid Yves Kemp Institut für Experimentelle Kernphysik Universität Karlsruhe Yves Kemp Barcelona, 10/23/2006 Outline: Problems encountered
More information1 Organization of Operating Systems
COMP 730 (242) Class Notes Section 10: Organization of Operating Systems 1 Organization of Operating Systems We have studied in detail the organization of Xinu. Naturally, this organization is far from
More informationScaling LS-DYNA on Rescale HPC Cloud Simulation Platform
Scaling LS-DYNA on Rescale HPC Cloud Simulation Platform Joris Poort, President & CEO, Rescale, Inc. Ilea Graedel, Manager, Rescale, Inc. 1 Cloud HPC on the Rise 1.1 Background Engineering and science
More informationWindows Server 2008 R2 Hyper-V Live Migration
Windows Server 2008 R2 Hyper-V Live Migration Table of Contents Overview of Windows Server 2008 R2 Hyper-V Features... 3 Dynamic VM storage... 3 Enhanced Processor Support... 3 Enhanced Networking Support...
More informationAn Oracle White Paper July 2011. Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide
Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide An Oracle White Paper July 2011 1 Disclaimer The following is intended to outline our general product direction.
More informationManaging a Fibre Channel Storage Area Network
Managing a Fibre Channel Storage Area Network Storage Network Management Working Group for Fibre Channel (SNMWG-FC) November 20, 1998 Editor: Steven Wilson Abstract This white paper describes the typical
More informationMultiprocessor Scheduling and Scheduling in Linux Kernel 2.6
Multiprocessor Scheduling and Scheduling in Linux Kernel 2.6 Winter Term 2008 / 2009 Jun.-Prof. Dr. André Brinkmann Andre.Brinkmann@uni-paderborn.de Universität Paderborn PC² Agenda Multiprocessor and
More informationCloud Computing Capacity Planning. Maximizing Cloud Value. Authors: Jose Vargas, Clint Sherwood. Organization: IBM Cloud Labs
Cloud Computing Capacity Planning Authors: Jose Vargas, Clint Sherwood Organization: IBM Cloud Labs Web address: ibm.com/websphere/developer/zones/hipods Date: 3 November 2010 Status: Version 1.0 Abstract:
More informationChapter 1 - Web Server Management and Cluster Topology
Objectives At the end of this chapter, participants will be able to understand: Web server management options provided by Network Deployment Clustered Application Servers Cluster creation and management
More informationSymmetric Multiprocessing
Multicore Computing A multi-core processor is a processing system composed of two or more independent cores. One can describe it as an integrated circuit to which two or more individual processors (called
More informationIaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures
IaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Introduction
More informationHigh Availability of the Polarion Server
Polarion Software CONCEPT High Availability of the Polarion Server Installing Polarion in a high availability environment Europe, Middle-East, Africa: Polarion Software GmbH Hedelfinger Straße 60 70327
More information