Experience with Preemption for Urgent Computing
Jason Hedden, Joseph Insley, Ti Leggett, Michael E. Papka
UChicago/Argonne TeraGrid Resource Provider
TeraGrid
TeraGrid is an open scientific discovery infrastructure combining leadership-class resources at nine partner sites to create an integrated, persistent computational resource.
- A collection of high-performance networks, high-performance computers, data resources and tools, and high-end experimental facilities
- Provides more than 102 teraflops of computing capability, more than 15 petabytes of online and archival data storage, and over 100 discipline-specific databases
- Partner sites: Indiana University, Oak Ridge National Laboratory, National Center for Supercomputing Applications, Pittsburgh Supercomputing Center, Purdue University, San Diego Supercomputer Center, Texas Advanced Computing Center, University of Chicago/Argonne National Laboratory, and the National Center for Atmospheric Research
UChicago/Argonne TeraGrid Resource
- 96 visualization nodes: dual Intel Xeon IA-32 processors, 4 GB memory each, GeForce 6600GT AGP graphics cards
- 64 compute nodes: dual Intel Itanium2 IA-64 processors, 4 GB memory each
- 2 visualization login nodes: dual Intel Xeon IA-32 processors, 4 GB memory each, GeForce 6600GT AGP graphics cards
- 2 compute login nodes: dual Intel Itanium2 IA-64 processors, 4 GB memory each
- 4 TB of disk for home directories
- 16 TB of disk for parallel I/O (temporary storage)
- High-performance Myrinet interconnect
- High-performance Gigabit Ethernet
Technology and Social Barriers
Technology
- Technology is needed to deliver on-demand computing in a rapid, reliable, and routine manner.
Social
- Investigate how smaller sites can contribute to a mission of improving science and engineering.
- Understand the techniques and incentives needed to promote continued use.
Policy Change
- Notification of additional use of the resource
- Control based on SPRUCE tokens
- Prioritization of jobs:
  - Next-to-run jobs (no preemption)
  - Run immediately (preemption)
- Explanation of incentives for use
- Alternative charging model(s):
  - Discount for jobs that are not preempted
  - No charge for preempted jobs
Torque and Moab
Torque is an open-source resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original Portable Batch System (PBS) project and has incorporated significant advances in scalability, fault tolerance, and feature extensions contributed by many leading-edge HPC organizations, including TeraGrid.
Moab is a closed-source advanced job scheduler that integrates with Torque. It is a highly optimized and configurable tool capable of supporting an array of scheduling policies, dynamic priorities, extensive reservations, and fairshare capabilities.
Both tools are developed and supported by Cluster Resources Inc. (www.clusterresources.com).
Torque Modifications
Torque spruce queue configuration:
  Qmgr: create queue spruce queue_type=execution
  Qmgr: set queue spruce started=true
  Qmgr: set queue spruce enabled=true
Torque submit filter (qsub wrapper; a sketch follows below):
- Contains SPRUCE and local site-specific code
- Rejects jobs submitted directly to the spruce queue
- Verifies the user identity and requested resources
- Assigns the urgency priority level
Torque accounting filter:
- Discounts jobs submitted on the IA-64 resources
- Moves preempted job information to local logs
- Alerts local operations of any jobs submitted to the urgent spruce queue
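As a rough illustration only, the fragment below sketches what such a submit filter could look like, assuming the standard Torque submit-filter contract (the filter receives the job script on stdin, writes the possibly modified script to stdout, and a non-zero exit causes qsub to reject the job). The verify_spruce_token helper is a hypothetical placeholder for the SPRUCE- and site-specific checks, and a real filter would also inspect the qsub arguments passed to it.

  #!/bin/sh
  # Hypothetical Torque submit filter (qsub wrapper) for SPRUCE urgent jobs.
  # Reads the job script from stdin; exiting non-zero rejects the submission.

  TMPSCRIPT=$(mktemp /tmp/qsub_filter.XXXXXX) || exit 1
  cat > "$TMPSCRIPT"

  # Reject jobs that name the spruce queue directly in the job script,
  # unless the submitter holds an active SPRUCE token.
  if grep -q '^#PBS[[:space:]].*-q[[:space:]]*spruce' "$TMPSCRIPT"; then
      # verify_spruce_token is a hypothetical site helper that would call
      # out to the SPRUCE web services to validate the user's token.
      if ! verify_spruce_token "$USER"; then
          echo "Direct submission to the spruce queue is not permitted." >&2
          rm -f "$TMPSCRIPT"
          exit 1
      fi
  fi

  # Pass the (unmodified) script on to qsub.
  cat "$TMPSCRIPT"
  rm -f "$TMPSCRIPT"
  exit 0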
Moab Modifications
Set the preemption policy to CANCEL, REQUEUE, or CHECKPOINT. CANCEL causes preempted jobs to be deleted and removed from the queue:
  PREEMPTPOLICY CANCEL
Moab manages the urgency level of incoming jobs through Quality of Service (QOS) configurations. Preemption is used to guarantee resources to critical jobs by removing the lowest-priority, best-fit jobs. The priority value is dynamic and can be configured to adjust based on time in queue, requested resources, and user or group accounts:
  QOSCFG[red]     QFLAGS=PREEMPTOR PRIORITY=1000000
  QOSCFG[orange]  QFLAGS=PREEMPTEE PRIORITY=10000
  QOSCFG[yellow]  QFLAGS=PREEMPTEE PRIORITY=5000
  QOSCFG[default] QFLAGS=PREEMPTEE PRIORITY=1
Assign Quality of Service configurations to the spruce and default Torque queues:
  CLASSCFG[spruce] QDEF=yellow QLIST=orange,red
  CLASSCFG[dque]   QDEF=default
Enable logging of preemption events:
  RECORDEVENTLIST JOBPREEMPT
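A few hedged command-line checks that could be used to confirm a configuration like this has taken effect; mdiag, checkjob, and showq are standard Moab client commands, and the job ID shown is only a placeholder:

  # List the configured QOS levels with their flags and priorities.
  mdiag -q

  # Inspect a queued or running job to see which QOS it was assigned
  # and whether it is flagged as a preemptor or preemptee.
  checkjob -v 12345

  # Watch the queue; red-QOS jobs should move ahead of everything else.
  showq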
Process of Preemption
1. The user activates a SPRUCE token through the SPRUCE web services (UChicago/Argonne).
2. The job is submitted (qsub or Grid submission).
3. The submit filter verifies the user, resource, time, urgency level, etc.; if verification fails, the job is rejected.
4. Torque accepts the job and passes it, with its urgency level, to Moab.
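A hedged sketch of the user-facing side of this flow once a token has been activated through the SPRUCE web services; the script name and resource request are illustrative only:

  # 1. Activate a right-of-way token via the SPRUCE portal/web services.
  # 2. Submit the urgent job to the spruce queue; the submit filter verifies
  #    the user, the requested resources, and the urgency level before
  #    accepting it.
  qsub -q spruce -l nodes=16,walltime=01:00:00 lead_forecast.sh

  # 3. Torque hands the job to Moab, which applies the QOS (urgency) level
  #    and, for red-level jobs, preempts the lowest-priority running jobs.
  showq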
Tornado Season
- Partnership with LEAD and SPRUCE
- Currently a testbed for April 1 - June 1
- 4 tokens activated for 72 hours each
- ~30 test runs in preemption mode
- Ready for production use, with tokens integrated into the LEAD portal/gateway
Experience
- Last week we saw 68.4% utilization of the machine.
- Users noticed the preemption activity, but it has not been an issue for them.
- Jobs were preempted a handful of times, perhaps most recently on Monday.
Future (Joint) Work
- How-to guide on what is needed for us to automatically restart preempted users' work:
  - Restart of the preempted job
  - Restart from a checkpoint file
- Support for next-to-run tokens (one possible Moab configuration is sketched below)
- Flexible charging structure
- Network reservation (bandwidth reservation)
- Coupling analysis/visualization by arranging reservations for the needed resources
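For the next-to-run item, one possible direction (purely a sketch, not something the site has deployed) would use Moab's NTR ("next to run") QOS flag, which pushes a job to the front of the queue without preempting running work; for example, the orange urgency level could be changed from a high-priority preemptee to an NTR QOS. The priority value shown is illustrative.

  # Hypothetical: map the orange (next-to-run) urgency onto Moab's NTR flag
  # so the job runs as soon as resources free up, without preempting others.
  QOSCFG[orange] QFLAGS=NTR PRIORITY=100000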
Acknowledgements
- TeraGrid team at UChicago/Argonne
- SPRUCE team at UChicago/Argonne
This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357, and in part by the National Science Foundation under grant OCI-0504086.