Dynamic Resource Provisioning with HTCondor in the Cloud
Ryan Taylor, Frank Berghaus
Overview
- Review of the HTCondor + Cloud Scheduler system
- Condor job slot configuration
- Dynamic slot creation
- Automatic slot defragmentation
- Target shares
- Flexible use of cloud resources
Be Dynamic
Do the right thing...
- On any type of cloud or hypervisor: Nimbus, OpenStack, KVM, Xen
- Anywhere in the world: data federations, squid federator (Shoal)
- For any kind of job: SCORE (the old requirement), MCORE and HIMEM (the new requirements)
...using just one image, and minimizing static configuration
Cloud Scheduler
- A Python package for managing VMs on IaaS clouds
- Users submit HTCondor jobs; optional attributes specify VM properties
- Dynamically manages the quantity and type of VMs in response to user demand
- Easily connects to many IaaS clouds, aggregates their resources, and presents them as an HTCondor batch system
- Used by ATLAS, Belle II, CANFAR, BaBar
- https://github.com/hep-gc/cloud-scheduler
- Publication in CHEP 2010 proceedings
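Users drive Cloud Scheduler entirely through ordinary HTCondor submit files. Below is a hedged sketch of what that looks like: the custom VM attribute +VMType (and +TargetClouds, which appears on a later slide) is illustrative and may differ between Cloud Scheduler versions, and the executable is a placeholder.

universe = vanilla
executable = run_job.sh            # placeholder
request_cpus = 1
request_memory = 2000              # MB
# Custom attributes read by Cloud Scheduler to decide which VM image
# to boot and on which cloud (names and values are illustrative)
+VMType = "ucernvm-batch"
+TargetClouds = "CERN"
queue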
- VMs boot and join the Condor pool, contextualized with an x509 credential for authentication
- VMs run jobs continuously
- Cloud Scheduler automatically retires unneeded VMs
[Diagram: user jobs and pilot jobs flowing to the VMs]
VM Image
- µCernVM: OS loaded on demand via CVMFS
- 10 MB image (kernel + CVMFS client)
- Contextualization: cloud-init and Puppet
  - Configures Condor, CVMFS, Shoal, the grid environment, etc.
  - https://github.com/berghaus/atlasgce-modules
  - Developed by Frank Berghaus and Henric Öhman
- Instrumented with Ganglia monitoring: agm.cern.ch
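To make the contextualization step concrete, here is a hedged sketch of the kind of worker-side Condor configuration the Puppet modules lay down on the VM; the host name and knob values are placeholders, not taken from atlasgce-modules.

# Illustrative worker-node condor_config fragment (values are placeholders)
CONDOR_HOST = condor.heprc.uvic.ca
DAEMON_LIST = MASTER, STARTD
# Advertise the whole VM as one partitionable slot (see the slot slides below)
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%
SLOT_TYPE_1_PARTITIONABLE = True
# Start jobs as soon as the VM joins the pool
START = True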
ATLAS Production in the Cloud (Jan-Sep 2014)
- Over 1.2M ATLAS jobs completed in 2014
- Single-core and multi-core production
- Also running 10% of Belle II jobs in the cloud with DIRAC
[Plot: completed jobs per cloud, Jan 2014 to Sep 2014; clouds include Helix Nebula, Google, CERN, North America, Australia, UK. Credit: Frank Berghaus]
Condor Slot Configuration

Static slots: 8 x 1-core slots (or 1 x 8-core slot), the old way:
NUM_SLOTS = 8
SLOT1_USER = slot01
SLOT2_USER = slot02
...

Dynamic (partitionable) slots, the new way:
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%
SLOT_TYPE_1_PARTITIONABLE = True
SLOT1_1_USER = slot01
SLOT1_2_USER = slot02
...
Dynamic Slot Creation
Jobs specify their resource requirements (a complete submit-file sketch follows below):

request_cpus = 2
request_memory = 8000       # MB
request_disk = 40000000     # KB

Condor dynamically creates a slot of the requested size.
- Problem: VMs get fragmented into smaller slots, so larger jobs can't run
- Solution: the DEFRAG daemon coalesces slots back together
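A minimal submit file using these requests; the executable name is a placeholder:

universe = vanilla
executable = analysis_job.sh       # placeholder
request_cpus   = 2
request_memory = 8000              # MB
request_disk   = 40000000          # KB
queue

The startd carves a dynamic slot of exactly this size out of the VM's partitionable slot; whatever is left over remains available for further jobs.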
Automatic Slot Defragmentation
Currently being tested and tuned. Thanks to Andrew Lahiff (RAL) for the sample configuration.

DAEMON_LIST = $(DAEMON_LIST), DEFRAG
DEFRAG_INTERVAL = 600
DEFRAG_DRAINING_MACHINES_PER_HOUR = 10.0
DEFRAG_MAX_CONCURRENT_DRAINING = 30
DEFRAG_MAX_WHOLE_MACHINES = 50
DEFRAG_SCHEDULE = graceful
DEFRAG.SETTABLE_ATTRS_ADMINISTRATOR = DEFRAG_MAX_CONCURRENT_DRAINING, DEFRAG_DRAINING_MACHINES_PER_HOUR, DEFRAG_MAX_WHOLE_MACHINES
ENABLE_RUNTIME_CONFIG = TRUE
DEFRAG_RANK = ifThenElse(Cpus >= 8, -10, (TotalCpus - Cpus) / (8.0 - Cpus))
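The sample above relies on the defaults for deciding which machines to drain and when a machine counts as whole again. A hedged sketch of the companion knobs a site might set, assuming the same 8-core MCORE target (these expressions are illustrative, not from the slide):

# Only consider partitionable slots that cannot yet fit an 8-core job
DEFRAG_REQUIREMENTS = PartitionableSlot && Cpus < 8
# A machine counts as "whole" once 8 cores are free again
DEFRAG_WHOLE_MACHINE_EXPR = PartitionableSlot && Cpus >= 8
# Stop draining a machine as soon as it becomes whole
DEFRAG_CANCEL_REQUIREMENTS = $(DEFRAG_WHOLE_MACHINE_EXPR)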
Target Shares
Use Condor accounting groups to prioritize job types:

GROUP_NAMES = group_analysis, group_production
GROUP_QUOTA_DYNAMIC_group_analysis = 0.05
GROUP_QUOTA_DYNAMIC_group_production = 0.95
GROUP_ACCEPT_SURPLUS = True

In the job definition add:
AccountingGroup = "group_production"
or
AccountingGroup = "group_analysis"

Thanks to Joanna Huang and Sean Crosby from the Australian ATLAS group.
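In a submit description file the group is attached as a custom job attribute; a minimal sketch, where the leading "+" is submit-file syntax and the executable name is a placeholder:

universe = vanilla
executable = production_job.sh     # placeholder
request_cpus = 1
# charge this job to the production group's 95% target share
+AccountingGroup = "group_production"
queue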
Flexible Cloud Resources
[Diagram: the CERN Cloud Scheduler instance (cloudscheduler.cern.ch) dynamically provisions slots from both the GridPP cloud group and the CERN cloud group.]
Flexible Cloud Resources

GridPP cloud group:
  GridPP_CLOUD:          request_cpus = 1,  TargetClouds = GridPP
  GridPP_CLOUD_HIMEM:    request_memory = 8000,  TargetClouds = GridPP

CERN cloud group:
  CERN-Prod_CLOUD:       request_cpus = 1,  TargetClouds = CERN
  CERN-Prod_CLOUD_MCORE: request_cpus = 8,  TargetClouds = CERN

To run jobs with arbitrary resource requirements: just submit them. Automatic, with no dedicated nodes.
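For example, a HIMEM job aimed at the GridPP cloud group just declares its needs. A hedged sketch, where the executable is a placeholder and +TargetClouds is the Cloud Scheduler hint shown above:

universe = vanilla
executable = himem_job.sh          # placeholder
request_cpus   = 1
request_memory = 8000              # MB
# Cloud Scheduler hint: run this job on the GridPP clouds
+TargetClouds = "GridPP"
queue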
Dynamic Resource Provisioning
a.k.a. how to provision MCORE resources using IaaS:
1) Connect the IaaS cloud to Cloud Scheduler
2) Create a PanDA queue
3) Compose the JDL (see the sketch below)
4) Configure the pilot factory
5) Profit
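Putting the pieces together, step 3 might look like the hedged MCORE pilot JDL below; the executable, memory request, and proxy path are illustrative assumptions, while the core count, target cloud, and accounting group echo the earlier slides.

universe = vanilla
executable = runpilot_mcore.sh     # placeholder pilot wrapper
request_cpus   = 8
request_memory = 16000             # MB, illustrative
# route the pilot to the CERN cloud group and charge the production group
+TargetClouds = "CERN"
+AccountingGroup = "group_production"
x509userproxy = /path/to/proxy     # placeholder
queue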
Questions

Bonus Material
Credential Transport
The user's credentials are securely delegated to the VMs, and VMs joining the Condor pool are authenticated.
Security Configuration

Worker:
GSI_DAEMON_DIRECTORY = /etc/grid-security
GSI_DAEMON_CERT = /etc/grid-security/hostcert.pem
GSI_DAEMON_KEY = /etc/grid-security/hostkey.pem

Central Manager:
SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_AUTHENTICATION_METHODS = GSI
SEC_DEFAULT_ENCRYPTION = REQUIRED
SEC_DEFAULT_ENCRYPTION_METHODS = 3DES
SEC_DEFAULT_INTEGRITY = REQUIRED
GRIDMAP = /etc/grid-security/grid-mapfile
GSI_DAEMON_CERT = /etc/grid-security/condor/condorcert.pem
GSI_DAEMON_KEY = /etc/grid-security/condor/condorkey.pem
GSI_DAEMON_TRUSTED_CA_DIR = /etc/grid-security/certificates/
GSI_DELEGATION_KEYBITS = 1024
GSI_DAEMON_NAME = /C=CA/O=Grid/CN=condor/condor.heprc.uvic.ca, /C=CA/O=Grid/CN=condorworker/condor.heprc.uvic.ca
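The GRIDMAP file referenced above maps certificate DNs to local accounts; a hedged sketch of its contents, reusing the DNs from this slide (the local account names are assumptions):

"/C=CA/O=Grid/CN=condor/condor.heprc.uvic.ca" condor
"/C=CA/O=Grid/CN=condorworker/condor.heprc.uvic.ca" condorworker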
[Architecture diagram: the user submits jobs to the Condor central manager; Cloud Scheduler reads the scheduler status and communicates with cloud interfaces (OpenStack, GCE, ...) to boot virtual machines that join the pool.]
Shoal: Squid Cache Federator (Jun. 2014 S&C Week)
- Shoal tracks squids: discovers new ones, removes missing ones
- Workers find the best squids
- Builds a configuration-less mesh instead of statically defined, site-specific mappings
- CHEP 2013 poster; github.com/hep-gc/shoal
- WLCG Proxy Discovery TF
(from Ryan Taylor, ATLAS Cloud Computing R&D, 2014-06-05)
Dynamic Federation (Sep. 2014 S&C Workshop)
- High performance: aggressive metadata caching in RAM, maximal concurrency, scalable (~10^6 hits/s per core); 24 GB of RAM for the metadata cache is enough for ~100 PB of data in endpoints
- Well designed: stateless, no persistency; standard components and protocols, not HEP-specific; simple and lightweight; trivial to add endpoints, with no site action needed
- Dynamic: automatically detects and avoids offline endpoints; federating is done on the fly
(from Ryan Taylor, ATLAS S&C Week, 2014-10-27)