Clusters in the Cloud
Dr. Paul Coddington, Deputy Director
Dr. Shunde Zhang, Computing Specialist
eResearch SA, October 2014
Use Cases
- Make the cloud easier to use for compute jobs
  - Particularly for users familiar with HPC clusters
- Personal, on-demand cluster in the cloud
- Cluster in the cloud
  - A private cluster only available to a research group
  - A shared, node-managed cluster
  - Preferably dynamic/elastic
- Cloudbursting for HPC
  - Dynamically (and transparently) add extra compute nodes from the cloud to an existing HPC cluster
Cluster Infrastructure
- Configuration Management System
- Monitoring
- Shared File System
- Application Distribution
- Software Layer
- Local Resource Management System / queueing system
- Hardware, Network
Traditional Static Cluster
- Hardware/Network
  - Dedicated hardware
  - Long process to get new hardware
  - Static, not elastic
- Software
  - Assumes a fairly static environment (IPs etc.)
  - Not cloud-friendly
  - Some systems need a restart if the cluster is changed
  - Not adaptable to changes
Cluster in the Cloud
- Hardware/Network
  - Provisioned by the cloud (OpenStack)
  - Get new resources in minutes
  - Remove resources in minutes
  - Elastic/scalable on demand
- Software
  - Dynamic
  - Can easily add/remove nodes as needed
Possible Solutions
- Condor for high-throughput computing
  - Cloud Scheduler working for a CERN LCG node
  - Recent versions of Condor support cloud execution
- Torque/PBS static cluster in the cloud
  - Works, but painful to set up and maintain
- Dynamic Torque/PBS cluster
  - No existing dynamic/elastic solution
- StarCluster for a personal cluster
  - Automates setup of VMs in the cloud, including the cluster
  - Can add/subtract worker nodes manually
  - Only Amazon, SGE and Condor, but not PBS/Torque
Our Work
- Condor for high-throughput computing
  - Cloud Scheduler for the Australian CERN LCG node
- Torque/PBS static cluster in the cloud
  - Set up a large cluster in the cloud for CoEPP
  - Scripts to automate setup and monitoring
- Dynamic Torque/PBS cluster
  - Created the Dynamic Torque system for OpenStack
- StarCluster for a personal cluster
  - Ported to OpenStack and added a Torque plugin
  - Add-ons to make it easier to use for eRSA users
Application Software
- Want familiar HPC applications to be available to cloud VMs
  - And we don't want to install and maintain software twice, in HPC and cloud
- But there is a limit on the size of VM images in the cloud
  - Want to avoid making lots of custom images
- We use CVMFS
  - Read-only distributed file system, HTTP-based
  - Used by the CERN LHC Grid for distributing software
  - One VM image, with the CVMFS client
  - Downloads and caches software from the HPC cluster
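The client side of this is essentially one config file. A minimal sketch of a CVMFS client configuration (`/etc/cvmfs/default.local`); the repository name, proxy host and quota value below are illustrative placeholders, not the actual eRSA setup:

```
# Repository the VM should mount (name is a placeholder)
CVMFS_REPOSITORIES=apps.ersa.edu.au
# Local HTTP proxy that caches downloads for the cloud zone
CVMFS_HTTP_PROXY="http://cvmfs-proxy.ersa.edu.au:3128"
# Local disk cache size in MB
CVMFS_QUOTA_LIMIT=10000
```

With this in place, one generic VM image mounts the same software tree under /cvmfs/ on every node, so nothing is installed twice.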
HTC Cluster in the Cloud
- NeCTAR eResearch Tools project for high-throughput computing in the cloud
- ARC Centre of Excellence in Experimental Particle Physics (CoEPP)
  - Needed a large cluster for CERN ATLAS data analysis and simulation
  - Tier 2 (global) and Tier 3 (local) jobs
- Augment existing small physical clusters at multiple sites running Torque
CERN ATLAS experiment
Static Cluster in the Cloud
- Built a large Torque cluster using cloud VMs
- A challenging exercise!
  - Reliability issues; needed a lot of scripts to automate setup, monitoring, recovery, etc.
- Some types of usage are bursty, but cluster resources were static
  - Didn't take advantage of the elasticity of the cloud
Dynamic Torque
- Static/dynamic worker nodes
  - Static: stays up all the time
  - Dynamic: brought up and down according to workload
- Independent of Torque/MAUI
  - Runs as a separate process
  - Only adds/removes worker nodes
  - Queries Torque and the MAUI scheduler periodically
  - It is still up to the MAUI scheduler to decide where to run a job
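The add/remove decision reduces to simple arithmetic over the queue state each poll. A minimal sketch; the function and argument names are hypothetical, not from the Dynamic Torque source:

```shell
#!/bin/sh
# Sketch of the scale-up decision a dynamic cluster manager makes on
# each poll: how many new cloud VMs to boot, given queued jobs, idle
# workers, running workers, and the allocation cap.
nodes_to_boot() {
    queued=$1; idle=$2; running=$3; max=$4
    want=$(( queued - idle ))          # jobs that have no free node
    room=$(( max - running - idle ))   # allocation still available
    if [ "$want" -gt "$room" ]; then want=$room; fi
    if [ "$want" -lt 0 ]; then want=0; fi
    echo "$want"
}
# In the real system the inputs would come from parsing `qstat` and
# `pbsnodes` output, and booting would go through the OpenStack API;
# placing jobs on the new nodes is still left to MAUI.
```

For example, `nodes_to_boot 10 2 5 9` prints `2`: eight queued jobs lack a free node, but only two more VMs fit in the allocation.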
Dynamic Torque
Dynamic Torque for CoEPP
[Architecture diagram: Torque/MAUI with Dynamic Torque, plus Puppet, Ganglia, Nagios, CVMFS, LDAP and NFS; worker nodes in SA, Monash and Melbourne; interactive nodes in Melbourne]
Dynamic Torque for CoEPP
CoEPP Outcomes
- Three large clusters in use for over a year
  - Hundreds of cores in each
- Condor and Cloud Scheduler for ATLAS Tier 2
- Dynamic Torque for ATLAS Tier 3 and Belle
- LHC ATLAS experiment at CERN
  - 530,000 Tier 2 jobs
  - 325,000 CPU hours for Tier 3 jobs
- Belle experiment in Japan
  - 150,000 jobs
Private Clusters
- Good for building a shared cluster for a large research group with good IT support who can set up and manage a Torque cluster
- What about the many individual researchers or small groups who also want a private cluster using their cloud allocation?
  - But have no dedicated IT staff and very basic Unix skills?
- Is there a simple DIY solution?
StarCluster
- Generic setup
  - Create a security group for the cluster
  - Launch VMs (master, node01, node02, ...)
  - Set up public keys for password-less SSH
  - Install NFS on the master and share scratch space to all nodes
  - Can use EBS (Cinder) volumes as scratch space
- Queueing system setup (plugins)
  - Condor, SGE, Hadoop, and your own plugins!
StarCluster for OpenStack
[Architecture diagram: StarCluster talks to the OpenStack EC2 API to launch a head node (NFS, Torque server, MAUI) with an attached volume and worker nodes (Torque MOM); software comes from the eRSA app repository (CVMFS server) via a CVMFS proxy]
StarCluster: Configuration
- Availability zone
- Image
  - Optional: a separate image for the master
- Flavor
  - Optional: a separate flavor for the master
- Number of nodes
- Volume
- Username, user ID, group ID, user shell
- Plugins
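Put together, a cluster template in StarCluster's configuration file (`~/.starcluster/config`) might look like the sketch below. The key names are StarCluster's, but every value (image IDs, zone, volume and plugin names) is a placeholder rather than a real eRSA setting:

```ini
[cluster mycluster]
KEYNAME = mykey                  ; SSH keypair for password-less login
CLUSTER_SIZE = 4                 ; number of nodes
CLUSTER_USER = researcher        ; user account created on all nodes
CLUSTER_SHELL = bash             ; user shell
AVAILABILITY_ZONE = sa           ; placeholder zone
NODE_IMAGE_ID = ami-00000000     ; placeholder image ID
NODE_INSTANCE_TYPE = m1.medium   ; flavor for the worker nodes
MASTER_IMAGE_ID = ami-11111111   ; optional: different image for master
MASTER_INSTANCE_TYPE = m1.large  ; optional: different flavor for master
VOLUMES = data                   ; Cinder volume shared over NFS
PLUGINS = torque                 ; e.g. the Torque plugin described above
```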
Start a cluster with StarCluster
# fire up a new cluster (from your desktop)
$ starcluster start mycluster
# log in to the head node (master) to submit jobs
$ starcluster sshmaster mycluster
# copy files to/from the cluster
$ starcluster put mycluster /path/to/local/file/or/dir /remote/path/
$ starcluster get mycluster /path/to/remote/file/or/dir /local/path/
# add two compute nodes to the cluster
$ starcluster addnode -n 2 mycluster
# terminate it after use
$ starcluster terminate mycluster
Other Options for a Personal Cluster
- ElastiCluster
  - Python code to provision VMs
  - Ansible to configure them
  - Ansible playbooks for Torque/SGE/..., NFS/PVFS/...
- Heat
  - Everything in a HOT template
  - Earlier versions had limitations that made it hard to implement everything
  - May revisit in future
Private Cluster in the Cloud
- Can use your personal or project cloud allocation to start up your own personal cluster in the cloud
  - No need to share! (Except among your group.)
- Can use the standard PBS/Torque queueing system to submit jobs (or not)
  - Only your jobs in the queue
- But you have to set up and manage the cluster
  - Straightforward if you have good Unix skills (unless things go wrong...)
- Several groups are now using this
  - But eRSA does the support when things go wrong
Emu: Cluster in the Cloud
- Emu is an eRSA cluster that runs in the cloud
- Aimed to be like an old cluster (Corvus)
  - 8-core compute nodes
- But a bit different
  - Dynamically created VMs in the cloud
  - Can have private compute nodes
  - Different-sized compute nodes if you want
Emu
- eRSA-managed dynamic cluster in the cloud
  - Shared by multiple cloud tenants and eRSA users
  - All nodes in the SA zone
- The eRSA cloud allocation contributes 128 cores
- Users can bring in their own cloud allocation to launch their worker nodes in Emu
  - Users don't need to build and look after their own personal cluster
- It can also mount users' Cinder volume storage to their own worker nodes via NFS
- Set up so researchers use their eRSA accounts
Using Your Own Cloud Allocation
- Users add our sysadmins to their tenant
  - So we can launch VMs on their behalf
  - Will look at Trusts in Icehouse
- Add some configuration to Dynamic Torque
  - Number of static/dynamic nodes, size of nodes, etc.
  - Add a group of user accounts allowed to use it
- Create a reservation for the users' worker nodes in MAUI
  - A special account string needs to be put in the job
  - To match users' jobs to their group's reserved nodes
- A qsub filter checks that the account string is valid
  - You can't submit a job using another group's allocation
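The account-string check fits naturally in a Torque submit filter (the script named by the `submit_filter` line in `torque.cfg`): qsub pipes the job script through the filter, and a non-zero exit makes qsub refuse the job. A hedged sketch; the group names and the `check_job` helper are placeholders, not the real eRSA filter:

```shell
#!/bin/sh
# Sketch of a qsub submit filter that validates "#PBS -A <account>"
# lines. Account names below are placeholders.
VALID_ACCOUNTS="physics astro"

check_job() {
    has_account=0
    ok=0
    while IFS= read -r line; do
        echo "$line"                    # pass the job script through
        case "$line" in
            "#PBS -A "*)
                has_account=1
                acct=${line#"#PBS -A "}
                for a in $VALID_ACCOUNTS; do
                    if [ "$acct" = "$a" ]; then ok=1; fi
                done
                ;;
        esac
    done
    if [ "$has_account" -eq 0 ]; then
        return 0                        # no account string: shared nodes
    fi
    if [ "$ok" -eq 1 ]; then
        return 0                        # matches the group's reservation
    fi
    echo "qsub: invalid account string" >&2
    return 1
}

# A real filter would end by running: check_job
```

Jobs without an account string fall through to the shared nodes; a job claiming another group's account is rejected before it ever reaches the queue.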
Emu
[Architecture diagram: Torque/MAUI with Dynamic Torque, plus Salt, Sensu, CVMFS, LDAP and NFS; shared worker nodes (eRSA-donated) alongside worker nodes launched in each tenant's own allocation]
Static Cluster vs Dynamic Cluster

                     Static Cluster       Dynamic Cluster
  Hardware           Physical machines    Virtual machines
  LRMS               Torque               Torque with Dynamic Torque
  CMS                Puppet               Salt Stack
  Monitoring         Nagios, Ganglia      Sensu, Graphite, Logstash
  App distribution   NFS mount            CVMFS
  Shared FS          NFS                  NFS
Future Work
- Better reporting and usage graphs
- More monitoring checks
- Queueing system
  - Multi-node jobs don't work because a new node is not trusted by existing nodes
  - The trust list is only updated when the PBS server is started
  - Could hack the Torque source code
  - Or maybe use SLURM or SGE
- Better way to share user credentials
  - Trusts in Icehouse?
- National and/or other regional services
Future Work
- Spot instance queue
- Distributed file system
  - NFS is the static component
  - Cannot add storage to NFS without stopping it
  - Cannot add new nodes to the allow list dynamically; have to update iptables instead
  - Investigation of a dynamic and distributed FS
  - One FS for all tenants
- Alternatives to StarCluster
  - Heat or ElastiCluster
Resources
- Cloud Scheduler: http://cloudscheduler.org/
- StarCluster: http://star.mit.edu/cluster
  - OpenStack version: https://github.com/shundezhang/starcluster/
- Dynamic Torque: https://github.com/shundezhang/dynamictorque
Nagios vs Sensu
- Nagios
  - First designed last century, for static environments
  - Needs a local configuration update and service restart if a remote server is added or removed
  - The server performs all checks, so it is not scalable
- Sensu
  - Modern design with AMQP as the communication layer
  - A local agent runs the checks
  - Weak coupling between clients and server, so it is scalable
Imagefactory
- Template in XML
  - Packages
  - Commands
  - Files
- Can back up an image in GitHub
- Automatic, no user interaction required
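An Imagefactory template (Oz's TDL format) bundles the packages, commands and files for an image in one XML file that can live in version control. The sketch below is illustrative only: the template name, ISO URL, package and file contents are invented, not taken from the eRSA build:

```xml
<template>
  <name>emu-worker</name>
  <os>
    <name>CentOS-6</name>
    <version>6</version>
    <arch>x86_64</arch>
    <install type='iso'>
      <iso>http://mirror.example.org/centos6.iso</iso>
    </install>
  </os>
  <description>Hypothetical worker node image</description>
  <packages>
    <package name='cvmfs'/>
    <package name='torque-mom'/>
  </packages>
  <commands>
    <command name='enable-mom'>chkconfig pbs_mom on</command>
  </commands>
  <files>
    <file name='/etc/cvmfs/default.local'>CVMFS_REPOSITORIES=apps.ersa.edu.au</file>
  </files>
</template>
```

Keeping the template in GitHub, as the slide notes, makes image builds reproducible with no user interaction.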
Emu Monitoring
- Sensu
  - Runs health checks
- Logstash
  - Collects PBS server/MOM/accounting logs
- collectd
  - Collects metrics for CPU, memory, disk, network, etc.
Salt Stack vs Puppet

                     Salt Stack         Puppet
  Architecture       Server-client      Server-client
  Working model      Push               Pull
  Communication      ZeroMQ + msgpack   HTTP + text
  Language           Python             Ruby
  Remote execution   Yes                No