Integration of Virtualized Workernodes in Batch Queueing Systems
Dr. Armin Scheurer, Oliver Oberst, Prof. Günter Quast
Institut für Experimentelle Kernphysik, Fakultät für Physik
KIT, University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
www.kit.edu
(Computer) Virtualization
- Shares the resources of one physical machine between independent operating systems (OS) running in virtual machines (VMs)
- Virtual machines are decoupled from the underlying hardware, so (almost) arbitrary operating systems can be installed
- Different virtualization techniques are provided by various vendors and open-source communities

[Figure: one physical host machine with a virtualization layer hosting three VM servers: a workernode (OS 1), a proxy server (OS 2) and a user portal (OS 3)]
Why Virtualization?
- Offers independence from the host system and encapsulation of user interaction
- Enables the use of special, validated operating systems for high energy physics analysis
- Enables the use of virtual appliances, e.g. CernVM (see later)
- Allows dynamic partitioning of a shared HPC cluster:
  - Grants different setups for different user groups
  - No incompatibilities have to be considered
  - High flexibility
KVM: Kernel-based Virtual Machine
- KVM is implemented as a kernel module; the Linux kernel itself acts as the virtual machine monitor
- VMs run as normal processes
- Supports the native virtualization extensions AMD-V and Intel VT-x => very good performance!

[Figure: hardware at the bottom, the Linux kernel with the KVM module on top, and normal user processes running next to VMs (e.g. a Debian and a SuSE guest)]

libvirt: Virtualization API
- Interface to common VMMs/hypervisors such as KVM, Xen, VMware, UML
- (Remote) management of virtual machines and storage
- More information: http://libvirt.org
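To make the libvirt layer concrete, a minimal domain definition for a KVM guest might look like the following sketch (the VM name, image path and bridge name are illustrative assumptions, not taken from the actual setup):

```xml
<domain type='kvm'>
  <name>vibatch-wn-01</name>            <!-- hypothetical VM name -->
  <memory unit='MiB'>2048</memory>      <!-- 2 GB RAM, as per one VM slot -->
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>      <!-- fully virtualized guest -->
  </os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/srv/images/workernode.qcow2'/>  <!-- illustrative path -->
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <source bridge='br0'/>            <!-- host network bridge -->
    </interface>
  </devices>
</domain>
```

Such a definition can then be managed uniformly (e.g. with `virsh create`/`virsh shutdown`) regardless of the hypervisor underneath.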
Dynamic Virtualization Project at KIT: HPC Cluster Models
- Isolated computing clusters: each group/institution (e.g. groups A, B, C) has a separate cluster; administration overhead; cannot cover peak loads
- Shared computing cluster: all groups share one cluster; a setup compromise is not always possible; load balancing by fair share
- Dynamically partitioned cluster: the cluster is configured in real time with VMs; allows any software/OS configuration; the virtualization layer is hidden; load balancing by fair share
Dynamic Virtualization Project at KIT: ViBatch
- Lightweight tool enabling virtualization of job environments
- Can be integrated into arbitrary batch systems (e.g. Torque/PBS)
- The batch system is not aware of the virtualization: no code modification is needed, only the configuration is adapted
- The virtual environment is determined per job simply by the queue the job is sent to:

  qsub -q [normal_queue] job1.sh
  qsub -q [virtual_queue] job1.sh

- Job submission: only the queue changes!
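The queue-to-environment mapping could be sketched as a small shell helper of the kind a batch prologue might call. This is a hedged illustration only: the queue names, image paths and function name are invented and do not reflect ViBatch's actual scripts.

```shell
#!/bin/bash
# Hypothetical sketch: derive the VM image from the queue a job was submitted to.
# All queue and image names below are illustrative assumptions.
select_vm_image() {
    case "$1" in
        virtual_sl5)    echo "/srv/vibatch/images/sl5.qcow2" ;;
        virtual_cernvm) echo "/srv/vibatch/images/cernvm.qcow2" ;;
        *)              echo "" ;;  # native queue: run the job directly, no VM
    esac
}

select_vm_image virtual_sl5   # -> /srv/vibatch/images/sl5.qcow2
select_vm_image normal_queue  # -> empty: job runs natively
```

Because the decision is driven purely by the queue name, the batch server itself needs no patching; only the prologue/epilogue hooks see the mapping.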
ViBatch - Workflow

[Figure: ViBatch workflow diagram]
ViBatch - Lightweight
- Core components are just bash scripts (prologue, epilogue and remoteshell)
- Additional scripts for (almost) automatic installation on arbitrary clusters
- Cluster information and preferences are kept in one config file
- Logfiles enable debugging and workload statistics
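A single config file of this kind could look like the following sketch. The variable names and values are invented for illustration; they are not ViBatch's actual settings.

```shell
# vibatch.conf -- hypothetical example; all names and values are illustrative
VM_IMAGE_DIR=/srv/vibatch/images   # where the VM images live
VM_MEMORY_MB=2048                  # RAM per VM (~2 GB per VM, as in our setup)
VM_SLOTS_PER_NODE=8                # one VM slot per CPU core
LOG_DIR=/var/log/vibatch           # logfiles for debugging and statistics
VIRTUAL_QUEUES="virtual_sl5 virtual_cernvm"  # queues handled by ViBatch
```

Keeping everything in one sourceable file means the prologue, epilogue and remoteshell scripts can share the same settings without duplication.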
ViBatch - Virtual Appliances: CernVM-FS
- Our VM image includes CernVM-FS, a remote file system served via HTTP, developed for the CernVM software appliance: http://cernvm.cern.ch/portal
- Provides the LHC software installations of various VOs (CMS, ATLAS, ...), including the most common versions of the experiment software
- We do not have to care about our own installations!
- A simple Squid HTTP proxy server does the caching
- Easy to install with the yum package manager
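For reference, a CernVM-FS client is typically pointed at such a Squid cache through its local configuration file; a minimal sketch (the proxy host name is an assumption for illustration):

```shell
# /etc/cvmfs/default.local -- minimal CernVM-FS client sketch
# The proxy host name below is illustrative, not our actual host.
CVMFS_REPOSITORIES=cms.cern.ch,atlas.cern.ch      # repositories to mount
CVMFS_HTTP_PROXY="http://squid.example.org:3128"  # local Squid cache
CVMFS_QUOTA_LIMIT=20000                           # local disk cache quota in MB
```

With the Squid in between, many VMs share one local cache, so the experiment software is fetched from CERN only once per site.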
ViBatch in Operation at EKP, KIT
- ViBatch has already been used at EKP for several HEP analyses:
  - Data skims for the Higgs -> tau tau analysis (see talk A. Burgmeier, T49.7), running on the EKP production cluster in parallel to native job submission
  - Monte Carlo generation for studies in the Higgs search (C. Hackstein, T49.1)

[Figure: load of ViBatch (number of jobs) over the last 6 weeks]

Performance (runtime of virtual compared to native jobs):

  Benchmark                        virtual vs. native
  Monte Carlo simulation (vbfnlo)  +17 %
  CPU benchmark (whetstone)        +12 %
  CMSSW physics analysis           native not available (SLE 11 is not binary compatible with CMSSW)

- The overhead depends on KVM tuning and the host setup; currently being investigated and tuned (KSM, ...)
ViBatch in Operation at EKP, KIT
Our setup: characteristics & problems
- Memory consumption: ~2 GB RAM per VM
- Currently no InfiniBand driver for our VMs => no native use of the Lustre file system possible; storage is mounted via an NFS export
- Compatibility problems between the kernel-space NFS daemon and the Lustre driver: unstable, a few nodes crashed; currently solved by using a user-space NFS daemon

Shared Institutscluster IC1 at KIT:

  Workernodes (EKP)  200 (25)
  CPU                8 x 2.66 GHz Intel Xeon
  Memory             2 GB RAM per core
  Disc space         750 GB per node
  Storage            350 TB Lustre FS
  Network            40 Gbit/s InfiniBand
Conclusion and Outlook
- Extend operation to the whole cluster (200 nodes => 1600 VM slots)
- Provide detailed documentation
- Further simplify the installation
- Burst into the cloud: connect ViBatch with ROCED (talk S. Riedel, T77.3)

[Figure: ViBatch connected to cloud resources via ROCED]