Empowering Cloud with latest VMDIRAC improvements. DIRAC User Workshop, Ferrara, 2015. Víctor Méndez Muñoz, vmendez@uab.es
Agenda: A single context to drive them all; Cloud flexibility: Hybrid and multi-layered; Use Case: S3 1K genome processing; Conclusions. Empowering Cloud with latest VMDIRAC improvements, 27/05/2015
A single context to drive them all. VMDIRAC v1r3 (April 2015) includes a single cloudinit method for OpenStack, OpenNebula and AWS. A RunningPod binds together the job requirements to be allocated in the VMs, a DIRAC Image, and a list of cloud endpoints. The DIRAC Image, defined in the Configuration System, includes the image contextualization. Each cloud endpoint must be set up in the DIRAC Configuration.
A single context to drive them all. RunningPod setup (Resources/VirtualMachines/RunningPods). Elasticity is controlled by the minimal overall CPU of the DIRAC jobs waiting in the task queue that is required to submit a new VM. A value of 0 means no minimal CPU in the task queue, which gives the best total wall time. Setting it to the average required CPU is a compromise between VM efficiency and total wall time. A very large value maximizes efficiency with respect to VM creation overhead. Requirements are the values you want written to /opt/dirac/etc/dirac.cfg in the VMs; they will be matched with the jobs.
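The elasticity rule above can be sketched in a few lines of Python. This is a minimal illustration, not VMDIRAC code: the function name `should_submit_vm`, its parameters, and the choice to compare the threshold against the summed CPU of the waiting jobs are assumptions made for the example.

```python
def should_submit_vm(waiting_job_cpu, cpu_threshold):
    """Return True when the waiting workload justifies submitting a new VM.

    waiting_job_cpu: CPU required by each job waiting in the task queue
                     (hypothetical input, e.g. in seconds).
    cpu_threshold:   minimal overall CPU of the waiting jobs needed to
                     submit a new VM; 0 means no minimum.
    """
    if not waiting_job_cpu:
        # Assumption: with nothing waiting there is no reason to start a VM.
        return False
    return sum(waiting_job_cpu) >= cpu_threshold
```

With a threshold of 0 any waiting job triggers a VM, which favours total wall time; a large threshold delays VM creation until enough work has accumulated to amortize the creation overhead.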
A single context to drive them all. DIRAC Image setup (Resources/VirtualMachines/Images). The image entry gives your golden image name in the IaaS provider. Flavor is the instance type, depending on your IaaS provider. If MaxAllowedPrice is 0 or not defined, spot instances are not used. KeyName is the key name in your IaaS used to push a public key into the VMs (for deployment and debugging).
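The MaxAllowedPrice rule can be written as a small predicate. The sketch below is only an illustration under stated assumptions: `uses_spot_instances` and the `image_options` dictionary are hypothetical names, and treating unparsable or negative values as "not defined" is a choice made here, not something the slides specify.

```python
def uses_spot_instances(image_options):
    """Decide whether spot instances would be requested for an image.

    image_options: dict of options read from the image section
                   (hypothetical representation of the configuration).
    """
    raw = image_options.get("MaxAllowedPrice")
    if raw is None:
        return False  # not defined: no spot instances
    try:
        price = float(raw)
    except (TypeError, ValueError):
        return False  # assumption: an unparsable value counts as not defined
    return price > 0  # 0 (or, by assumption, a negative value): no spot instances
```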
A single context to drive them all. DIRAC Image setup, single cloudinit contextualization (Resources/VirtualMachines/Images). The image options cover the VM host cert/key, the DIRAC/VMDIRAC agents to place in the VM, your CVMFS client contextualization script, and your cloudinit static script. The static script plus all the Image/Endpoint setup are passed to cloudinit in the booting VM.
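To make the idea of one contextualization payload concrete, the sketch below combines a shell script and a cloud-config fragment into a single multipart MIME user-data document, a format that cloud-init accepts. It is not necessarily how VMDIRAC assembles its payload; the function `build_user_data`, the file names and the script contents are placeholders invented for the example.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_user_data(parts):
    """Combine several contextualization pieces into one user-data string.

    parts: list of (filename, mime_subtype, content) tuples, for example
           ("x-shellscript") for scripts or ("cloud-config") for YAML config.
    """
    message = MIMEMultipart()
    for filename, subtype, content in parts:
        part = MIMEText(content, subtype)
        part.add_header("Content-Disposition", "attachment", filename=filename)
        message.attach(part)
    return message.as_string()


# Placeholder contents; a real setup would load the site's own scripts.
user_data = build_user_data([
    ("cvmfs-context.sh", "x-shellscript",
     "#!/bin/sh\necho 'configure the CVMFS client here'\n"),
    ("static.cfg", "cloud-config",
     "#cloud-config\npackages:\n  - wget\n"),
])
```

The resulting string is what would be handed to the IaaS provider as instance user data, so the same payload can be reused across providers that run cloud-init in the image.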
A single context to drive them all. Cloud endpoint setup, AWS (Resources/VirtualMachines/CloudEndpoints). The RunningPods CPUPerInstance setting weighs the waiting workload against the running VMs, whereas a static policy creates the maximum number of instances (RunningPod, CloudEndpoints). The endpoint also holds your VMMonitor agent definition (MarginTime, MinAverage, PollingTime). AccessKey and SecretKey are hidden in the server.
A single context to drive them all. Cloud endpoint setup, OpenStack (Resources/VirtualMachines/CloudEndpoints).
A single context to drive them all. Cloud endpoint setup, OpenNebula (Resources/VirtualMachines/CloudEndpoints).
Cloud flexibility: Hybrid and multi-layered. Hybrid Cloud. The usual categorization follows the public or private nature of the IaaS resources: a public cloud is any cloud owned by a third party; a private cloud is the user's own infrastructure. Other categorizations: a community cloud is owned and/or managed by a particular user community. It can be public, like some EGI FedCloud resources owned by providers that are EGI members, or private. Hybrid: e.g. public AWS + private OpenStack.
Cloud flexibility: Hybrid and multi-layered. Multi-layered Cloud. Pre-processing and post-processing are used to filter data out of the main processing that runs in public pay-per-use clouds. Example (diagram): sensors produce input data, which is filtered into the input data for the main processing. Reference: CometCloud mobile data aggregation in a multi-layered cloud, http://nsfcac.rutgers.edu/cometcloud/sites/nsfcac.rutgers.edu.cometcloud/files/petri-diaz-ficloud14-cm.pdf
Use Case: S3 1k genome processing. Individual mapping of breakpoint junctions: Breakseq provides genome processing, allowing mapping of reads with breakpoint junctions. Mapping specific to human individuals. Data patterns: Homo sapiens raw genome, human_g1k_v37.fasta; Homo sapiens aligned genome sequence, .bt2; breakpoint junctions of interest, bplib.fa. Input data: individuals to be mapped, reads.fastq. Output data: filtered inverted allele, .uni; filtered reference allele, .xun.
Use Case: S3 1k genome processing. Workflow, logical layer (diagram). References: aligned index hg19.bt2 from ftp.ccb.jhu.edu; human_1k_h37.fasta from ftp.1000genomes.ebi.ac.uk. Processing: breakseq, with reads.fastq, bplib.fa, *.uni and *.xun. Post-processing: makebam.sh (awk + samtools), Individual.bam, and FilterUniXun.sh, with *.uni and *.xun.
Use Case: S3 1k genome processing. AWS processing (diagram). Amazon S3: s3://1000genomes, O(TB), accessed through read_s3_refs. DIRAC image management: hg19.bt2, 3.5 GB, from ftp.ccb.jhu.edu. Transfer rates shown: ~15 MB/s and ~25 MB/s. Amazon EC2: breakseq, with reads.fastq, bplib.fa, *.uni and *.xun. DIRAC SE + DFC: lfn://individuals-ibb/*, O(MB). User transparency.
Use Case: S3 1k genome processing. OpenStack post-processing (diagram). DIRAC SE + DFC: lfn://individuals-ibb/*, O(MB), accessed through read_lfn_refs. DIRAC image management: human_1k_h37.fasta, 3 GB, from ftp.1000genomes.ebi.ac.uk, 0.5 MB/s. Private cloud: makebam.sh (awk + samtools), Individual.bam, and FilterUniXun.sh, with *.uni, *.xun and bplib.fa. User transparency.
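The figures on this slide give a feel for why the reference data is handled at the image level. As a rough worked example, assuming the 0.5 MB/s rate applies to downloading the 3 GB reference from the FTP site (the slide places the two numbers together but does not state this explicitly), the transfer time can be computed as follows. The helper `transfer_minutes` is a name introduced only for this illustration.

```python
def transfer_minutes(size_gb, rate_mb_per_s):
    """Minutes needed to move size_gb at rate_mb_per_s (1 GB taken as 1000 MB)."""
    return size_gb * 1000 / rate_mb_per_s / 60


# 3 GB reference at 0.5 MB/s, both values as shown on the slide.
fasta_minutes = transfer_minutes(3, 0.5)
print(f"{fasta_minutes:.0f} minutes")  # prints "100 minutes"
```

Under that assumption, fetching the reference at boot would add well over an hour to every VM, which a one-off copy into an image avoids.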
Use Case: S3 1k genome processing. Image management (diagram). Reference data: hg19.bt2, 3.5 GB, from ftp.ccb.jhu.edu; human_1k_h37.fasta, 3 GB, from ftp.1000genomes.ebi.ac.uk. Images: hg19 golden image; cloud golden image. Software and data: DIRAC + VMDIRAC; hg19 boot deb; data + sw deb; Breakseq SW stack; VMDIRAC SW stack. Configuration: VMDIRAC image context (Breakseq stack setup); VMDIRAC endpoint config (Amazon setup, OpenStack setup, OpenNebula setup). Result: running VM.
Conclusions. Cloudinit allows a single image contextualization script for all your IaaS providers. Cloudinit enables straightforward integration of the pilot 2.0: cloudinit script = pilot 2.0 script. VMDIRAC setup allows hybrid and multi-layered cloud setups to optimize computational costs and commercial cloud prices. Everything is set up and managed by the DIRAC administrator, enabling user transparency in the use of cloud resources.
References. Please cite the related papers listed at: https://scholar.google.com/citations?user=2rvfzrwaaaaj&hl=es Thanks.