Potential of Virtualization Technology for Long-term Data Preservation



Similar documents
CernVM Online and Cloud Gateway a uniform interface for CernVM contextualization and deployment

CONDOR CLUSTERS ON EC2

LSKA 2010 Survey Report I Device Drivers & Cloud Computing

An Efficient Use of Virtualization in Grid/Cloud Environments. Supervised by: Elisa Heymann Miquel A. Senar

Deployment of Private, Hybrid & Public Clouds with OpenNebula

Building a Volunteer Cloud

Distributed Database Access in the LHC Computing Grid with CORAL

Batch and Cloud overview. Andrew McNab University of Manchester GridPP and LHCb

This presentation provides an overview of the architecture of the IBM Workload Deployer product.

Eucalyptus User Console Guide

Integration of Virtualized Workernodes in Batch Queueing Systems The ViBatch Concept

New Features... 1 Installation... 3 Upgrade Changes... 3 Fixed Limitations... 4 Known Limitations... 5 Informatica Global Customer Support...

Using S3 cloud storage with ROOT and CernVMFS. Maria Arsuaga-Rios Seppo Heikkila Dirk Duellmann Rene Meusel Jakob Blomer Ben Couturier

Docker : devops, shared registries, HPC and emerging use cases. François Moreews & Olivier Sallou

OGF25/EGEE User Forum Catania, Italy 2 March 2009

VIRTUAL MACHINE LOGBOOK

2) Xen Hypervisor 3) UEC

Shoal: IaaS Cloud Cache Publisher

IaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures

Cloud Computing for Control Systems CERN Openlab Summer Student Program 9/9/2011 ARSALAAN AHMED SHAIKH

Veeam Best Practices with Exablox

UForge 3.4 Release Notes

Alternative models to distribute VO specific software to WLCG sites: a prototype set up at PIC

IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version Fix Pack 2.

Open Source Cloud Computing Management with OpenNebula

Deploying Baremetal Instances with OpenStack

Symantec Backup Exec 2014 Icon List

OpenStack Introduction. November 4, 2015

Repoman: A Simple RESTful X.509 Virtual Machine Image Repository. Roger Impey

OpenNebula Open Souce Solution for DC Virtualization

THE EUCALYPTUS OPEN-SOURCE PRIVATE CLOUD

Xcode Project Management Guide. (Legacy)

Deploying Business Virtual Appliances on Open Source Cloud Computing

Automated Configuration of Open Stack Instances at Boot Time

OpenNebula Open Souce Solution for DC Virtualization

Citrix Training. Course: Citrix Training. Duration: 40 hours. Mode of Training: Classroom (Instructor-Led)

IBM Endpoint Manager Version 9.2. Patch Management for SUSE Linux Enterprise User's Guide

OpenNebula Open Souce Solution for DC Virtualization. C12G Labs. Online Webinar

StorReduce Technical White Paper Cloud-based Data Deduplication

Single Sign-On Framework in Tizen Contributors: Alexander Kanavin, Jussi Laako, Jaska Uimonen

TZWorks Windows Event Log Viewer (evtx_view) Users Guide

Data Centers and Cloud Computing

SMB in the Cloud David Disseldorp

The dcache Storage Element

Migrating to vcloud Automation Center 6.1

About the VM-Series Firewall

Jitterbit Technical Overview : Salesforce

Installation and Configuration Guide

Cloud Computing Architecture with OpenNebula HPC Cloud Use Cases

ANDROID BASED MOBILE APPLICATION DEVELOPMENT and its SECURITY

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs

How To Image A Single Vm For Forensic Analysis On Vmwarehouse.Com

Regional SEE-GRID-SCI Training for Site Administrators Institute of Physics Belgrade March 5-6, 2009

Trends in Application Recovery. Andreas Schwegmann, HP

HTTP-FUSE PS3 Linux: an internet boot framework with kboot

This presentation covers virtual application shared services supplied with IBM Workload Deployer version 3.1.

Automated Deployment of Oracle RAC Using Enterprise Manager Provisioning Pack

SUSE Cloud 2.0. Pete Chadwick. Douglas Jarvis. Senior Product Manager Product Marketing Manager

Proposal for Virtual Private Server Provisioning

Enhanced Connector Applications SupportPac VP01 for IBM WebSphere Business Events 3.0.0

Gladinet Cloud Backup V3.0 User Guide

ZEN LOAD BALANCER EE v3.02 DATASHEET The Load Balancing made easy

OpenStack IaaS. Rhys Oxenham OSEC.pl BarCamp, Warsaw, Poland November 2013

9/26/2011. What is Virtualization? What are the different types of virtualization.

A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing

SwiftStack Filesystem Gateway Architecture

Why is a good idea to use OpenNebula in your VMware Infrastructure?

Building Scalable Applications Using Microsoft Technologies

Software Deployment and Configuration

MySQL and Virtualization Guide

GETTING STARTED WITH PROGRESS AMAZON CLOUD

THE CC1 PROJECT SYSTEM FOR PRIVATE CLOUD COMPUTING

December P Xerox App Studio 3.0 Information Assurance Disclosure

Raima Database Manager Version 14.0 In-memory Database Engine

How To Write A Trusted Analytics Platform (Tap)

VMware Server 2.0 Essentials. Virtualization Deployment and Management

VMware vcloud Automation Center 6.1

Week Overview. Installing Linux Linux on your Desktop Virtualization Basic Linux system administration

Availability Digest. Redundant Load Balancing for High Availability July 2013

Rights Management Services

Transcription:

Potential of Virtualization Technology for Long-term Data Preservation J Blomer on behalf of the CernVM Team jblomer@cern.ch CERN PH-SFT 1 / 12

Introduction Potential of Virtualization Technology Preserve the historic data analysis environment (This capability is only a part of long-term data preservation) 2 / 12

Introduction Potential of Virtualization Technology Preserve the historic data analysis environment (This capability is only a part of long-term data preservation) Motivation: 1 Process legacy data Data formats are typically self-describing or convertable Software implicitly encodes knowledge about the correct interpretation of the data After substantial upgrades and modifications of the detector, the new software might lose this legacy knowledge 2 Validation of new software versions Otherwise, if the new software can process legacy data, comparison with historic version provides input for validation 2 / 12

Data Processing Environment for LHC Experiments CernVM Goal: a uniform and portable environment for developing and running LHC data processing applications 3 / 12

Data Processing Environment for LHC Experiments N e πin2 N n=1 Data Packets (Events) 3 / 12

Data Processing Environment for LHC Experiments N e πin2 N n=1 Data Packets (Events) 3 / 12

Data Processing Environment for LHC Experiments > cmsrun DiPhoton_Analysis_cfg.py N e πin2 N n=1 Data Packets (Events) 3 / 12

Data Processing Environment for LHC Experiments > cmsrun DiPhoton_Analysis_cfg.py N e πin2 N n=1 Data Packets (Events) Individual Analysis Code 0.1 MLOC Experiment Software Framework 4 MLOC High Energy Physics Libraries 5 MLOC Compiler System Libraries OS Kernel 20 MLOC 3 / 12

Data Processing Environment for LHC Experiments > cmsrun DiPhoton_Analysis_cfg.py N e πin2 N n=1 Data Packets (Events) Individual Analysis Code Experiment Software Framework High Energy Physics Libraries Compiler System Libraries OS Kernel 0.1 MLOC 4 MLOC 5 MLOC 20 MLOC stable changing 3 / 12

Data Processing Environment for LHC Experiments > cmsrun DiPhoton_Analysis_cfg.py N e πin2 N n=1 Data Packets (Events) Individual Analysis Code Experiment Software Framework High Energy Physics Libraries Compiler System Libraries OS Kernel 0.1 MLOC 4 MLOC 5 MLOC 20 MLOC stable changing Amplifying Frequent Updates Not a single binary a development environment Hundreds of libraries with partially untracked dependencies Not easily chunkable Not easily packagable 3 / 12

CernVM Blueprint CernVM Blueprint for Preserving the Data Processing Environment XMPP HTTP (Amazon EC2) CernVM Online OS + Extras Scratch HD ISO Image AUFS R/W Overlay initrd: CernVM-FS + μcontextualization AUFS Fuse Kernel Provide the analysis environment including the operating system on CernVM-FS. No packaging, the environment is provided on demand CernVM-FS is a snapshotting and versioning file system Only 2 CernVM-FS version strings describe the data analysis environment 4 / 12

CernVM-FS Operating System Repository Maintenance of the repository must not become a Linux distributor s job Idea: Automatically generate a fully versioned, closed package list from an unversioned shopping list of packages (Standard package managers are not designed for preservation!) Scientific Linux SL Cern CentOS!""# $%&"'# (# Dependency Closure Formulate dependencies as Integer Linear Program (ILP) Mirror 5 / 12

Versioning and Snapshots in CernVM-FS /cvmfs/atlas.cern.ch software 17.0.0 ChangeLog. Compression, SHA-1 806fbb67373e9... Repository Chunk store File catalogs Data Store Compressed chunks (files) Eliminates duplicates Never deletes File Catalog Directory structure, symlinks Content hashes of regular files Digitally signed integrity, authenticity Plain files, stored as chunks in the data store The root hash (40 characters) defines a file system snapshot (similar to git) Intrinsic and explicit versioning required 6 / 12

Versioning and Snapshots in CernVM-FS 7 / 12

Contextualization Simple API Instantiate + Contextualize Terminate List instances, list images Amiconfig plug-ins Credentials (ssh, X.509) Condor head & batch services Squid server XrootD storage proxy Monitoring & directory service agents F-"(49' F'()'( F-"(49' <(">0!'($DE 34-+, M"(C'( ;88< <(">0!'($DE!'($DE ;0='()*&"( 678 O4-':40 PQQ$ P 34-+, E4&-'( PQQ$!"?4+ D*(- 7++?$. Network configuration & tuning EFF FHM 8!<HI< 7<I 8 / 12

CernVM Online: Context Bookkeeping & Pairing 9 / 12

CernVM Online: Context Bookkeeping & Pairing 9 / 12

CernVM Online: Context Bookkeeping & Pairing 9 / 12

CernVM Online: Context Bookkeeping & Pairing 9 / 12

CernVM Online: Context Bookkeeping & Pairing 9 / 12

Components of the CernVM Blue Print Long-term preservation of analysis software environment! Virtual Machine! Linux distribution based on Scientific Linux.! Supports all popular hypervisors.! Minimal footprint, the VM interface is needed! Flexible contextualization.! CernVM Filesystem! Read-only, globally distributed file system optimized for software distribution.! Based on plain files and HTTP! Snapshotting and versioning file system! Already used in production by LHC experiments.! CernVM-FS environment is " defined version strings. OS " packages are defined by a versioned," closed package group (Meta-RPM)! You need only the CernVM version string to rebuild CernVM image on demand.! CernVM - based! data analysis environment preservation! Ensembles of CernVMs " can recreate a virtual cluster for data processing.! CernVM can be contextualized using a small subset of EC2 API that allows it to be deployed on public or private clouds! Bookkeeping! Private Cloud! More info : A practical approach to virtualization in HEP: http://dx.doi.org/10.1140/epjp/i2011-11013-1 CernVM Homepage : http://cernvm.cern.ch May 2012! 10 / 12

NA61 Production Jobs in Belgrade Integration of a CernVM cloud with a data provenance system 11 / 12

Summary Plain virtualization is not sufficient to preserve the data processing environment CernVM and CernVM-FS technologies provide handles to re-create a data processing environment identified by a few version strings As such, it is easy to integrate the definition of the data processing environment in data provenance systems The exposed interface is very slim: CernVM clusters run on private and public clouds without grid infrastructures Such virtual machines are easy to use and they can be given to interested citizens (see also LHC@Home 2.0 volunteer cloud) Space Distribution of the Runtime Environment Time Please contact us for any further questions! 12 / 12