Interoperating Cloud-based Virtual Farms


Stefano Bagnasco, Domenico Elia, Grazia Luparello, Stefano Piano, Sara Vallero, Massimo Venaruzzo, for the STOA-LHC Project

The STOA-LHC project (1)

Goal: improve the robustness and usability of the existing Italian LHC infrastructure. The project is funded as an Italian PRIN (Research Project of Relevant National Interest); see the summary poster in Poster Session B. It is a common effort to ease data and resource access for the LHC community.

This talk focuses on the ALICE-related activity:
- parallel and interactive analysis solutions (the Virtual Analysis Facility);
- standard access to interactive resources across different local deployments (e.g. a centralised authentication system);
- federation among individual analysis facilities, to optimise the distribution of and access to remote data.

The STOA-LHC project (2)

Build a uniform environment for the "last mile" of analysis:
- use familiar interfaces;
- exploit existing tools;
- benefit locally from Cloud Computing technologies (application isolation, elasticity);
- use high-level tools for federation (no Cloud federation or cloud bursting).

Extend the model so that users outside high-energy physics can reuse the tools and exploit the computing infrastructures.

The infrastructure

- Trieste: test deployment, OpenStack, 24 cores, 1.2 TB storage, 3 Gbps WAN.
- Torino: production cloud, OpenNebula, 1.3k cores, 1.6 PB storage, 10 Gbps WAN.
- Padova-Legnaro: test deployment, OpenStack, 100 cores, 5 TB storage, 10 Gbps WAN.
- Bari: PRISMA testbed, OpenStack, 600 cores, 110 TB storage, 10 Gbps WAN.
- Coming soon: Catania and Cagliari.
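
As a quick cross-check of the figures above, the per-site resources can be tabulated and summed. This is a minimal sketch: the `Site` class and the helper names are hypothetical, "1.3k cores" is taken as 1300 and 1 PB as 1000 TB.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

@dataclass(frozen=True)
class Site:
    """One site of the infrastructure, as listed on the slide."""
    name: str
    role: str
    cloud_stack: str
    cores: int
    storage_tb: float  # 1 PB counted as 1000 TB
    wan_gbps: int

SITES = [
    Site("Trieste", "test deployment", "OpenStack", 24, 1.2, 3),
    Site("Torino", "production cloud", "OpenNebula", 1300, 1600.0, 10),  # "1.3k cores"
    Site("Padova-Legnaro", "test deployment", "OpenStack", 100, 5.0, 10),
    Site("Bari", "PRISMA testbed", "OpenStack", 600, 110.0, 10),
]

def totals(sites: Iterable[Site]) -> Tuple[int, float]:
    """Total cores and total storage (TB) over the given sites."""
    sites = list(sites)
    return sum(s.cores for s in sites), sum(s.storage_tb for s in sites)

def by_stack(sites: Iterable[Site]) -> Dict[str, int]:
    """Total cores grouped by cloud middleware (OpenStack, OpenNebula, ...)."""
    out: Dict[str, int] = {}
    for s in sites:
        out[s.cloud_stack] = out.get(s.cloud_stack, 0) + s.cores
    return out
```

With these inputs, `totals(SITES)` gives 2024 cores and about 1716 TB, and `by_stack(SITES)` splits the cores into 724 on OpenStack and 1300 on OpenNebula.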

The strategy

- Don't write new tools: use existing tools and features.
- Exploit the good GARR network connectivity between sites.
- Explore Cloud Computing technologies.
- Workload management: the Virtual Analysis Facility, presented at CHEP2013 (see next slide) and based on PROOF for interactive analysis.
- Data access: use the federation tools available in xrootd.

Key component: the VAF

The Virtual Analysis Facility combines PROOF+PoD, CernVM, HTCondor and elastiq. What is the VAF?
- A cluster of CernVM virtual machines: one head node, many workers.
- Running the HTCondor job scheduler.
- Capable of growing and shrinking based on usage, with elastiq.
- Configured via a web interface: cernvm-online.cern.ch.
- The entire cluster is launched with a single command.
- Users interact only by submitting jobs.

Elastic Cluster as a Service: elasticity is embedded, no external tools are needed. PoD and dynamic workers: PROOF runs on top of the cluster as a special case.

(Slide from Dario Berzano's talk at CHEP2013, "A grounds-up approach to High-Throughput Cloud Computing in High-Energy Physics", Dario.Berzano@cern.ch.)
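
The slide only says that elastiq grows and shrinks the cluster based on usage. One common way to express such a policy is sketched below, assuming that new worker VMs are requested when jobs have been queued for too long and that workers idle for too long are shut down. This is not elastiq's code: `PoolState`, `scaling_decision` and all thresholds are hypothetical, and a real daemon would also have to query HTCondor and call the cloud API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PoolState:
    """Hypothetical snapshot of an HTCondor pool taken by a scaling daemon."""
    queued_wait_s: List[float] = field(default_factory=list)  # wait time of each idle job
    worker_idle_s: List[float] = field(default_factory=list)  # idle time of each running worker VM

def scaling_decision(state: PoolState, *, wait_threshold_s: float = 120.0,
                     idle_threshold_s: float = 600.0, jobs_per_vm: int = 1,
                     min_vms: int = 0, max_vms: int = 100) -> Tuple[int, int]:
    """Return (vms_to_start, vms_to_stop) for one polling cycle."""
    running = len(state.worker_idle_s)
    # Scale up: jobs queued longer than the threshold are "starving".
    starving = sum(1 for w in state.queued_wait_s if w > wait_threshold_s)
    wanted = -(-starving // jobs_per_vm)  # ceiling division
    to_start = max(0, min(wanted, max_vms - running))
    # Scale down only when nothing is starving, and never below min_vms.
    to_stop = 0
    if starving == 0:
        idle = sum(1 for t in state.worker_idle_s if t > idle_threshold_s)
        to_stop = max(0, min(idle, running - min_vms))
    return to_start, to_stop
```

For example, with three queued jobs of which two have waited more than 120 s, `scaling_decision(PoolState([300, 10, 200], [0, 0]))` asks for two new VMs and stops none. Scale-down is suppressed while any job is starving, so the policy does not shut down workers it is about to need.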

Key component: the VAF (diagram from Dario Berzano's talk at CHEP2013)

Ongoing activity summary

- Benchmarking at all sites, with a common analysis task and data set.
- Tests of local data storage access (Trieste).
- Application monitoring with the ElasticSearch ecosystem (Torino, Padova); see Sara Vallero's talk on Monday.
- Production use at the Torino site: in operation since November 2013, with 60 TB of dedicated storage (GlusterFS, xrootd) and up to ~100 workers; mainly analysis of ntuples (TSelector).
- Data federation (Bari and all sites); see the poster in Poster Session A.

Workers deploy time

- If new VMs need to be instantiated, the workers' deploy time ranges from 2.5 min to 3.5 min.
- If VMs are already available, the deploy time ranges from 16 s to 3 min.
- The "golden number" of 30 workers (see later) is reached in 2.5 min in the first case and in 25 s in the second.

(Plot: optimal number of workers.)
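
The time to reach 30 workers quoted above is the moment the 30th worker joins the pool, i.e. the k-th smallest of the per-worker join times. A minimal sketch of that computation, with a hypothetical helper name and toy input rather than measured data:

```python
import heapq
from typing import Optional, Sequence

def time_to_reach(ready_times_s: Sequence[float], k: int) -> Optional[float]:
    """Seconds after launch at which at least k workers are available.

    ready_times_s holds, for each worker, the delay between the launch command
    and the moment it joined the pool. Returns None if fewer than k ever joined.
    """
    if k <= 0:
        return 0.0
    if len(ready_times_s) < k:
        return None
    # The pool reaches k workers when the k-th fastest worker joins.
    return heapq.nsmallest(k, ready_times_s)[-1]
```

For instance, `time_to_reach([30, 10, 20, 40], 3)` returns 30: the pool has three workers once the third-fastest one has joined.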


Wall-time for different analysis steps

Analysis steps: QAMultistrange event selection and re-vertexing; QAMultistrange analysis; simple pT spectrum analysis. Data sample: LHC10h (Pb-Pb), run 139510, 226k events.

Results:
- For this type of analysis and number of events, 30 workers is the optimal number.
- The wall-time is comparable for low and high CPU-intensive analyses.
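
One way to make the "optimal number of workers" explicit is to compute the speedup relative to the smallest pool measured and to stop where the next step in worker count saves less than a chosen fraction of wall time. The criterion and the 5% threshold below are assumptions, not the method used in the talk, and the numbers in the usage example are invented for illustration, not the measured results.

```python
from typing import Dict, Tuple

def speedup_table(wall_times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map worker count -> (speedup, parallel efficiency) w.r.t. the smallest pool measured."""
    n0 = min(wall_times)
    t0 = wall_times[n0]
    table = {}
    for n in sorted(wall_times):
        speedup = t0 / wall_times[n]
        table[n] = (speedup, speedup * n0 / n)  # efficiency 1.0 means perfect scaling
    return table

def knee(wall_times: Dict[int, float], min_gain: float = 0.05) -> int:
    """Smallest worker count after which the next measured step cuts wall time by < min_gain."""
    counts = sorted(wall_times)
    for a, b in zip(counts, counts[1:]):
        gain = (wall_times[a] - wall_times[b]) / wall_times[a]
        if gain < min_gain:
            return a
    return counts[-1]  # still improving at the largest pool measured
```

With toy wall times `{10: 600.0, 20: 320.0, 30: 230.0, 40: 225.0}` seconds, going from 30 to 40 workers saves only about 2%, so `knee(...)` returns 30, while `speedup_table(...)[20]` is `(1.875, 0.9375)`.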

The storage federation blueprint (diagram)

The storage federation blueprint: work in progress

- Bari: meta-manager deployed.
- Ongoing tests on a subset of sites.
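
Conceptually, a meta-manager sits above the site redirectors: a client asks the top-level redirector for a file and is sent to a site that holds it. The toy model below only illustrates that two-level lookup. It does not implement the XRootD/cmsd protocol, and the class names, host names and site-preference rule are hypothetical. It returns URLs in the usual `root://host//path` XRootD form.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class SiteRedirector:
    """Hypothetical site-level redirector with the set of files its servers hold."""
    name: str
    host: str
    files: Set[str] = field(default_factory=set)

@dataclass
class MetaManager:
    """Hypothetical top-level (national) redirector aggregating site redirectors."""
    sites: List[SiteRedirector]

    def locate(self, lfn: str, prefer: Optional[str] = None) -> Optional[str]:
        """Return a root:// URL at a site holding `lfn`, or None if no site has it.

        If the site named `prefer` (e.g. the local one) holds the file, it wins;
        otherwise the first holder in configuration order is used.
        """
        if not lfn.startswith("/"):
            raise ValueError("logical file name must be an absolute path")
        holders = [s for s in self.sites if lfn in s.files]
        if not holders:
            return None
        chosen = next((s for s in holders if s.name == prefer), holders[0])
        # "root://host" + "/" + "/abs/path" yields the double slash XRootD URLs use.
        return f"root://{chosen.host}/{lfn}"
```

Preferring the local site when it holds a replica keeps traffic on the LAN; otherwise the request falls through to any site in the federation.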

Distributed storage and data federation

Data are distributed and shared through a single Italian XRootD redirector. Two steps of a test analysis were used:
1. I/O-intensive: 75% I/O, 25% CPU;
2. CPU-intensive: 17% I/O, 83% CPU.

The plot shows the ratio between the wall time of jobs accessing files via XROOTD-IT and that of jobs accessing them locally.

Results: the difference is within 10-20% at most, even for I/O-intensive jobs. This is encouraging for further developing the VAF data federation with this XRootD option. Still to investigate: scalability and stability. This is an ongoing task!
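
The quantity plotted is a simple ratio, and a back-of-the-envelope model helps explain why CPU-heavy steps suffer less from remote access. The sketch assumes that only the I/O share of a job is slowed down (by a factor `io_slowdown`) while the CPU share is unchanged; the function names and that model are assumptions, not results from the talk.

```python
def walltime_ratio(federated_s: float, local_s: float) -> float:
    """Wall time via the federation divided by wall time with local files."""
    if local_s <= 0:
        raise ValueError("local wall time must be positive")
    return federated_s / local_s

def overhead_percent(federated_s: float, local_s: float) -> float:
    """Relative slowdown of federated access, in percent (negative if faster)."""
    return (walltime_ratio(federated_s, local_s) - 1.0) * 100.0

def predicted_ratio(io_fraction: float, io_slowdown: float) -> float:
    """Expected ratio if only the I/O share of the job slows down by io_slowdown."""
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    return io_fraction * io_slowdown + (1.0 - io_fraction)
```

Under this model, a 20% slower I/O path gives `predicted_ratio(0.75, 1.2)` of about 1.15 for the I/O-intensive step and `predicted_ratio(0.17, 1.2)` of about 1.03 for the CPU-intensive one.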

VAF monitoring with the ELK stack

(Diagram: VAF, TProofMonSenderSQL, MySQL DB with dedicated DB tables, also used for accounting of INFN Grid services, HTTP, ELK stack.)

- Collect monitoring and accounting data from both the IaaS and the application.
- Investigate the ELK stack to handle heterogeneous and unstructured data sources.
- A possible solution for Monitoring-as-a-Service, providing a uniform, extendable monitoring platform to applications.

See Sara Vallero's talk on Monday.
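
To feed application-level records into ELK, each record ends up as a JSON document in an Elasticsearch index. The sketch below only builds the HTTP request; it assumes an Elasticsearch version that accepts `POST /<index>/_doc` (older releases put a document type in place of `_doc`), and the endpoint, index name and record fields are hypothetical. The slide does not say how data flow from the MySQL tables into ELK (for example through Logstash).

```python
import json
import urllib.request
from datetime import datetime, timezone

def monitoring_request(es_url: str, index: str, record: dict) -> urllib.request.Request:
    """Build (but do not send) a request that indexes one monitoring record."""
    doc = dict(record)
    # A time field lets Kibana place the record on a time axis.
    doc.setdefault("@timestamp", datetime.now(timezone.utc).isoformat())
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/{index}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)` against a reachable cluster; keeping request construction separate makes the document format easy to check without a running Elasticsearch.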

Provisional conclusions and outlook

- The VAF model works well and can easily be adapted to different use cases. It only needs to be packaged as an end-to-end toolkit suited to different communities (e.g. without PROOF, PoD or other specific tools), and this toolkit must include a working accounting system.
- The ELK stack can be used to build a flexible system that provides accounting information and Monitoring-as-a-Service for applications.
- The data federation model is also feasible: the small performance penalty is balanced by flexibility and deduplication. Scalability and stability are still under investigation.

Thanks!

The present work is supported by the Istituto Nazionale di Fisica Nucleare (INFN) of Italy and is partially funded under contract 20108T4XTM of Programmi di Ricerca Scientifica di Rilevante Interesse Nazionale (PRIN, Italy).