The Evolution of Cloud Computing in ATLAS

Transcription:

The Evolution of Cloud Computing in ATLAS Ryan Taylor on behalf of the ATLAS collaboration CHEP 2015 Evolution of Cloud Computing in ATLAS 1

Outline
- Cloud usage and IaaS resource management
- Software services to facilitate cloud use
- Sim@P1
- Performance studies
- Operational integration: monitoring, accounting

The Clouds of ATLAS
- Clouds shown: Helix Nebula, Amazon, Google, CERN, CA, BNL, UK, AU
- ATLAS cloud jobs (Jan. 2014 to present): 61% single-core production, 33% multi-core production, 3% user analysis

IaaS Resource Management
- HTCondor + Cloud Scheduler, VAC/VCycle, APF
  - See talk 131, "HEP cloud production using the CloudScheduler/HTCondor Architecture" (C210, Tue. PM)
- Dynamic Condor slots to handle arbitrary job requirements, e.g. single-core, multi-core, high-mem
- μCernVM image
- Contextualization using cloud-init
- Using Glint Image Management System (see poster 304)
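The idea behind dynamic slots, one pool of VM resources carved up to fit each job's requirements, can be illustrated with a toy model. This is only a sketch: the class `PartitionableSlot`, its method `try_claim`, and the VM and job sizes are hypothetical and are not HTCondor code or configuration.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionableSlot:
    """Toy model of one VM's free resources (hypothetical, not HTCondor code)."""
    cpus: int
    memory_mb: int
    running: list = field(default_factory=list)

    def try_claim(self, name, cpus, memory_mb):
        """Carve out a dynamic slot for the job if enough resources remain."""
        if cpus > self.cpus or memory_mb > self.memory_mb:
            return False
        self.cpus -= cpus
        self.memory_mb -= memory_mb
        self.running.append(name)
        return True


# Assumed VM size and job requirements, chosen only for illustration.
vm = PartitionableSlot(cpus=8, memory_mb=16000)
results = [
    vm.try_claim("multi-core", cpus=4, memory_mb=8000),
    vm.try_claim("single-core", cpus=1, memory_mb=2000),
    vm.try_claim("high-mem", cpus=1, memory_mb=6000),
    vm.try_claim("single-core-2", cpus=1, memory_mb=2000),  # no memory left
]
```

In this example the fourth job is refused because the high-memory job used the remaining memory, even though two cores are still free.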

Shoal Proxy Cache Federator
- Build a fabric of proxy caches: configurationless topology, robust, scalable
- Needed to run μCernVM at scale
  - By default, DIRECT connection to closest Stratum 0/1
- Contextualize instances to find proxy using Shoal:

    [ucernvm-begin]
    CVMFS_PAC_URLS=http://shoal.heprc.uvic.ca/wpad.dat
    CVMFS_HTTP_PROXY=auto
    [ucernvm-end]

- Also use Shoal for Frontier access (currently under investigation)
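The contextualization fragment from the Shoal slide can be produced by a small helper when VM user data is generated by a script. This is a minimal sketch: the function name `ucernvm_proxy_block` is hypothetical, and only the two variables and the begin/end markers shown on the slide are used.

```python
def ucernvm_proxy_block(pac_url):
    """Render the proxy auto-discovery fragment for a single PAC URL.

    Hypothetical helper; the marker lines and variable names are copied
    from the slide, nothing else is assumed about the image.
    """
    if not pac_url.startswith("http://") and not pac_url.startswith("https://"):
        raise ValueError("PAC URL must be an http(s) URL: %r" % pac_url)
    lines = [
        "[ucernvm-begin]",
        "CVMFS_PAC_URLS=" + pac_url,
        "CVMFS_HTTP_PROXY=auto",
        "[ucernvm-end]",
    ]
    return "\n".join(lines) + "\n"


rendered = ucernvm_proxy_block("http://shoal.heprc.uvic.ca/wpad.dat")
print(rendered, end="")
```

Keeping the fragment in one function means the PAC URL is validated once and the markers cannot drift between images.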

Sim@P1
- Resource contribution similar to a T1: 34M CPU hours, 1.1B MC events (Jan. 1, 2014 to present)
- Used for LHC stops > 24 h
- Fast automated switching via web GUI for shifters
  - TDAQ to Sim@P1: 20 min (check Nova DB, start VMs)
  - Sim@P1 to TDAQ: 12 min (graceful VM shutdown, update DB)
  - Emergency switch to TDAQ: 100 s (immediate termination)
- See poster 169

HS06 Benchmarking Study
- Commercial clouds provide on-demand scalability, e.g. for an urgent need for resources beyond the pledge
- But how cost-effective are they?
- Comparison to institutional clouds


T2 & Remote Cloud Performance Comparison
- Used HammerCloud stress tests (24-hour stream)
  - HC 20052434, MC12 AtlasG4_trf 17.2.2.2
- Data and squid cache at grid site
- Remote access for cloud site, like a zero-storage processing site
- Success rate similar
- Hardware: ~2000 cores, Intel E5-2650 v2; ~200 cores, Intel Core i7 9xx (Nehalem Class Core i7)

Software setup time: (15 ± 7) s vs. (45 ± 15) s
- Relies on CVMFS cache and Squid proxy
- VMs have to fill up empty cache

Data stage-in time: (11 ± 4) s vs. (54 ± 20) s
- Local vs. remote storage access

Total running time: (4.7 ± 0.5) h vs. (7.1 ± 1.0) h
- 1.5x longer on cloud
- Different CPUs, hyperthreading?
- Data & software access time not significant

CPU efficiency equal: 98% vs. 97%
- Cloud usage is efficient for this workload
- No significant performance penalty
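The "1.5x longer" figure and the claim that data and software access time is not significant can be checked from the quoted numbers. The sketch below rests on two assumptions not stated in the slides: the first value of each pair belongs to the grid site and the second to the cloud site, and the quoted uncertainties are independent, so relative errors add in quadrature.

```python
import math


def ratio_with_error(a, sigma_a, b, sigma_b):
    """Return b/a and its uncertainty, assuming independent errors."""
    r = b / a
    relative = math.hypot(sigma_a / a, sigma_b / b)
    return r, r * relative


# Assumption: first value of each pair is the grid site, second the cloud.
grid_h, grid_err = 4.7, 0.5
cloud_h, cloud_err = 7.1, 1.0
ratio, ratio_err = ratio_with_error(grid_h, grid_err, cloud_h, cloud_err)

# Mean software setup plus data stage-in time, in seconds.
grid_overhead_s = 15 + 11
cloud_overhead_s = 45 + 54
extra_overhead_s = cloud_overhead_s - grid_overhead_s

# Share of the extra running time explained by the extra access time.
extra_runtime_s = (cloud_h - grid_h) * 3600
overhead_share = extra_overhead_s / extra_runtime_s

print("ratio = %.2f +/- %.2f" % (ratio, ratio_err))
print("extra access time = %d s (%.1f%% of the extra running time)"
      % (extra_overhead_s, 100 * overhead_share))
```

Under these assumptions the extra setup and stage-in time accounts for less than 1% of the difference in running time, which is consistent with the slide attributing the difference to the CPUs rather than to remote access.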

Cloud Monitoring
- VM management becomes the responsibility of the VO
- Basic monitoring is required
  - Detect and restart problematic VMs
  - Identify dark resources (deployed but unusable)
- Can identify inconsistencies in other systems through cross-checks
- Common framework for all VOs
- Implemented with Ganglia: http://agm.cern.ch
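One way to picture the "dark resources" cross-check is to compare the VMs a cloud reports as running with the hosts that have actually joined the batch system. The sketch below is illustrative only: the function `find_dark_vms`, the input shapes, and the 30-minute grace period are assumptions, and it does not use Ganglia or any cloud API.

```python
def find_dark_vms(cloud_vms, batch_hosts, min_age_s=1800):
    """Return VMs that are deployed but never joined the batch system.

    cloud_vms:   dict mapping hostname -> age in seconds (hypothetical input)
    batch_hosts: set of hostnames currently registered as worker nodes
    min_age_s:   grace period so freshly booted VMs are not flagged
    """
    return sorted(
        host
        for host, age_s in cloud_vms.items()
        if age_s >= min_age_s and host not in batch_hosts
    )


# Made-up inventory for illustration.
cloud_vms = {"vm-a": 7200, "vm-b": 7200, "vm-c": 300}
batch_hosts = {"vm-a"}
dark = find_dark_vms(cloud_vms, batch_hosts)
print(dark)
```

Here `vm-b` is flagged as a candidate for restart, while `vm-c` is ignored because it is still within the grace period.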

Cloud Accounting
- Provider-side: commercial invoice for resources delivered
- Consumer-side: record resources consumed
- Need to cross-check invoice against recorded usage!
- http://cloud-acc-dev.cern.ch/monitoring/atlas
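The cross-check between the provider's invoice and the consumer-side record could look like the following sketch. The function `cross_check`, the per-resource CPU-hour dictionaries, the 5% tolerance, and the numbers are all assumptions made for illustration; the slides do not describe the actual accounting implementation.

```python
def cross_check(invoiced_hours, recorded_hours, tolerance=0.05):
    """Flag resources whose invoiced and recorded CPU hours disagree.

    Both arguments are dicts mapping a resource name to CPU hours
    (hypothetical format). A resource is flagged when the relative
    difference exceeds the tolerance, or when it appears on one side only.
    """
    discrepancies = {}
    for resource in sorted(set(invoiced_hours) | set(recorded_hours)):
        billed = invoiced_hours.get(resource, 0.0)
        used = recorded_hours.get(resource, 0.0)
        reference = max(billed, used)
        if reference > 0 and abs(billed - used) / reference > tolerance:
            discrepancies[resource] = (billed, used)
    return discrepancies


# Made-up numbers for illustration.
invoice = {"cloud-a": 1000.0, "cloud-b": 500.0}
recorded = {"cloud-a": 990.0, "cloud-b": 400.0}
flagged = cross_check(invoice, recorded)
print(flagged)
```

With these inputs only `cloud-b` is flagged, since its invoice exceeds the recorded usage by 20%, while the 1% difference for `cloud-a` is within tolerance.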

Conclusion
- Increasing use of clouds in ATLAS Distributed Computing
- Performance characterization of commercial clouds
- More integration into operational model: accounting, monitoring, support
- Developing and deploying services to facilitate cloud use

Extra Material

Dynamic Federation UGR
- Lightweight, scalable, stateless
- General-purpose, standard protocols and components
- Could be adopted by multiple experiments, e.g. DataBridge; LHCb demo: http://federation.desy.de/fed/lhcb/
- Metadata plugin used to emulate Rucio directory structure
- No site action needed to join
  - HTTP endpoints extracted from AGIS with script


http://shoal.heprc.uvic.ca
- PAC Interface
- JSON REST Interface
- github.com/hep-gc/shoal
- CHEP 2013 Poster

http://cern.ch/go/d8qj


http://cern.ch/go/hb9m